GeoPandas and Data Management around Python

Table of Contents

  1. Objective
  2. Software and Documentation
  3. Getting started with python
    1. Installing Anaconda
    2. Setting up environment
    3. Installing other modules
  4. The Research Question and Data
  5. Maps and Plots on GeoPandas
    1. Setting up
    2. Editing Columns
    3. Precinct level voter distribution
    4. District level voter distribution
    5. Exporting to Shapefile
  6. Interpreting Results

Objective

The objective of this project is primarily to learn. I will get acquainted with the vast Python ecosystem and learn how to organize its numerous packages and modules into 'environments'. Then, I will use an open-source geospatial package called GeoPandas to create maps and plots in an attempt to visualize partisan gerrymandering in Wisconsin. Finally, I will export these visualizations so that the analysis can be continued on other platforms.

Software and Documentation

Software and modules used

Documentation Referenced

Getting started with python

The Python software ecosystem is vast and complicated, and it can be overwhelming at first. Here is a quick guide to the essentials.

Installing Anaconda

Because the Python ecosystem is complex, dedicated software exists to manage workspace environments. In this project, I will be using the most common one, the free and open-source Anaconda. Anaconda is a platform for managing ‘packages’. Most Python tools, GeoPandas included, have ‘dependencies’, i.e. other modules they need in order to run at all. Finding and installing each of these modules by hand is a pain, but Anaconda automates the process. In addition, Anaconda facilitates the creation and maintenance of ‘environments’. It is recommended to create a separate environment for each project, because projects may require different versions of the same module (or even of Python itself). Once Anaconda is installed, it is accessible through its Navigator application or through the command line.

Setting up environment

To create an environment for GeoPandas, I used the following commands, borrowed from the GeoPandas installation guide.

conda create -n geo_env
conda activate geo_env
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
conda install python=3 geopandas

Installing other modules

Let us install the other packages. To do this through the command line, first type the following to activate the environment. Any command from this point on will be executed within the environment.

conda activate geo_env

We will now install Jupyter and Matplotlib. Jupyter is a platform that integrates a file browser, live python code, and visual outputs into a format called a notebook. It is much more user-friendly than a command-line terminal, and it facilitates file organization as well. Matplotlib will be used later to output the analysis onto a map.

conda install matplotlib
conda install jupyter
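Before moving on, it can help to confirm that the active environment actually sees the packages just installed. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is my own, and the list of names simply mirrors the importable packages installed above (Jupyter is launched as an application, so it is left out of the check).

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names of the packages installed in the steps above.
required = ["geopandas", "matplotlib"]
print("Missing:", missing_modules(required) or "none")
```

If the list is not empty, the most likely cause is that the commands were run without first activating geo_env.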

The Research Question and Data

Redistricting has been one of the most important political issues of the last few years. It is a pressing one too, now that the 2020 census is in sight. In June, there was also the [Supreme Court case](https://www.nytimes.com/2019/06/27/us/politics/supreme-court-gerrymandering.html) in which the conservative majority deemed the federal courts powerless to regulate partisan gerrymandering. Some say that it is a menace to the democratic process, others point out that it is as old as democracy itself, and after over two centuries of debate, we still cannot agree on what counts as gerrymandering.

Identifying gerrymandering is difficult, since claiming that a district is gerrymandered partly implies that there exists a ‘correct’ way to draw the border, which simply is not true. However, many people agree that there are incorrect ways to draw them, and Wisconsin is a prime example. Wisconsin’s districts are drawn by the state assembly. In 2011, the Republican-controlled assembly redrew its districts with the explicit purpose of electing a strong Republican majority from a minority of voters, and it worked: Republicans have had an uncontested stronghold in the state assembly ever since. In this project, I will attempt to visualize the effects of Wisconsin’s partisan gerrymandering through maps and plots created with GeoPandas.

I will use precinct-level voter data assembled by the Metric Geometry and Gerrymandering Group (MGGG), a Boston-based research team that pursues cutting-edge research on gerrymandering. They also provide open-source tools and data to give the public access to the research.

Maps and Plots on GeoPandas

Setting up

The first step is to open Jupyter through Anaconda. This opens a file browser. Locate the folder in which the shapefiles are saved and create a new Python notebook there.

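import geopandas as gpd
import matplotlib as mpl
import matplotlib.pyplot  # load the pyplot submodule so that mpl.pyplot is available
import pandas

# Read the MGGG precinct shapefile into a GeoDataFrame
wisc = gpd.read_file("WI_ltsb_corrected_final.shp")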

This code imports the packages I will be using and reads the precinct shapefile into a GeoDataFrame named wisc. Matplotlib is used later to output the maps and plots.

Editing Columns

Editing attribute tables in GeoPandas is very easy. There is no need to create a new column before populating it. I looked at the MGGG metadata to identify the columns I need for this analysis: the Democratic and Republican vote counts for the Wisconsin state senate and assembly. When there are multiple candidates from a given party, I sum them to get the total votes for that party.

The following operation makes three new columns. Votes for the state senate and assembly are summed separately for Republicans and for Democrats. Then the two sums are added to find the 'total' voter pool (this excludes third parties, but their contribution is negligible). Finally, the useful columns are isolated, including ASM, which denotes the assembly district number and will be used later to perform a dissolve. wisc.head() displays the first five rows to make sure the operation was done correctly.

wisc['strepvt'] = wisc.WSSREP12 + wisc.WSSREP212 + wisc.WSAREP12 + wisc.WSAREP212
wisc['stdemvt'] = wisc.WSSDEM12 + wisc.WSADEM12 + wisc.WSADEM212
wisc['sttotvt'] = wisc.strepvt + wisc.stdemvt
wisc = wisc[['geometry','ASM','sttotvt','strepvt','stdemvt']]
wisc.head()
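To see the column arithmetic in isolation, here is a small sketch on a plain pandas DataFrame. The column names `REP_A`, `REP_B`, and `DEM_A` are placeholders of my own rather than MGGG fields, and the vote counts are invented for illustration.

```python
import pandas as pd

# Toy precinct table; names and counts are illustrative only.
toy = pd.DataFrame({
    "REP_A": [120, 0, 45],
    "REP_B": [30, 0, 5],
    "DEM_A": [100, 0, 150],
})

# Assigning to a new key creates the column; no prior declaration is needed.
toy["rep_total"] = toy["REP_A"] + toy["REP_B"]
toy["dem_total"] = toy["DEM_A"]
toy["total"] = toy["rep_total"] + toy["dem_total"]

# Keep only the derived columns, as done with wisc above.
toy = toy[["rep_total", "dem_total", "total"]]
print(toy)
```

The second row, with no votes at all, is the kind of precinct that has to be handled before computing ratios.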

Precinct level voter distribution

I will calculate the ratio of Democratic votes to total votes, which gives a number between 0 and 1 (0 being entirely Republican and 1 being entirely Democratic). Before doing this, however, I must remove all precincts with zero total votes, since their ratio would be 0 divided by 0, which is undefined.

# Keep only precincts with at least one vote; .copy() avoids pandas' SettingWithCopyWarning
pre = wisc.loc[wisc['sttotvt'] != 0].copy()
pre['stdemnorm'] = pre.stdemvt/pre.sttotvt
pre.head()
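The filtering step can be sketched on toy data as follows. One detail worth knowing: with pandas, dividing zero by zero does not raise an exception but quietly produces NaN, so the filter mainly keeps undefined shares out of the histogram and the map. The column names mirror the ones above, while the numbers are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "stdemvt": [100, 0, 150],
    "sttotvt": [250, 0, 200],
})

# Without the filter, the empty precinct yields NaN rather than an error.
unfiltered = toy["stdemvt"] / toy["sttotvt"]

# Keep precincts with at least one vote; .copy() makes the result an
# independent table, so adding a column does not trigger a pandas warning.
pre = toy.loc[toy["sttotvt"] != 0].copy()
pre["stdemnorm"] = pre["stdemvt"] / pre["sttotvt"]

print(unfiltered.isna().tolist())
print(pre["stdemnorm"].tolist())
```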

Now, I will create a histogram of this ratio and save it as a png image file. bins signifies the number of bars in the chart.

pre.hist(column='stdemnorm', bins=20)
mpl.pyplot.title('Percentage of Democrat Votes to WI Assembly 2012, Precinct Level')
mpl.pyplot.savefig("precinct-hist.png", dpi=300)

Before creating a map, I will check its coordinate reference system with this command:

pre.crs

It returned EPSG code 26916, which corresponds to NAD83 / UTM zone 16N, a projected coordinate system appropriate for this region. I will proceed with mapping.

pre.plot(column='stdemnorm', cmap='RdBu', legend=True);
mpl.pyplot.title('Votes to WI Assembly 2012, Precinct Level')
mpl.pyplot.savefig("precinct-choro.png", dpi=300)

District level voter distribution

Let us perform a dissolve. This is incredibly simple. Only two criteria are needed: the column on which this dissolve is based and the aggregation function, which in our case is a simple sum. I calculated the ratio of Democratic votes much in the same way, but there was no need to remove zeros this time, as all districts have non-zero entries.

dist = wisc.dissolve(by='ASM', aggfunc='sum')
dist['stdemnorm'] = dist.stdemvt/dist.sttotvt
dist.head()
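For the attribute columns, a dissolve behaves like a pandas group-by: rows sharing an `ASM` value are collapsed and their vote counts summed, while GeoPandas additionally merges the geometries. The sketch below uses plain pandas with invented numbers to show only the attribute side, and why the district share is computed from the summed counts rather than by averaging precinct shares.

```python
import pandas as pd

# Invented precinct counts; two precincts in district 1, one in district 2.
toy = pd.DataFrame({
    "ASM": [1, 1, 2],
    "stdemvt": [90, 10, 60],
    "sttotvt": [100, 400, 100],
})

# Attribute side of dissolve(by='ASM', aggfunc='sum'): sum counts per district.
dist = toy.groupby("ASM")[["stdemvt", "sttotvt"]].sum()
dist["stdemnorm"] = dist["stdemvt"] / dist["sttotvt"]

# For comparison only: averaging precinct shares ignores precinct size.
toy["share"] = toy["stdemvt"] / toy["sttotvt"]
mean_of_shares = toy.groupby("ASM")["share"].mean()

print(dist["stdemnorm"])
print(mean_of_shares)
```

In district 1 the two approaches disagree because the large precinct dominates the summed counts, which is why the ratio is recomputed after the dissolve instead of being carried over from the precinct table.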

A plot and a map were created in the same way as in the precinct-level analysis.

Exporting to Shapefile

Exporting to shapefile is done with one function.

pre.to_file("precinct.shp")
dist.to_file("district.shp")
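One thing to keep in mind when exporting: the Shapefile format stores attributes in a dBASE table, which limits field names to 10 characters, so longer column names get shortened on export. The names used here all fit, but a quick check like the sketch below can catch problems early. The helper `too_long_for_shapefile` is a hypothetical name of my own, not part of GeoPandas.

```python
SHAPEFILE_FIELD_LIMIT = 10  # dBASE field-name limit used by the Shapefile format

def too_long_for_shapefile(columns, limit=SHAPEFILE_FIELD_LIMIT):
    """Return the column names that exceed the Shapefile field-name limit."""
    # The geometry column is stored separately, so it is not checked.
    return [c for c in columns if c != "geometry" and len(c) > limit]

# Column names used in this project.
columns = ["geometry", "ASM", "sttotvt", "strepvt", "stdemvt", "stdemnorm"]
print(too_long_for_shapefile(columns))
```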

Interpreting Results

[Figure: district-level (left) and precinct-level (right) choropleth maps]

Looking at the district- and precinct-level choropleths side by side, it is immediately clear that the borders were drawn to contain the heavily Democratic precincts. This increases 'wasted votes', since any votes beyond those needed for a majority do not affect the election result. However, this alone does not necessarily reflect partisan gerrymandering. People of similar ideology often live in the same area, bounded by both physical and cultural boundaries, and there is an argument to be made for drawing district borders that reflect these pockets of population. Looking at the histograms, however, the effects of partisan gerrymandering are clearly seen.
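The notion of surplus votes described above can be made concrete with a small sketch. It assumes a two-party contest and counts only the winner's votes beyond a bare majority, as in the paragraph above. Other conventions also count every losing vote as wasted, which this sketch does not do. The function name and the vote counts are invented for illustration.

```python
def surplus_votes(dem, rep):
    """Votes cast for the winner beyond the bare majority needed to win.

    Assumes a two-party contest; a tie has no winner and returns 0.
    """
    total = dem + rep
    needed = total // 2 + 1  # smallest vote count that beats the other party
    winner = max(dem, rep)
    return max(winner - needed, 0)

# Invented numbers: a heavily packed district versus a narrowly won one.
print(surplus_votes(9000, 1000))
print(surplus_votes(5200, 4800))
```

A district won 90 to 10 leaves far more surplus votes than one won 52 to 48, which is the effect of concentrating one party's voters inside a few districts.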

[Figure: district-level (left) and precinct-level (right) histograms]

At the precinct level, we see a more or less even distribution of percentages, with a peak in the 45 - 50% Democratic range (and a strong peak around the 100% mark). The district-level distribution looks very different: its width has shrunk considerably, and the peak has shifted to the 40 - 45% range. The narrower distribution suggests that precincts with slight Democratic majorities were grouped with Republican-majority precincts to suppress Democratic representation.