Geospatial analysis in python#

Mapping Murdock’s Ethnographic Atlas#

This notebook illustrates how to do mapping and some geospatial analysis using various python libraries. All of this can of course be done inside of dedicated GIS client such as QGIS or ARCGIS but for easier replication, documentation and/or automation there are advantages to putting code in one place.

The notebook illustrates how to load and display vector polygon geospatial data using the powerful geopandas. We’ll also use ipyleaflet to create ‘slippy’ and interactive maps.

Broad outline:

  1. load vector polygon from Morduck’s Ethnographic Atlas data digitized by Nathan Nunn. We’ll display the data and show how to also plot markers on a different map layer.

  2. Load and display gridded raster data on wheat suitability.

  3. Calculate raster zonal statistics

  4. (not finished) Discuss how data such as that constructed in step 3 was used in an empirical paper by James Fenske…

Hide code cell source
import pandas as pd
import geopandas as gpd
import ipyleaflet as ilf
import matplotlib.pyplot as plt
from zipfile import ZipFile

Morduck’s Ethnographic Atlas#

Shapefile digitized by Nathan Nunn. A ‘shapefile’ is a widely used ARCGIS format to store vector polygon data. It’s not just one file but several files that are often shipped together in a compressed zip file. Geopandas has a method to open these directly from the zip file. We import this shapefile into a geopandas dataframe and inspect the first 3 rows:

Hide code cell source
shpfile = r'zip://borders_tribes.zip'
tribe_areas = gpd.GeoDataFrame.from_file(shpfile)
tribe_areas.head(3)
NAME TRIBE_CODE LAT LON geometry
0 ABABDA 1 23.161800 33.70160 POLYGON ((35.73971 22.87611, 35.54412 22.64326...
1 ABARAMBO 2 3.702955 26.79730 POLYGON ((26.45080 3.39284, 26.45227 3.39798, ...
2 ABE 3 6.069531 -4.26032 POLYGON ((-4.07477 6.37522, -4.01895 6.26358, ...

Among many other things geopandas has a method to allows us to plot this via matplotlib. To make it more attractive we’ll color shade each country polygon by latitude:

tribe_areas.plot(column='LAT', figsize=(12,16));
../_images/7bcba7fa7738d79cdcdd71543a1a0e7c2e683214151c763c84431eb99527662c.png

But this is a static map. For exploratory purposes it’s nice to pass the geometry information from the geodataframe to ipyleaflet to map things on top of a slippy basemap. We can specify what happens on cursor hover, coloring, etc.

The map is an ipython ‘widget’ which means that as we add layers and other features the map will update to reflect those changes, even if the code comes later.

First, we just create a topographic basemap, centered around a location at the Center of Congo.

m = ilf.Map(center = [-4.0383, 21.7587], 
            zoom = 3.5)
m

We will load the dataframe into an ipyleaflet GeoData object that we will then add as a layer to the map.

tribes_data = ilf.GeoData(geo_dataframe = tribe_areas,style={ 'weight':1},
                          hover_style={'fillColor': 'green' , 'fillOpacity': 0.2, 'weight':5},
                          name = 'Tribes')
m.add_layer(tribes_data)

Next add an option to view things full-screen

m.add_control(ilf.FullScreenControl())
control = ilf.LayersControl(position='topright')
m.add_control(control)

And now a marker for each of the ethnographic regions identified on the map such that when we hover over the icon we see the name of the region. Since there are hundreds of regions we’ll also use the ‘MarkerCluster’ object so that the map is not overwhelmed with marker icons. When we zoom out we’ll see just a cluster with a number indicating how many markers lie within the cluster. The individual markers then become visible as we zoom in.

markers = [ilf.Marker(location=(row.LAT, row.LON), title=row.NAME) for i, row in tribe_areas.iterrows()]
marker_cluster = ilf.MarkerCluster(markers = markers)
m.add_layer(marker_cluster)

We can save it to a standalone HTML page that will remain interactive.

m.save('tribes.html', title='Afica Ethnologue Atlas')

NOTE: If the maps above are not visible in the static view of this notebook (as you might see on github). See this saved HTML rendering (which should be interactive)

Raster data, zonal statistics#

We want to calculate zonal statistics (in this case statistics for data that falls inside the ethnic group boundaries). Here we’ll get data on wheat yield potential which comes as raster data from FAO-GAEZ. Each gridpoint on the raster is associated with a suitability index. We’ll plot and then calculate mean suitability in each of the polygons associated with Murdock/Nunn data.

import rasterio
import rasterstats as rs
import matplotlib.pyplot as plt

Get first downloaded raster data on potential crop yields from FAO-GAEZ data. Looks like this.

!dir *.tif
 Volume in drive H is Google Drive
 Volume Serial Number is 1983-1116

 Directory of H:\My Drive\code\GitHub\DevII\notebooks

11/23/2021  03:07 PM         1,873,865 sxHr_rcd.tif
               1 File(s)      1,873,865 bytes
               0 Dir(s)  34,289,737,728 bytes free
#wheat_tif = "res02_crav6190h_whea000a_yld.tif"
wheat_tif = "sxHr_rcd.tif"
wheat = rasterio.open(wheat_tif)
plt.figure(figsize=(12,10))
plt.xlim(1800,2900)
plt.ylim(1600,600)
plt.imshow(wheat.read(1));
../_images/d60499a2f29d585dd31888a296707c2a1c8fd545cafd6ae4ec6b2c6b2165a726.png

Next we use the rasterstats.zonal_stats command to calculate ‘count’, ‘max’, ‘min’, ‘mean’ statistics for the raster data inside each polygon. The function returns the results as a python dictionary.

stats = rs.zonal_stats('zip://borders_tribes.zip/borders_tribes.shp', wheat_tif, prefix = 'wheat_')

Finally let us add these calculated statistics as columns to our geodataframe.

colnames = pd.DataFrame(stats).columns.tolist()
colnames
['wheat_count', 'wheat_max', 'wheat_mean', 'wheat_min']
tribe_areas[colnames] = pd.DataFrame(stats)
tribe_areas.head()
NAME TRIBE_CODE LAT LON geometry wheat_count wheat_max wheat_mean wheat_min
0 ABABDA 1 23.161800 33.70160 POLYGON ((35.73971 22.87611, 35.54412 22.64326... 1988 0.0 0.000000 0.0
1 ABARAMBO 2 3.702955 26.79730 POLYGON ((26.45080 3.39284, 26.45227 3.39798, ... 82 9999.0 8315.512195 6483.0
2 ABE 3 6.069531 -4.26032 POLYGON ((-4.07477 6.37522, -4.01895 6.26358, ... 48 8477.0 7587.083333 6956.0
3 ACHOLI 4 3.108121 32.65680 POLYGON ((33.25900 3.88539, 33.45547 3.86574, ... 319 10000.0 7352.517241 1685.0
4 ADAMAWA 5 7.556091 13.18470 POLYGON ((14.14605 9.96943, 14.14605 9.87880, ... 620 10000.0 6061.716129 0.0

Let’s see a simple chloropeth plot based on mean wheat potential in each ethnic group zone.

tribe_areas.plot(column='wheat_mean', figsize=(12,16));
../_images/e78fa6649d0b557d6377ee1f6984bd2e8048b16b39dcd986fb35fa50a6fbb6a2.png