Essential geospatial Python libraries

This is a quick overview of essential Python libraries for working with geospatial data. What I think might be especially valuable for newcomers to the field is some insight into how these libraries interact and depend on each other.

Shapely and Geopandas

Shapely itself does not provide options to read or write vector file formats (e.g. shapefiles or GeoJSON), or to convert between projections. This can be done, for example, with the Fiona library. But there is an even more convenient way:
Geopandas combines the geometry objects of shapely, the read/write functions of fiona, the projection handling of pyproj and the powerful dataframe interface of the pandas library in one awesome package. In the spreadsheet-like GeoDataFrame, the geometry column (named ‘geometry’ by default) stores the shapely geometry objects, so all shapely functions can be applied to it. The pandas mechanics offer super easy ways to manipulate, plot and analyze the data, e.g. with groupby operations.



rasterstats: For zonal statistics. Extracts statistics from raster files or numpy arrays based on vector geometries.

scikit-image: Library for image processing, e.g. histogram adjustments, filtering, segmentation and edge detection, texture feature extraction etc.
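A minimal sketch of three of these operations on a synthetic single-band image, assuming scikit-image is installed. The image (a bright square on a dark background) is made up; a real workflow would load a satellite band, e.g. via rasterio, as a numpy array first.

```python
import numpy as np
from skimage import exposure
from skimage.filters import sobel, threshold_otsu

# Hypothetical single-band image: bright 4 x 4 square on a dark background
image = np.zeros((10, 10))
image[3:7, 3:7] = 1.0

# Histogram adjustment: equalize the distribution of pixel values
equalized = exposure.equalize_hist(image)

# Segmentation: a global Otsu threshold separates bright from dark pixels
threshold = threshold_otsu(image)
mask = image > threshold

# Edge detection: the Sobel gradient magnitude is high along the square's border
edges = sobel(image)

print(mask.sum())  # 16 pixels in the bright square
```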

scikit-learn: In my opinion the best Python machine learning library, and at the same time easy to use. Regression, classification, dimensionality reduction etc.
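A hedged sketch of a typical pixel-based land cover classification, assuming scikit-learn is installed. The two-band "image" (red and near-infrared), its water/vegetation layout and the 100 training pixels are synthetic. The key step is reshaping the (bands, rows, cols) array into the (samples, features) table scikit-learn expects, then reshaping the predictions back into an image.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
rows, cols = 20, 20

# Hypothetical 2-band image (red, near-infrared reflectance):
# left half "water" (low NIR), right half "vegetation" (high NIR), plus noise
red = np.full((rows, cols), 0.05) + rng.normal(0, 0.01, (rows, cols))
nir = np.where(np.arange(cols) < 10, 0.02, 0.50) + rng.normal(0, 0.01, (rows, cols))
image = np.stack([red, nir])  # shape (bands, rows, cols)

# Ground-truth labels per pixel: 0 = water, 1 = vegetation
labels = np.broadcast_to(np.arange(cols) >= 10, (rows, cols)).astype(int)

# One row per pixel, one column per band
X = image.reshape(image.shape[0], -1).T
y = labels.ravel()

# Train on a random subset of 100 "labelled" pixels;
# max_features=None lets every split consider both bands
train_idx = rng.choice(X.shape[0], size=100, replace=False)
clf = RandomForestClassifier(n_estimators=50, max_features=None, random_state=0)
clf.fit(X[train_idx], y[train_idx])

# Classify every pixel and reshape the predictions back into image form
classified = clf.predict(X).reshape(rows, cols)
accuracy = (classified == labels).mean()
print(f"Pixel accuracy: {accuracy:.2f}")
```

Evaluating on the same image the training pixels came from is only for illustration; with real data you would hold out independent validation samples.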

folium: Lets you visualize spatial data on interactive Leaflet maps.

descartes: Enables plotting of shapely geometries as matplotlib paths/patches. Also a dependency for the geometry plotting functions of geopandas.

pyproj: For transforming coordinates between projections (coordinate reference systems). Mostly unnecessary when using the more convenient geopandas coordinate reference system (crs) functions, which use pyproj under the hood.
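A short comparison sketch, assuming pyproj and geopandas are installed. The example coordinate and the choice of Web Mercator (EPSG:3857) as target CRS are arbitrary. Both routes give the same result; the geopandas version just hides the transformer.

```python
import geopandas as gpd
from pyproj import Transformer
from shapely.geometry import Point

lon, lat = 13.4, 52.5  # example coordinate in WGS84 (EPSG:4326)

# pyproj: explicit transformer; always_xy=True keeps (longitude, latitude) / (x, y) order
transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
x, y = transformer.transform(lon, lat)

# geopandas: the same reprojection through the crs / to_crs interface
points = gpd.GeoSeries([Point(lon, lat)], crs="EPSG:4326")
points_3857 = points.to_crs("EPSG:3857")

print(x, y)
print(points_3857.iloc[0])
```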

PySAL: The Python Spatial Analysis Library contains a multitude of functions for spatial analysis, statistical modeling and plotting.

xarray: Great for handling extensive image time series stacks; imagine 5 vegetation indices x 24 dates x 256 pixels x 256 pixels. xarray lets you label the dimensions of the multidimensional numpy array and combines this with many functions and the syntax of the pandas library (e.g. groupby, rolling window, plotting). Not essential for beginners, but a great addition when working with extensive time series data.
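A sketch of such a labelled stack, assuming xarray, pandas and numpy are installed. The index names, the monthly dates and the random values are placeholders, the dimension names ("index", "time", "y", "x") are my own choice, and the tile is 64 x 64 instead of 256 x 256 to keep it light.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stack: 5 vegetation indices x 24 monthly dates x 64 x 64 pixels
indices = ["NDVI", "EVI", "SAVI", "NDWI", "NBR"]
times = pd.date_range("2019-01-01", periods=24, freq="MS")
data = np.random.default_rng(0).random((5, 24, 64, 64), dtype=np.float32)

cube = xr.DataArray(
    data,
    dims=("index", "time", "y", "x"),
    coords={"index": indices, "time": times},
)

# Select by label instead of positional index: mean NDVI time series over the tile
ndvi_series = cube.sel(index="NDVI").mean(dim=("y", "x"))

# pandas-style groupby: mean per calendar month across both years
monthly = cube.groupby("time.month").mean(dim="time")

# Rolling window along time (3-month centered mean; edges become NaN)
smoothed = cube.rolling(time=3, center=True).mean()
```

With matplotlib installed, `ndvi_series.plot()` gives a quick line plot of the time series.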

Here you can find step-by-step instructions on how to install and set up an Anaconda Python 3 environment for Windows with all of the geospatial libraries described above.

There have been quite a few recommendations for other geospatial libraries and resources in the comments; take a look! I also recommend checking out the “Awesome geospatial” list.

You can follow me on Twitter @chrieke
