How to Filter Spatial Data in Python Using GeoPandas
Problem statement
A common GIS task is reducing a layer to only the features you need. For example, you may have a shapefile of land use polygons and only want residential areas, or a GeoJSON of administrative boundaries where you need districts from one region with population above a threshold.
In GeoPandas, this usually means filtering a GeoDataFrame by attribute values. Typical cases include:
- keeping features where one column matches a value
- applying multiple conditions
- filtering text fields by exact or partial matches
- removing rows with null values before mapping or analysis
The goal is to subset spatial data without losing the geometry column, so the result is still ready for plotting, exporting, spatial joins, or clipping.
Quick answer
To filter GeoPandas data, use the same boolean indexing pattern you would use with pandas. A filtered GeoDataFrame keeps its geometry column.
import geopandas as gpd
gdf = gpd.read_file("data/landuse.shp")
residential = gdf[gdf["landuse"] == "residential"].copy()
residential is still a GeoDataFrame, so you can save it, map it, or use it in later GIS steps.
Step-by-step solution
Load spatial data into a GeoDataFrame
Read a shapefile or GeoJSON with GeoPandas:
import geopandas as gpd
# Shapefile
gdf = gpd.read_file("data/city_landuse.shp")
# Or GeoJSON
# gdf = gpd.read_file("data/city_landuse.geojson")
Before filtering, inspect the columns and a few rows:
print(gdf.columns)
print(gdf.head())
print(gdf.dtypes)
This helps you confirm:
- the exact column names
- whether a field is text or numeric
- how values are spelled and capitalized
For example, a land use layer might contain columns like:
landuse | district | population | status
Filter by a single column value
To keep only features where one attribute matches a value:
residential = gdf[gdf["landuse"] == "residential"]
print(type(residential))
print(len(residential))
If you want a clean standalone result for later editing or export, use .copy():
residential = gdf[gdf["landuse"] == "residential"].copy()
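The single-value filter can be tried without any file on disk. The following is a minimal sketch that assumes a small in-memory layer with made-up values standing in for a real land use dataset; the `checked` column is purely illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer; real data would come from gpd.read_file().
gdf = gpd.GeoDataFrame(
    {
        "landuse": ["residential", "park", "residential"],
        "district": ["North", "North", "South"],
    },
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
    crs="EPSG:4326",
)

# Boolean mask: True where the attribute matches.
mask = gdf["landuse"] == "residential"
residential = gdf[mask].copy()

# Because of .copy(), adding a column here leaves the source layer untouched.
residential["checked"] = True

print(type(residential).__name__)
print(len(residential))
print("checked" in gdf.columns)
```

Splitting the mask into its own variable is optional, but it makes the condition easy to inspect (for example with `mask.sum()`) before applying it.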
Filter with multiple conditions
Use & for AND, and wrap each condition in parentheses:
filtered = gdf[
(gdf["district"] == "North")
& (gdf["population"] > 5000)
].copy()
This keeps only features in the North district with population above 5000.
Use | for OR:
green_areas = gdf[
(gdf["landuse"] == "park")
| (gdf["landuse"] == "forest")
].copy()
If you need several values from one column, isin() is usually cleaner:
selected = gdf[gdf["landuse"].isin(["park", "forest", "wetland"])].copy()
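To see that isin() and chained OR conditions select the same rows, here is a short sketch on a hypothetical in-memory layer. It also shows `~` for inverting a mask, which is a standard pandas operator rather than something specific to this workflow.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer with illustrative values.
gdf = gpd.GeoDataFrame(
    {"landuse": ["park", "forest", "residential", "wetland", "industrial"]},
    geometry=[Point(i, i) for i in range(5)],
)

# Two ways to express "park OR forest".
mask_or = (gdf["landuse"] == "park") | (gdf["landuse"] == "forest")
mask_isin = gdf["landuse"].isin(["park", "forest"])

# Both masks flag exactly the same rows.
same_rows = mask_or.equals(mask_isin)

green_areas = gdf[mask_isin].copy()

# ~ inverts the mask: everything that is NOT park or forest.
other_areas = gdf[~mask_isin].copy()

print(same_rows)
print(green_areas["landuse"].tolist())
print(other_areas["landuse"].tolist())
```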
Be careful with parentheses. This is correct:
subset = gdf[
(gdf["region"] == "East")
& (gdf["status"] == "active")
]
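This is not, because & binds more tightly than ==, so Python tries to evaluate "East" & gdf["status"] before either comparison: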
# Incorrect
# subset = gdf[gdf["region"] == "East" & gdf["status"] == "active"]
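The effect of missing parentheses can be checked directly. This sketch, using a hypothetical three-row layer, runs both forms and catches whatever exception the unparenthesized version raises; the exact exception type may vary between pandas versions, so the code does not assume one.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer with illustrative values.
gdf = gpd.GeoDataFrame(
    {
        "region": ["East", "East", "West"],
        "status": ["active", "closed", "active"],
    },
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
)

# With parentheses: each comparison yields a boolean Series, then & combines them.
subset = gdf[(gdf["region"] == "East") & (gdf["status"] == "active")]

# Without parentheses, Python parses the expression as
#   gdf["region"] == ("East" & gdf["status"]) == "active"
# which fails before any filtering happens.
try:
    gdf[gdf["region"] == "East" & gdf["status"] == "active"]
    unparenthesized_failed = False
except Exception as exc:
    unparenthesized_failed = True
    print("Unparenthesized filter raised:", type(exc).__name__)

print(len(subset))
```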
Code examples
Filter text fields
Match an exact text value:
district = gdf[gdf["district_name"] == "Central"].copy()
For partial matches, use .str.contains():
central_matches = gdf[
gdf["district_name"].str.contains("central", na=False)
].copy()
To ignore case differences:
central_matches = gdf[
gdf["district_name"].str.contains("central", case=False, na=False)
].copy()
This helps when values vary like Central, CENTRAL, or central district.
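The difference between the case-sensitive and case-insensitive match is easiest to see side by side. This is a minimal sketch on made-up district names, including one null value to show what `na=False` does.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer; the last row has a missing district name.
gdf = gpd.GeoDataFrame(
    {"district_name": ["Central", "CENTRAL", "central district", "North", None]},
    geometry=[Point(i, i) for i in range(5)],
)

# Case-sensitive: only values containing lowercase "central" match.
case_sensitive = gdf[
    gdf["district_name"].str.contains("central", na=False)
].copy()

# Case-insensitive: all spelling variants match.
case_insensitive = gdf[
    gdf["district_name"].str.contains("central", case=False, na=False)
].copy()

print(case_sensitive["district_name"].tolist())
print(case_insensitive["district_name"].tolist())
```

`na=False` makes null values count as non-matches, so the mask contains only True and False. Note that .str.contains() treats the pattern as a regular expression by default; pass `regex=False` if the search text contains characters such as `.` or `(` that should be matched literally.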
Filter numeric fields
Keep values above or below a threshold:
large_areas = gdf[gdf["area_sqkm"] > 10].copy()
small_areas = gdf[gdf["area_sqkm"] < 1].copy()
Filter within a numeric range:
medium_pop = gdf[
(gdf["population"] >= 1000)
& (gdf["population"] <= 10000)
].copy()
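As an alternative to two chained comparisons, pandas offers Series.between(), which includes both endpoints by default. The sketch below, on hypothetical population values, checks that it selects the same rows as the explicit range filter.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer with illustrative population values.
gdf = gpd.GeoDataFrame(
    {"population": [500, 1000, 7500, 10000, 25000]},
    geometry=[Point(i, i) for i in range(5)],
)

# Explicit range with two conditions.
explicit = gdf[
    (gdf["population"] >= 1000)
    & (gdf["population"] <= 10000)
].copy()

# Same range with between(); both bounds are inclusive by default.
medium_pop = gdf[gdf["population"].between(1000, 10000)].copy()

print(medium_pop["population"].tolist())
print(explicit.index.equals(medium_pop.index))
```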
If a numeric column was read as text, convert it first:
import pandas as pd
gdf["population"] = pd.to_numeric(gdf["population"], errors="coerce")
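To illustrate why the conversion matters, here is a sketch that assumes a population column read as text, with one unparseable entry ("n/a") invented for the example. `errors="coerce"` turns values that cannot be parsed into NaN instead of raising an error.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical layer where population arrived as text.
gdf = gpd.GeoDataFrame(
    {"population": ["1200", "8500", "n/a", "300"]},
    geometry=[Point(i, i) for i in range(4)],
)

# Text compares character by character, so "300" sorts after "1200".
text_comparison = "300" > "1200"

# Convert to numbers; unparseable values become NaN.
gdf["population"] = pd.to_numeric(gdf["population"], errors="coerce")

# Numeric comparison now behaves as expected; NaN rows never match.
large = gdf[gdf["population"] > 1000].copy()
unparsed_count = int(gdf["population"].isna().sum())

print(text_comparison)
print(large["population"].tolist())
print(unparsed_count)
```

Counting the NaN values after conversion is a quick way to find out how many records could not be parsed before relying on the filtered result.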
Filter rows with missing or empty values
Find null values:
missing_status = gdf[gdf["status"].isna()].copy()
Exclude incomplete records:
cleaned = gdf[gdf["status"].notna()].copy()
If the data uses empty strings instead of true nulls, check both:
cleaned = gdf[
gdf["status"].notna()
& (gdf["status"].str.strip() != "")
].copy()
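The combined null and empty-string check can be verified on a small sample. This sketch assumes a status column containing a true null, an empty string, and a whitespace-only string, all made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical layer mixing real values, a null, and blank strings.
gdf = gpd.GeoDataFrame(
    {"status": ["active", None, "", "   ", "closed"]},
    geometry=[Point(i, i) for i in range(5)],
)

# Rows where status is a true null.
missing_status = gdf[gdf["status"].isna()].copy()

# notna() alone still lets blank strings through.
not_null_only = gdf[gdf["status"].notna()].copy()

# Requiring non-blank text as well removes "" and whitespace-only values.
cleaned = gdf[
    gdf["status"].notna()
    & (gdf["status"].str.strip() != "")
].copy()

print(len(missing_status))
print(len(not_null_only))
print(cleaned["status"].tolist())
```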
Export the filtered result
Create a filtered output:
export_gdf = gdf[
(gdf["landuse"] == "residential")
& (gdf["district"] == "North")
].copy()
Save it as a shapefile or GeoJSON (the output folder must already exist):
export_gdf.to_file("output/north_residential.shp")
export_gdf.to_file("output/north_residential.geojson", driver="GeoJSON")
Explanation
A GeoDataFrame behaves like a pandas DataFrame with an added geometry column. When you filter rows using boolean conditions, GeoPandas keeps the geometry for the matching records.
That means this works like standard pandas filtering:
gdf[gdf["column"] == value]
but the result remains spatial.
Filtering rows does not change the geometry itself. It only decides which features stay in the output.
For example:
- filtering by landuse == "park" keeps only park features
- buffering changes geometry shapes
- clipping changes geometry extent
So on this page, filtering means attribute filtering, not geometry-based predicates like within() or intersects().
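The distinction between selecting features and changing them can be checked in code. This sketch assumes a hypothetical layer of square polygons in a projected CRS (EPSG:3857 is chosen only so that the buffer distance is in meters) and compares a filter with a buffer.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical sample layer of 10 x 10 squares.
gdf = gpd.GeoDataFrame(
    {"landuse": ["residential", "park", "forest"]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10), box(40, 0, 50, 10)],
    crs="EPSG:3857",
)

# Attribute filter: selects rows, leaves shapes alone.
parks = gdf[gdf["landuse"] == "park"]

# The surviving feature has the same geometry, index label, and CRS as before.
same_geometry = parks.geometry.iloc[0].equals(gdf.geometry.loc[1])
same_crs = parks.crs == gdf.crs

# Buffering, by contrast, produces new and larger shapes.
buffered = parks.buffer(1)
area_before = parks.area.iloc[0]
area_after = buffered.area.iloc[0]

print(same_geometry, same_crs)
print(area_before, area_after > area_before)
```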
In real GIS workflows, attribute filtering is often the first step before:
- spatial joins
- clipping
- plotting
- exporting
- dissolving
Reducing the dataset early makes later steps faster and easier to validate.
Edge cases or notes
- CRS issues: Attribute filtering does not depend on CRS, but if you filter and then run spatial operations, make sure the layer uses the expected CRS. Check with gdf.crs and reproject with gdf.to_crs(...) if needed.
- Invalid geometries: Filtering by attributes does not fix bad geometry. If later steps fail, inspect invalid features with ~gdf.is_valid.
- Column names with spaces: Use bracket syntax like gdf["land use"], not attribute-style access.
- String versus numeric data types: If population is stored as text, numeric comparisons may fail or return incorrect results. Check gdf.dtypes.
- Empty filter results: If your result has zero rows, verify spelling, case, null values, and data type mismatches.
- Large files and memory: Filtering large layers usually means loading the dataset into memory first with read_file(). For very large datasets, consider using more scalable storage formats or preprocessing steps before loading data into GeoPandas.
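Several of the notes above can be combined into a short set of checks after filtering. This is a sketch under stated assumptions: the layer, its "land use" column, and the self-intersecting polygon are all invented for illustration, and EPSG:3857 is just an example target CRS.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A valid square and a self-intersecting "bowtie" polygon (invalid).
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])

# Hypothetical layer with a column name that contains a space.
gdf = gpd.GeoDataFrame(
    {"land use": ["park", "park", "residential"]},
    geometry=[square, bowtie, square],
    crs="EPSG:4326",
)

# Bracket syntax works for column names with spaces.
subset = gdf[gdf["land use"] == "park"].copy()

# Empty result check before continuing.
if subset.empty:
    print("Filter returned no rows; check spelling, case, and dtypes.")

# Attribute filtering does not repair geometry: the bowtie is still invalid.
invalid = subset[~subset.is_valid]

# Confirm the CRS, then reproject if later steps expect a different one.
print(subset.crs)
projected = subset.to_crs(epsg=3857)

print(len(subset), len(invalid))
```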
Internal links
- For the broader workflow, see GeoPandas basics for vector data in Python.
- To load source data first, read How to Read a Shapefile in Python with GeoPandas.
- To save the filtered output, see How to Export GeoJSON in Python with GeoPandas.
- If you need to fix projection before later spatial analysis, read How to Reproject Spatial Data in Python (GeoPandas).
- If your selection returns nothing, check Why a GeoPandas Filter Returns Empty Results.
FAQ
Can I filter a shapefile directly with GeoPandas?
Yes. Read the shapefile with gpd.read_file(), filter the resulting GeoDataFrame, then save the filtered result with .to_file().
How do I filter multiple values in one column?
Use isin():
subset = gdf[gdf["landuse"].isin(["park", "forest", "residential"])].copy()
This is usually cleaner than chaining multiple OR conditions.
Why does my GeoPandas filter return no rows?
Common causes are:
- wrong column name
- case mismatch in text values
- extra spaces in strings
- numeric values stored as text
- null values in the field being filtered
Check gdf.columns, gdf.dtypes, and gdf.head() first.
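When a filter unexpectedly returns nothing, looking at the raw values usually reveals the cause. The sketch below assumes a hypothetical landuse column with inconsistent case and stray spaces, and uses a temporary normalized copy of the text for matching; the `landuse_clean` name is illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical layer with messy text values.
gdf = gpd.GeoDataFrame(
    {"landuse": ["Residential ", "RESIDENTIAL", " park"]},
    geometry=[Point(i, i) for i in range(3)],
)

# The exact match finds nothing because of case and whitespace.
exact = gdf[gdf["landuse"] == "residential"]

# repr() makes leading and trailing spaces visible.
print([repr(value) for value in gdf["landuse"].unique()])

# Normalize a copy of the text: trim whitespace and lowercase.
landuse_clean = gdf["landuse"].str.strip().str.lower()
normalized = gdf[landuse_clean == "residential"].copy()

print(len(exact), len(normalized))
```

Keeping the normalized text in a separate Series leaves the original attribute values unchanged in the output.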
Does filtering remove the geometry column?
No. Filtering rows keeps the geometry column intact, so the output is still a GeoDataFrame.