How to Filter Spatial Data in Python Using GeoPandas

Problem statement

A common GIS task is reducing a layer to only the features you need. For example, you may have a shapefile of land use polygons and only want residential areas, or a GeoJSON of administrative boundaries where you need districts from one region with population above a threshold.

In GeoPandas, this usually means filtering a GeoDataFrame by attribute values. Typical cases include:

keeping features where one column matches a value
applying multiple conditions
filtering text fields by exact or partial matches
removing rows with null values before mapping or analysis

The goal is to subset spatial data without losing the geometry column, so the result is still ready for plotting, exporting, spatial joins, or clipping.

Quick answer

To filter GeoPandas data, use the same boolean indexing pattern you would use with pandas. A filtered GeoDataFrame keeps its geometry column.

import geopandas as gpd

gdf = gpd.read_file("data/landuse.shp")

residential = gdf[gdf["landuse"] == "residential"].copy()

residential is still a GeoDataFrame, so you can save it, map it, or use it in later GIS steps.

Step-by-step solution

Load spatial data into a GeoDataFrame

Read a shapefile or GeoJSON with GeoPandas:

import geopandas as gpd

# Shapefile
gdf = gpd.read_file("data/city_landuse.shp")

# Or GeoJSON
# gdf = gpd.read_file("data/city_landuse.geojson")

Before filtering, inspect the columns and a few rows:

print(gdf.columns)
print(gdf.head())
print(gdf.dtypes)

This helps you confirm:

the exact column names
whether a field is text or numeric
how values are spelled and capitalized

For example, a land use layer might contain columns like:

landuse
district
population
status
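
To check spelling, capitalization, and data types before writing a filter, something like the sketch below may help. The sample layer is hypothetical and built in memory so the snippet runs without a file; the column names follow the example above.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical sample layer (not from a real dataset).
gdf = gpd.GeoDataFrame(
    {
        "landuse": ["residential", "Residential", "park", None],
        "population": ["1200", "300", "0", "50"],  # stored as text on purpose
    },
    geometry=[Point(0, 0), Point(1, 0), Point(2, 0), Point(3, 0)],
    crs="EPSG:4326",
)

# Count each distinct value, including nulls, to spot spelling and case variants.
landuse_counts = gdf["landuse"].value_counts(dropna=False)
print(landuse_counts)

# A column of digit strings is still text, not a numeric dtype.
population_is_numeric = pd.api.types.is_numeric_dtype(gdf["population"])
print(population_is_numeric)
```

Here "residential" and "Residential" show up as separate values, which is the kind of mismatch that makes an exact-match filter return fewer rows than expected.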

Filter by a single column value

To keep only features where one attribute matches a value:

residential = gdf[gdf["landuse"] == "residential"]
print(type(residential))
print(len(residential))

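If you plan to modify the result later (for example, adding or editing columns), call .copy() so the subset is independent of the original GeoDataFrame and pandas does not warn about assigning to a slice: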

residential = gdf[gdf["landuse"] == "residential"].copy()

Filter with multiple conditions

Use & for AND, and wrap each condition in parentheses:

filtered = gdf[
    (gdf["district"] == "North")
    & (gdf["population"] > 5000)
].copy()

This keeps only features in the North district with population above 5000.

Use | for OR:

green_areas = gdf[
    (gdf["landuse"] == "park")
    | (gdf["landuse"] == "forest")
].copy()

If you need several values from one column, isin() is usually cleaner:

selected = gdf[gdf["landuse"].isin(["park", "forest", "wetland"])].copy()
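
The same pattern can be inverted to exclude a list of values. The sketch below assumes a small in-memory layer with hypothetical values and uses ~ to negate the isin() mask.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer.
gdf = gpd.GeoDataFrame(
    {"landuse": ["park", "forest", "residential", "industrial"]},
    geometry=[Point(x, 0) for x in range(4)],
    crs="EPSG:4326",
)

natural = ["park", "forest", "wetland"]

# ~ negates the boolean mask, keeping rows whose landuse is NOT in the list.
not_natural = gdf[~gdf["landuse"].isin(natural)].copy()
print(not_natural["landuse"].tolist())
```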

Be careful with parentheses. This is correct:

subset = gdf[
    (gdf["region"] == "East")
    & (gdf["status"] == "active")
]

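This is not, because & binds more tightly than ==, so Python tries to evaluate "East" & gdf["status"] before either comparison: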

# Incorrect
# subset = gdf[gdf["region"] == "East" & gdf["status"] == "active"]
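
To see the difference in practice, the following sketch runs both forms on a small hypothetical layer. The exact exception type can vary between pandas versions, so it is caught generically here.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer.
gdf = gpd.GeoDataFrame(
    {
        "region": ["East", "East", "West"],
        "status": ["active", "closed", "active"],
    },
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
    crs="EPSG:4326",
)

# Without parentheses the expression is not two comparisons joined by &,
# so it raises instead of returning a subset.
try:
    gdf[gdf["region"] == "East" & gdf["status"] == "active"]
    unparenthesized_failed = False
except Exception as exc:
    unparenthesized_failed = True
    print(type(exc).__name__)

# With parentheses, each comparison is evaluated first, then combined.
subset = gdf[(gdf["region"] == "East") & (gdf["status"] == "active")]
print(len(subset))
```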

Code examples

Filter text fields

Match an exact text value:

district = gdf[gdf["district_name"] == "Central"].copy()

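For partial matches, use .str.contains(). The pattern is treated as a regular expression by default, and na=False makes rows with missing values count as non-matches: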

central_matches = gdf[
    gdf["district_name"].str.contains("central", na=False)
].copy()

To ignore case differences:

central_matches = gdf[
    gdf["district_name"].str.contains("central", case=False, na=False)
].copy()

This helps when values vary like Central, CENTRAL, or central district.
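
If the search text contains characters that are special in regular expressions, such as parentheses or dots, passing regex=False is one way to match it literally. A minimal sketch with hypothetical district names:

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer.
gdf = gpd.GeoDataFrame(
    {"district_name": ["Central (Old Town)", "Central", "North", None]},
    geometry=[Point(x, 0) for x in range(4)],
    crs="EPSG:4326",
)

# "(Old" is not a valid regular expression, so match it as plain text.
old_town = gdf[
    gdf["district_name"].str.contains("(Old", regex=False, na=False)
].copy()
print(old_town["district_name"].tolist())
```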

Filter numeric fields

Keep values above or below a threshold:

large_areas = gdf[gdf["area_sqkm"] > 10].copy()
small_areas = gdf[gdf["area_sqkm"] < 1].copy()

Filter within a numeric range:

medium_pop = gdf[
    (gdf["population"] >= 1000)
    & (gdf["population"] <= 10000)
].copy()
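
As an alternative to two comparisons, pandas also provides Series.between(), which includes both endpoints by default. A short sketch on hypothetical population values:

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer.
gdf = gpd.GeoDataFrame(
    {"population": [500, 1000, 7500, 10000, 20000]},
    geometry=[Point(x, 0) for x in range(5)],
    crs="EPSG:4326",
)

# Equivalent to (population >= 1000) & (population <= 10000).
medium_pop = gdf[gdf["population"].between(1000, 10000)].copy()
print(medium_pop["population"].tolist())
```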

If a numeric column was read as text, convert it first:

import pandas as pd

gdf["population"] = pd.to_numeric(gdf["population"], errors="coerce")
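
With errors="coerce", values that cannot be parsed become NaN rather than raising an error, so it can be worth checking which rows were affected before filtering. The sketch below assumes a hypothetical layer where one population value is the text "n/a".

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical sample layer with population stored as text.
gdf = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "population": ["1200", "n/a", "8000"]},
    geometry=[Point(x, 0) for x in range(3)],
    crs="EPSG:4326",
)

gdf["population"] = pd.to_numeric(gdf["population"], errors="coerce")

# Rows whose value could not be converted now hold NaN.
unparsed = gdf[gdf["population"].isna()]
print(unparsed["name"].tolist())

# NaN never satisfies a comparison, so those rows drop out of the filter.
large = gdf[gdf["population"] > 5000].copy()
print(large["name"].tolist())
```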

Filter rows with missing or empty values

Find null values:

missing_status = gdf[gdf["status"].isna()].copy()

Exclude incomplete records:

cleaned = gdf[gdf["status"].notna()].copy()

If the data uses empty strings instead of true nulls, check both:

cleaned = gdf[
    gdf["status"].notna()
    & (gdf["status"].str.strip() != "")
].copy()

Export the filtered result

Create a filtered output:

export_gdf = gdf[
    (gdf["landuse"] == "residential")
    & (gdf["district"] == "North")
].copy()

Save it as a shapefile or GeoJSON. The output folder must already exist, because to_file() does not create it:

export_gdf.to_file("output/north_residential.shp")

export_gdf.to_file("output/north_residential.geojson", driver="GeoJSON")

Explanation

A GeoDataFrame is a pandas DataFrame subclass with a designated geometry column. When you filter rows using boolean conditions, GeoPandas keeps the geometry for the matching records.

That means this works like standard pandas filtering:

gdf[gdf["column"] == value]

but the result remains spatial.
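
A quick way to confirm this is to inspect the filtered object. The sketch below uses a hypothetical in-memory layer and checks the type, the geometry column, and the CRS of the result.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer.
gdf = gpd.GeoDataFrame(
    {"landuse": ["park", "residential", "park"]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
    crs="EPSG:4326",
)

parks = gdf[gdf["landuse"] == "park"]

print(type(parks).__name__)    # class of the filtered result
print(parks.geometry.name)     # name of the active geometry column
print(parks.crs == gdf.crs)    # CRS carried over from the source layer
```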

Filtering rows does not change the geometry itself. It only decides which features stay in the output.

For example:

filtering by landuse == "park" keeps only park features
buffering changes geometry shapes
clipping changes geometry extent

On this page, filtering means attribute filtering, not geometry-based predicates like within() or intersects().

In real GIS workflows, attribute filtering is often the first step before:

spatial joins
clipping
plotting
exporting
dissolving

Reducing the dataset early makes later steps faster and easier to validate.

Edge cases or notes

CRS issues: Attribute filtering does not depend on CRS, but if you filter and then run spatial operations, make sure the layer uses the expected CRS. Check with gdf.crs and reproject with gdf.to_crs(...) if needed.
Invalid geometries: Filtering by attributes does not fix bad geometry. If later steps fail, inspect invalid features with gdf[~gdf.is_valid].
Column names with spaces: Use bracket syntax like gdf["land use"], not attribute-style access.
String versus numeric data types: If population is stored as text, a comparison such as > typically raises a TypeError, and == against a number silently matches nothing. Check gdf.dtypes and convert with pd.to_numeric() if needed.
Empty filter results: If your result has zero rows, verify spelling, case, null values, and data type mismatches.
Large files and memory: read_file() loads the whole layer into memory before you can filter it. For very large datasets, limit what is read with the bbox, mask, or rows arguments of read_file(), or convert the data once to a columnar format such as GeoParquet and load it with gpd.read_parquet().
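
Several of these checks can be combined after a filter. The sketch below is a hypothetical example: the bowtie polygon is deliberately self-intersecting, the column name contains a space, and EPSG:3857 is only an illustrative target CRS.

```python
import geopandas as gpd
from shapely.geometry import Polygon

square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])  # self-intersecting, so invalid

# Hypothetical sample layer with a space in the column name.
gdf = gpd.GeoDataFrame(
    {"land use": ["park", "park"]},
    geometry=[square, bowtie],
    crs="EPSG:4326",
)

# Bracket syntax is required for column names that contain spaces.
parks = gdf[gdf["land use"] == "park"].copy()

# Check the CRS before any later spatial operation.
print(parks.crs)

# Attribute filtering keeps invalid geometries, so list them explicitly.
invalid = parks[~parks.is_valid]
print(len(invalid))

# Reproject only if a later step needs a different CRS.
parks_projected = parks.to_crs(epsg=3857)
```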

Internal links

For the broader workflow, see GeoPandas basics for vector data in Python.
To load source data first, read How to Read a Shapefile in Python with GeoPandas.
To save the filtered output, see How to Export GeoJSON in Python with GeoPandas.
If you need to fix projection before later spatial analysis, read How to Reproject Spatial Data in Python (GeoPandas).
If your selection returns nothing, check Why a GeoPandas Filter Returns Empty Results.

FAQ

Can I filter a shapefile directly with GeoPandas?

Yes. Read the shapefile with gpd.read_file(), filter the resulting GeoDataFrame, then save the filtered result with .to_file().

How do I filter multiple values in one column?

Use isin():

subset = gdf[gdf["landuse"].isin(["park", "forest", "residential"])].copy()

This is usually cleaner than chaining multiple OR conditions.

Why does my GeoPandas filter return no rows?

Common causes are:

wrong column name
case mismatch in text values
extra spaces in strings
numeric values stored as text
null values in the field being filtered

Check gdf.columns, gdf.dtypes, and gdf.head() first.
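
If case or stray spaces are the cause, normalizing the text before comparing is one possible fix. A minimal sketch with hypothetical values:

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample layer with inconsistent text values.
gdf = gpd.GeoDataFrame(
    {"landuse": [" Residential", "residential ", "PARK", None]},
    geometry=[Point(x, 0) for x in range(4)],
    crs="EPSG:4326",
)

# The exact match finds nothing because of case and whitespace differences.
exact = gdf[gdf["landuse"] == "residential"]
print(len(exact))

# Strip whitespace and lowercase the values, then compare.
normalized = gdf["landuse"].str.strip().str.lower()
matches = gdf[normalized == "residential"].copy()
print(len(matches))
```

The original column is left unchanged here; only the comparison uses the normalized values.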

Does filtering remove the geometry column?

No. Filtering rows keeps the geometry column intact, so the output is still a GeoDataFrame.