How to Filter Spatial Data in Python Using GeoPandas

Problem statement

A common GIS task is reducing a layer to only the features you need. For example, you may have a shapefile of land use polygons and only want residential areas, or a GeoJSON of administrative boundaries where you need districts from one region with population above a threshold.

In GeoPandas, this usually means filtering a GeoDataFrame by attribute values. Typical cases include:

  • keeping features where one column matches a value
  • applying multiple conditions
  • filtering text fields by exact or partial matches
  • removing rows with null values before mapping or analysis

The goal is to subset spatial data without losing the geometry column, so the result is still ready for plotting, exporting, spatial joins, or clipping.

Quick answer

To filter GeoPandas data, use the same boolean indexing pattern you would use with pandas. A filtered GeoDataFrame keeps its geometry column.

import geopandas as gpd

gdf = gpd.read_file("data/landuse.shp")

residential = gdf[gdf["landuse"] == "residential"].copy()

residential is still a GeoDataFrame, so you can save it, map it, or use it in later GIS steps.
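If you want to try the pattern without a shapefile on disk, here is a minimal sketch that builds a small in-memory GeoDataFrame instead of calling read_file(). The landuse values, the box geometries, and the EPSG:4326 CRS are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical land use parcels; in practice these come from gpd.read_file()
gdf = gpd.GeoDataFrame(
    {"landuse": ["residential", "park", "residential", "industrial"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
    crs="EPSG:4326",
)

# Boolean mask: True where landuse equals "residential"
mask = gdf["landuse"] == "residential"
residential = gdf[mask].copy()

print(type(residential).__name__)  # GeoDataFrame
print(len(residential))            # 2
print(residential.geometry.name)   # geometry
print(residential.crs == gdf.crs)  # True
```

The subset keeps the geometry column and the CRS of the source layer.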

Step-by-step solution

Load spatial data into a GeoDataFrame

Read a shapefile or GeoJSON with GeoPandas:

import geopandas as gpd

# Shapefile
gdf = gpd.read_file("data/city_landuse.shp")

# Or GeoJSON
# gdf = gpd.read_file("data/city_landuse.geojson")

Before filtering, inspect the columns and a few rows:

print(gdf.columns)
print(gdf.head())
print(gdf.dtypes)

This helps you confirm:

  • the exact column names
  • whether a field is text or numeric
  • how values are spelled and capitalized
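To check spelling and capitalization before you write a filter, you can list the distinct values in a column. This sketch uses a hypothetical landuse column with deliberately inconsistent entries (a capitalized value, a trailing space, and a missing value).

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical layer with inconsistent spelling in the landuse column
gdf = gpd.GeoDataFrame(
    {"landuse": ["Residential", "residential", "residential ", None, "park"]},
    geometry=[Point(i, 0) for i in range(5)],
    crs="EPSG:4326",
)

# Count every distinct value, including missing ones (dropna=False)
counts = gdf["landuse"].value_counts(dropna=False)
print(counts)

# repr() shows quotes, which exposes trailing spaces that print() would hide
for value in gdf["landuse"].unique():
    print(repr(value))
```

Here an exact filter on "residential" would match only one of the three residential-looking rows.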

For example, a land use layer might contain columns like:

  • landuse
  • district
  • population
  • status

Filter by a single column value

To keep only features where one attribute matches a value:

residential = gdf[gdf["landuse"] == "residential"]
print(type(residential))
print(len(residential))

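If you plan to modify the subset (for example, add or edit columns) or export it, use .copy() so it is independent of gdf. Without it, pandas may emit a SettingWithCopyWarning when you assign to the subset: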

residential = gdf[gdf["landuse"] == "residential"].copy()
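A short sketch of why the copy matters: after .copy(), adding a column to the subset leaves the original layer untouched. The population values and the density_class column are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"landuse": ["residential", "park", "residential"], "population": [1200, 0, 800]},
    geometry=[Point(0, 0), Point(1, 0), Point(2, 0)],
    crs="EPSG:4326",
)

# .copy() gives the subset its own data, so editing it cannot touch gdf
residential = gdf[gdf["landuse"] == "residential"].copy()

# Add a hypothetical classification column to the subset only
residential["density_class"] = "low"
residential.loc[residential["population"] > 1000, "density_class"] = "high"

print(residential[["population", "density_class"]])
print("density_class" in gdf.columns)  # False
```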

Filter with multiple conditions

Use & for AND, and wrap each condition in parentheses:

filtered = gdf[
    (gdf["district"] == "North")
    & (gdf["population"] > 5000)
].copy()

This keeps only features in the North district with population above 5000.

Use | for OR:

green_areas = gdf[
    (gdf["landuse"] == "park")
    | (gdf["landuse"] == "forest")
].copy()

If you need several values from one column, isin() is usually cleaner:

selected = gdf[gdf["landuse"].isin(["park", "forest", "wetland"])].copy()
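The same isin() mask can be inverted with ~ to drop a set of values instead of keeping them. A minimal sketch with made-up landuse values:

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"landuse": ["park", "forest", "residential", "wetland", "industrial"]},
    geometry=[Point(i, 0) for i in range(5)],
    crs="EPSG:4326",
)

green = ["park", "forest", "wetland"]

# Keep rows whose landuse is in the list
selected = gdf[gdf["landuse"].isin(green)].copy()

# ~ inverts the mask: keep rows whose landuse is NOT in the list
not_green = gdf[~gdf["landuse"].isin(green)].copy()

print(selected["landuse"].tolist())   # ['park', 'forest', 'wetland']
print(not_green["landuse"].tolist())  # ['residential', 'industrial']
```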

Be careful with parentheses. This is correct:

subset = gdf[
    (gdf["region"] == "East")
    & (gdf["status"] == "active")
]

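This is not, because & binds more tightly than ==, so Python tries to evaluate "East" & gdf["status"] before either comparison: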

# Incorrect
# subset = gdf[gdf["region"] == "East" & gdf["status"] == "active"]
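One way to sidestep the parentheses problem entirely is to build each condition as a named mask first and combine them afterwards. This is a style choice rather than a requirement; the region and status values below are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"region": ["East", "East", "West"], "status": ["active", "closed", "active"]},
    geometry=[Point(0, 0), Point(1, 0), Point(2, 0)],
    crs="EPSG:4326",
)

# Each comparison is evaluated on its own line, so operator precedence never applies
is_east = gdf["region"] == "East"
is_active = gdf["status"] == "active"

subset = gdf[is_east & is_active].copy()
print(subset.index.tolist())  # [0]
```

Named masks also make long filters easier to read and to debug, since you can print each mask on its own.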

Code examples

Filter text fields

Match an exact text value:

district = gdf[gdf["district_name"] == "Central"].copy()

For partial matches, use .str.contains():

central_matches = gdf[
    gdf["district_name"].str.contains("central", na=False)
].copy()

To ignore case differences:

central_matches = gdf[
    gdf["district_name"].str.contains("central", case=False, na=False)
].copy()

This matches variants such as Central, CENTRAL, or central district.
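If you need an exact match that still ignores case and stray spaces, you can normalize the column before comparing. And because .str.contains() treats its pattern as a regular expression by default, regex=False is useful when the search text contains characters like ".". The district names below are made up.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"district_name": ["Central", " CENTRAL ", "Central District", "St. Mary", None]},
    geometry=[Point(i, 0) for i in range(5)],
    crs="EPSG:4326",
)

# Exact match after trimming spaces and lowercasing; missing values compare as False
normalized = gdf["district_name"].str.strip().str.lower()
exact_central = gdf[normalized == "central"].copy()

# regex=False treats "St." literally, so "." is not a regex wildcard
st_matches = gdf[gdf["district_name"].str.contains("St.", regex=False, na=False)].copy()

print(exact_central.index.tolist())  # [0, 1]
print(st_matches.index.tolist())     # [3]
```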

Filter numeric fields

Keep values above or below a threshold:

large_areas = gdf[gdf["area_sqkm"] > 10].copy()
small_areas = gdf[gdf["area_sqkm"] < 1].copy()

Filter within a numeric range:

medium_pop = gdf[
    (gdf["population"] >= 1000)
    & (gdf["population"] <= 10000)
].copy()
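pandas also provides Series.between(), which by default includes both endpoints, so it behaves like the >= and <= pair above. A sketch with made-up population values:

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"population": [500, 1000, 4500, 10000, 12000]},
    geometry=[Point(i, 0) for i in range(5)],
    crs="EPSG:4326",
)

# between() includes both endpoints by default, like >= 1000 and <= 10000
medium_pop = gdf[gdf["population"].between(1000, 10000)].copy()

print(medium_pop["population"].tolist())  # [1000, 4500, 10000]
```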

If a numeric column was read as text, convert it first:

import pandas as pd

gdf["population"] = pd.to_numeric(gdf["population"], errors="coerce")
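To see what errors="coerce" does, here is a sketch with a hypothetical population field stored as text, including one value that cannot be parsed. The unparseable entry becomes NaN, and NaN rows fail any numeric comparison, so they drop out of the filter.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical population field read as text, with one unusable entry
gdf = gpd.GeoDataFrame(
    {"population": ["1200", "850", "n/a", "15000"]},
    geometry=[Point(i, 0) for i in range(4)],
    crs="EPSG:4326",
)

print(gdf["population"].dtype)  # a text dtype, not a number

# errors="coerce" turns values that cannot be parsed into NaN instead of raising
gdf["population"] = pd.to_numeric(gdf["population"], errors="coerce")
print(gdf["population"].dtype)  # float64

large = gdf[gdf["population"] > 1000].copy()
print(large["population"].tolist())  # [1200.0, 15000.0]
```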

Filter rows with missing or empty values

Find null values:

missing_status = gdf[gdf["status"].isna()].copy()

Exclude incomplete records:

cleaned = gdf[gdf["status"].notna()].copy()

If the data uses empty strings instead of true nulls, check both:

cleaned = gdf[
    gdf["status"].notna()
    & (gdf["status"].str.strip() != "")
].copy()
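A variant of the same check that strips whitespace once and reuses the result is sketched below, on a hypothetical status column. For true nulls only, dropna(subset=...) is a shorter equivalent that also returns a GeoDataFrame.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"status": ["active", None, "   ", "inactive", ""]},
    geometry=[Point(i, 0) for i in range(5)],
    crs="EPSG:4326",
)

# Trim whitespace once; missing values stay missing after .str.strip()
status = gdf["status"].str.strip()

# Keep rows that have a real, non-blank status
cleaned = gdf[status.notna() & (status != "")].copy()

# dropna() removes only true nulls; blank strings are kept
no_nulls = gdf.dropna(subset=["status"])

print(cleaned["status"].tolist())  # ['active', 'inactive']
print(len(no_nulls))               # 4
```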

Export the filtered result

Create a filtered output:

export_gdf = gdf[
    (gdf["landuse"] == "residential")
    & (gdf["district"] == "North")
].copy()

Save it as a shapefile or GeoJSON:

export_gdf.to_file("output/north_residential.shp")

export_gdf.to_file("output/north_residential.geojson", driver="GeoJSON")

Explanation

A GeoDataFrame behaves like a pandas DataFrame with an added geometry column. When you filter rows using boolean conditions, GeoPandas keeps the geometry for the matching records.

That means this works like standard pandas filtering:

gdf[gdf["column"] == value]

but the result remains spatial.

Filtering rows does not change the geometry itself. It only decides which features stay in the output.

For example:

  • filtering by landuse == "park" keeps only park features
  • buffering changes geometry shapes
  • clipping changes geometry extent

On this page, filtering means attribute filtering, not geometry-based predicates such as within() or intersects().
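To confirm that attribute filtering only selects rows, this sketch compares each kept geometry with its source and shows that the original index labels are preserved. reset_index(drop=True) is one option if later code expects rows numbered from 0. The parcels are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

gdf = gpd.GeoDataFrame(
    {"landuse": ["park", "residential", "park"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

parks = gdf[gdf["landuse"] == "park"].copy()

# The original row labels are kept, so parks can be matched back to gdf
print(parks.index.tolist())  # [0, 2]

# Each kept geometry is identical to its source geometry
print(parks.geometry.geom_equals(gdf.geometry.loc[parks.index]).all())  # True

# Renumber rows from 0 if downstream code expects a clean index
parks = parks.reset_index(drop=True)
print(parks.index.tolist())  # [0, 1]
```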

In real GIS workflows, attribute filtering is often the first step before:

  • spatial joins
  • clipping
  • plotting
  • exporting
  • dissolving

Reducing the dataset early makes later steps faster and easier to validate.

Edge cases or notes

  • CRS issues: Attribute filtering does not depend on CRS, but if you filter and then run spatial operations, make sure the layer uses the expected CRS. Check with gdf.crs and reproject with gdf.to_crs(...) if needed.
  • Invalid geometries: Filtering by attributes does not fix bad geometry. If later steps fail, list invalid features with gdf[~gdf.is_valid].
  • Column names with spaces: Use bracket syntax like gdf["land use"], not attribute-style access.
  • String versus numeric data types: If population is stored as text, comparisons such as > 5000 typically raise a TypeError, and == 5000 matches nothing. Check gdf.dtypes and convert the column with pd.to_numeric() if needed.
  • Empty filter results: If your result has zero rows, verify spelling, case, null values, and data type mismatches.
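  • Large files and memory: read_file() loads the whole layer into memory before you filter it. To load less, read_file() accepts bbox and rows arguments, and with the pyogrio engine (the default in GeoPandas 1.0 and later) it can pass a SQL where clause through to GDAL so only matching features are read. For very large datasets, also consider more scalable storage formats or preprocessing the data before loading it into GeoPandas.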
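The CRS and invalid-geometry checks from the notes above can be sketched on a filtered layer like this. The second polygon is a deliberately self-intersecting "bowtie", and EPSG:3857 is only an illustrative target CRS; pick one that suits your analysis.

```python
import geopandas as gpd
from shapely.geometry import Polygon, box

# Hypothetical layer: the second polygon is a self-intersecting "bowtie"
gdf = gpd.GeoDataFrame(
    {"landuse": ["park", "park"]},
    geometry=[box(0, 0, 1, 1), Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])],
    crs="EPSG:4326",
)

parks = gdf[gdf["landuse"] == "park"].copy()

# Attribute filtering keeps invalid shapes; list them before spatial steps
invalid = parks[~parks.is_valid]
print(invalid.index.tolist())  # [1]

# Check the CRS, then reproject if a later step needs a different one
print(parks.crs)
parks_projected = parks.to_crs(epsg=3857)
print(parks_projected.crs.to_epsg())  # 3857
```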

FAQ

Can I filter a shapefile directly with GeoPandas?

Yes. Read the shapefile with gpd.read_file(), filter the resulting GeoDataFrame, then save the filtered result with .to_file().

How do I filter multiple values in one column?

Use isin():

subset = gdf[gdf["landuse"].isin(["park", "forest", "residential"])].copy()

This is usually cleaner than chaining multiple OR conditions.

Why does my GeoPandas filter return no rows?

Common causes are:

  • wrong column name
  • case mismatch in text values
  • extra spaces in strings
  • numeric values stored as text
  • null values in the field being filtered

Check gdf.columns, gdf.dtypes, and gdf.head() first.
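If you run these checks often, you could wrap them in a small helper. diagnose_filter below is a hypothetical function, not part of GeoPandas: it reports whether the column exists, its dtype, how many rows match exactly, how many are null, and, for text values, how many match after trimming spaces and ignoring case.

```python
import geopandas as gpd
from shapely.geometry import Point


def diagnose_filter(gdf, column, value):
    """Hypothetical helper: report common reasons an equality filter matches nothing."""
    if column not in gdf.columns:
        return {"column_found": False}
    series = gdf[column]
    report = {
        "column_found": True,
        "dtype": str(series.dtype),
        "exact_matches": int((series == value).sum()),
        "null_count": int(series.isna().sum()),
    }
    # For text values, also count matches after trimming spaces and ignoring case
    if isinstance(value, str):
        normalized = series.astype("string").str.strip().str.lower()
        report["normalized_matches"] = int((normalized == value.strip().lower()).sum())
    return report


# Hypothetical layer where the only residential row has a capital R and a trailing space
gdf = gpd.GeoDataFrame(
    {"landuse": ["Residential ", "park", None]},
    geometry=[Point(0, 0), Point(1, 0), Point(2, 0)],
    crs="EPSG:4326",
)

print(diagnose_filter(gdf, "landuse", "residential"))
print(diagnose_filter(gdf, "land_use", "residential"))  # {'column_found': False}
```

A report with zero exact matches but a nonzero normalized count points to a case or whitespace mismatch rather than missing data.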

Does filtering remove the geometry column?

No. Filtering rows keeps the geometry column intact, so the output is still a GeoDataFrame.