How to Perform a Spatial Join in Python (GeoPandas)

How to perform a spatial join in Python with GeoPandas sjoin(), with examples for point-in-polygon and other predicates.

Problem statement

A spatial join lets you transfer attributes from one spatial layer to another based on location instead of a shared ID field.

Typical GIS tasks include:

  • assigning district names to customer points
  • joining addresses to census tracts
  • attaching flood zone attributes to parcel polygons
  • matching building footprints to administrative boundaries

In GeoPandas, this means taking two vector layers and joining them using a spatial relationship such as intersects, within, or contains.

This page focuses only on vector spatial joins with GeoPandas.

Quick answer

Use geopandas.sjoin() with two GeoDataFrames that use the same CRS, then choose the correct spatial predicate for your workflow.

import geopandas as gpd

points = gpd.read_file("data/customer_points.geojson")
polygons = gpd.read_file("data/city_districts.gpkg")

if points.crs != polygons.crs:
    polygons = polygons.to_crs(points.crs)

joined = gpd.sjoin(points, polygons, how="left", predicate="within")
print(joined.head())

This keeps all points and adds polygon attributes where each point falls within a polygon.

Step-by-step solution

Load the two spatial layers

Read both datasets into GeoDataFrames.

import geopandas as gpd

points = gpd.read_file("data/customer_points.geojson")
districts = gpd.read_file("data/city_districts.shp")

print(type(points))
print(type(districts))

Both objects should be geopandas.geodataframe.GeoDataFrame.

GeoPandas can read shapefiles, GeoJSON, and GeoPackage files with read_file().

Check geometry types

Geometry type affects which spatial predicate makes sense.

print(points.geom_type.value_counts())
print(districts.geom_type.value_counts())

Typical join patterns include:

  • points to polygons
  • polygons to polygons
  • lines to polygons

If you want to assign polygon attributes to points, within is usually the clearest choice.

Make sure both layers use the same CRS

Spatial joins require both layers to use the same coordinate reference system.

print(points.crs)
print(districts.crs)

If they do not match, reproject one layer before joining:

districts = districts.to_crs(points.crs)

If you skip this step, the join may return no matches or incorrect matches.

Choose the right spatial predicate

Common predicates include:

  • intersects: features touch or overlap in any way
  • within: left geometry is completely inside right geometry
  • contains: left geometry completely contains right geometry

Common choices:

  • points inside polygons: usually within
  • overlapping polygons: usually intersects
  • lines crossing polygons: usually intersects

If you want to inspect which predicates are available in your environment, check the spatial index first:

if points.has_sindex:
    print(points.sindex.valid_query_predicates)

This depends on your installed spatial index backend, so treat it as a useful check rather than a required step.

Run the spatial join

Use gpd.sjoin() to join the left layer to the right layer based on geometry.

joined = gpd.sjoin(
    points,
    districts,
    how="left",
    predicate="within"
)

Join types:

  • left: keep all rows from the left GeoDataFrame
  • inner: keep only matched rows
  • right: available in some workflows, but verify behavior in your GeoPandas version before relying on it

In most point-in-polygon workflows, the left layer is the layer you want to preserve.

Review the output columns

The result includes:

  • original columns from the left layer
  • columns from the right layer, unless you subset the right GeoDataFrame before joining
  • an index_right column showing the matched feature index from the right layer
print(joined.columns)
print(joined.head())

If both layers contain the same column name, GeoPandas adds suffixes to distinguish them.

Save the joined result

Export the result to a new file.

joined.to_file("output/customer_points_with_districts.gpkg", driver="GPKG")

Other formats:

joined.to_file("output/customer_points_with_districts.shp")
joined.to_file("output/customer_points_with_districts.geojson", driver="GeoJSON")

Shapefiles have stricter limits on field names and data types than GeoPackage.

Code examples

Example 1: Join points to polygons

Assign each customer point to a city district.

import geopandas as gpd

customers = gpd.read_file("data/customer_locations.geojson")
districts = gpd.read_file("data/city_districts.gpkg")

if customers.crs != districts.crs:
    districts = districts.to_crs(customers.crs)

result = gpd.sjoin(
    customers,
    districts[["district_name", "geometry"]],
    how="left",
    predicate="within"
)

print(result[["customer_id", "district_name"]].head())

Example 2: Keep all points even when no match is found

Use a left join so unmatched points remain in the output.

result = gpd.sjoin(
    customers,
    districts[["district_name", "geometry"]],
    how="left",
    predicate="within"
)

unmatched = result[result["district_name"].isna()]
print(f"Unmatched points: {len(unmatched)}")

Unmatched features will have NaN in the joined columns.

Example 3: Join overlapping polygons

Join flood zones to parcel polygons where they intersect.

import geopandas as gpd

parcels = gpd.read_file("data/parcels.shp")
flood_zones = gpd.read_file("data/flood_zones.shp")

if parcels.crs != flood_zones.crs:
    flood_zones = flood_zones.to_crs(parcels.crs)

parcel_flood = gpd.sjoin(
    parcels,
    flood_zones[["zone_type", "geometry"]],
    how="inner",
    predicate="intersects"
)

print(parcel_flood[["parcel_id", "zone_type"]].head())

One parcel can produce multiple rows if it intersects more than one flood polygon.

Example 4: Reproject before joining

Fix a CRS mismatch before running the join.

import geopandas as gpd

sites = gpd.read_file("data/sites.geojson")
boundaries = gpd.read_file("data/admin_boundaries.shp")

print(sites.crs)
print(boundaries.crs)

if sites.crs != boundaries.crs:
    boundaries = boundaries.to_crs(sites.crs)

joined = gpd.sjoin(
    sites,
    boundaries[["admin_name", "geometry"]],
    how="left",
    predicate="within"
)

print(joined.head())

Explanation

A spatial join is different from a normal attribute join.

  • An attribute join uses a shared key such as parcel_id
  • A spatial join uses geometry relationships such as overlap, containment, or intersection

In GeoPandas, sjoin() compares the geometry in the left GeoDataFrame with the geometry in the right GeoDataFrame using the predicate you choose.

Examples include:

  • point within polygon
  • polygon intersects polygon
  • line intersects polygon

GeoPandas uses spatial indexing when available, which reduces the number of geometry comparisons and helps performance on larger datasets.

A spatial join can also increase row counts. One feature may match multiple features, especially when:

  • polygons overlap
  • a line crosses several polygons
  • boundary features match more than one geometry
  • source administrative boundaries or zone layers are not cleanly non-overlapping

That is why the output may contain more rows than the left input layer.

Edge cases and notes

Duplicate matches

One feature can match more than one feature.

This often happens with:

  • overlapping polygon layers
  • polygon-to-polygon joins
  • line-to-polygon joins
  • boundary cases depending on the predicate

If you need exactly one match per feature, you will need additional cleanup logic after the join.

CRS mismatches

Always check CRS before every spatial join.

print(left_gdf.crs)
print(right_gdf.crs)

If needed:

right_gdf = right_gdf.to_crs(left_gdf.crs)

Missing or invalid geometries

Null or invalid geometries can cause missing matches or errors.

left_gdf = left_gdf[left_gdf.geometry.notna()]
right_gdf = right_gdf[right_gdf.geometry.notna()]

right_gdf = right_gdf[right_gdf.is_valid]

In some workflows, you may need to repair invalid geometry before joining.

Boundary behavior

Points or lines on polygon boundaries can produce unexpected results.

  • within requires the left geometry to be in the interior of the right geometry
  • contains is also strict and does not treat boundary-only contact as contained
  • intersects is more inclusive and matches touching geometries

If features on edges are not matching as expected, test intersects and compare the output.

Performance on large datasets

Large joins can be slow.

To reduce overhead:

  • keep only the columns you need before joining
  • make sure a spatial index is available
  • filter either dataset first if you can reduce the search area
  • avoid unnecessary reprojection or repeated file reads inside loops

If you need the underlying concept first, read What Is a Spatial Join in GIS?.

Related tasks:

If your join returns empty or unexpected output, see Why Is My GeoPandas Spatial Join Returning No Results?.

FAQ

How do I do a spatial join in GeoPandas?

Use gpd.sjoin(left_gdf, right_gdf, how=..., predicate=...). Both inputs must be GeoDataFrames and both must use the same CRS.

joined = gpd.sjoin(points, polygons, how="left", predicate="within")

What is the difference between intersects, within, and contains in a spatial join?

  • intersects: geometries touch or overlap in any way
  • within: left geometry is inside the right geometry
  • contains: left geometry contains the right geometry

For joining points to polygons, within is usually the best starting point.

Why does my spatial join return empty results?

Common causes include:

  • CRS mismatch
  • wrong predicate
  • invalid or null geometries
  • empty input layers

Check CRS first, then inspect geometry validity and test a different predicate.

Why do I get multiple rows for one feature after a spatial join?

Because one feature can match multiple features in the other layer. This is common with overlapping polygons and polygon intersection joins.