Overlay Operations in GeoPandas: Union, Intersection, Difference Explained
A common GIS task is comparing two polygon layers to answer questions like:
Problem statement
A common GIS task is comparing two polygon layers to answer questions like:
- Which parcels fall inside a flood zone?
- What areas are covered by either of two planning layers?
- What land remains after removing protected areas?
This is where GeoPandas overlay operations are useful. They create a new layer by cutting and combining the boundaries from two inputs.
This is also where users often get stuck. overlay() is not the same as:
sjoin(), which matches features based on a spatial relationshipclip(), which trims one layer to a boundary- direct Shapely geometry methods used on single geometries
This page focuses on choosing the correct geopandas.overlay() operation and using it correctly for polygon and multipolygon analysis.
Quick answer
Use geopandas.overlay() when you need a new layer built from two existing layers by cutting and combining their boundaries. In this page, the focus is polygon and multipolygon workflows.
how="intersection"keeps only overlapping areashow="union"keeps all areas from both layers and splits polygons where they overlaphow="difference"keeps only the parts of the first layer that are not covered by the second
In most cases, both layers should:
- use the same CRS
- contain polygon or multipolygon geometries
- have valid geometries
Step-by-step solution
Step 1: Load the two spatial layers
This example uses:
parcels.gpkg: land parcelsflood_zones.gpkg: flood hazard polygons
import geopandas as gpd
parcels = gpd.read_file("data/parcels.gpkg")
flood_zones = gpd.read_file("data/flood_zones.gpkg")
print(parcels.head())
print(flood_zones.head())
For the workflows in this page, both inputs should usually be polygon or multipolygon layers.
Step 2: Check CRS before overlay
Overlay analysis should be done in the same coordinate reference system.
print(parcels.crs)
print(flood_zones.crs)
If they do not match, reproject one layer:
flood_zones = flood_zones.to_crs(parcels.crs)
If area or distance matters, use a projected CRS rather than a geographic CRS such as EPSG:4326.
parcels = parcels.to_crs("EPSG:26917")
flood_zones = flood_zones.to_crs("EPSG:26917")
Step 3: Inspect geometry types and validity
Overlay works best when both layers contain valid polygon geometries.
print(parcels.geom_type.unique())
print(flood_zones.geom_type.unique())
print((~parcels.geometry.is_valid).sum())
print((~flood_zones.geometry.is_valid).sum())
Remove empty geometries if needed:
parcels = parcels[~parcels.geometry.is_empty & parcels.geometry.notnull()].copy()
flood_zones = flood_zones[~flood_zones.geometry.is_empty & flood_zones.geometry.notnull()].copy()
If invalid geometries are present, repair them before overlay:
parcels["geometry"] = parcels.geometry.make_valid()
flood_zones["geometry"] = flood_zones.geometry.make_valid()
Note: .make_valid() requires a GeoPandas/Shapely stack that supports it. If your environment does not, use a version-compatible geometry repair workflow before running overlay.
Step 4: Run geopandas.overlay() with how="intersection"
Use intersection when you only want areas shared by both layers.
parcel_flood_overlap = gpd.overlay(parcels, flood_zones, how="intersection")
print(parcel_flood_overlap.head())
print(parcel_flood_overlap.columns)
print(len(parcel_flood_overlap))
This is a common workflow for finding the parts of parcels that lie inside hazard zones.
Step 5: Run geopandas.overlay() with how="union"
Use union when you need all coverage from both layers, split into unique pieces.
parcel_flood_union = gpd.overlay(parcels, flood_zones, how="union")
print(parcel_flood_union.head())
print(parcel_flood_union.columns)
print(len(parcel_flood_union))
This keeps:
- overlap areas
- parcel-only areas
- flood-zone-only areas
The output is often split into many smaller polygons because GeoPandas cuts the layer wherever boundaries intersect.
Step 6: Run geopandas.overlay() with how="difference"
Use difference when you want the first layer minus the second layer.
parcels_outside_flood = gpd.overlay(parcels, flood_zones, how="difference")
print(parcels_outside_flood.head())
print(len(parcels_outside_flood))
Order matters here:
overlay(parcels, flood_zones, how="difference")means parcels minus flood zonesoverlay(flood_zones, parcels, how="difference")means flood zones minus parcels
These are different results.
Step 7: Review the output attributes
Overlay outputs combine columns from both inputs.
For intersection:
- output features contain attributes from both parcels and flood zones
For union:
- some polygons come only from one layer
- fields from the other layer will be
NaN
print(parcel_flood_union.dtypes)
print(parcel_flood_union[["parcel_id", "zone_type"]].head(10))
If both inputs have a field with the same name, GeoPandas will suffix them to avoid collisions.
Step 8: Save the result to a new file
GeoPackage is usually the safest format for modern workflows.
parcel_flood_overlap.to_file(
"output/parcel_flood_overlap.gpkg",
layer="intersection",
driver="GPKG"
)
parcel_flood_union.to_file(
"output/parcel_flood_union.gpkg",
layer="union",
driver="GPKG"
)
parcels_outside_flood.to_file(
"output/parcels_outside_flood.gpkg",
layer="difference",
driver="GPKG"
)
You can also export to Shapefile or GeoJSON, but GeoPackage handles field names and schema more reliably.
Code examples
Example 1: Overlay two polygon layers with intersection
import geopandas as gpd
parcels = gpd.read_file("data/parcels.gpkg").to_crs("EPSG:26917")
flood_zones = gpd.read_file("data/flood_zones.gpkg").to_crs("EPSG:26917")
parcels["geometry"] = parcels.geometry.make_valid()
flood_zones["geometry"] = flood_zones.geometry.make_valid()
result = gpd.overlay(parcels, flood_zones, how="intersection")
print(result.shape)
print(result.columns.tolist())
Example 2: Use union to split and combine all coverage
import geopandas as gpd
zoning = gpd.read_file("data/zoning.gpkg")
constraints = gpd.read_file("data/environmental_constraints.gpkg").to_crs(zoning.crs)
zoning["geometry"] = zoning.geometry.make_valid()
constraints["geometry"] = constraints.geometry.make_valid()
combined = gpd.overlay(zoning, constraints, how="union")
print(combined.head())
Example 3: Use difference to remove excluded areas
import geopandas as gpd
development = gpd.read_file("data/development_area.gpkg")
protected = gpd.read_file("data/protected_areas.gpkg").to_crs(development.crs)
development["geometry"] = development.geometry.make_valid()
protected["geometry"] = protected.geometry.make_valid()
eligible_land = gpd.overlay(development, protected, how="difference")
print(eligible_land.head())
Example 4: Export overlay output
import geopandas as gpd
eligible_land.to_file(
"output/eligible_land.gpkg",
layer="eligible_land",
driver="GPKG"
)
check = gpd.read_file("output/eligible_land.gpkg", layer="eligible_land")
print(check.crs)
print(check.geom_type.unique())
print(check.columns.tolist())
Explanation
What overlay does compared to clip and spatial join
Overlay creates new geometries by cutting and combining polygon boundaries from both layers.
Use clip() when you want to trim one layer to a mask boundary, but you do not need attributes from both layers.
Use sjoin() when you want to attach attributes based on spatial relationships like intersects or within, without cutting the geometries.
Use overlay() when the geometry itself must change as part of the analysis.
When to use intersection
Use intersection when only shared areas matter.
Example:
- parcel area inside a flood zone
- habitat overlap with zoning districts
This is often the right choice for regulated-area analysis.
When to use union
Use union when you need a complete partition of space from both layers.
Example:
- combining zoning and environmental policy layers
- preparing polygons for area summaries by all combinations of attributes
This creates the most detailed output and often the most features.
When to use difference
Use difference when removing one layer from another.
Example:
- buildable land minus wetlands
- service area minus excluded zones
The first input defines what you are keeping.
Why input order matters
Input order matters most for difference, because A - B is not the same as B - A.
In practice, define the first layer as the one you want to keep or analyze, and the second as the layer to compare against or subtract.
Edge cases or notes
Polygon scope for this page
This page focuses on polygon and multipolygon overlays. If your inputs include mixed geometry types such as points or lines, handle those workflows separately.
Invalid geometries
Self-intersections, duplicate rings, and broken polygons can cause overlay failures or incorrect output. Validate and repair geometries first.
If .make_valid() is not available in your environment, use a geometry repair method supported by your GeoPandas and Shapely versions before running overlay.
CRS mismatch
If the CRS does not match, results may be wrong or the operation may fail. Always align CRS before overlay. If area is important, use a projected CRS.
Performance on large datasets
Overlay can be slow and memory-intensive for large polygon layers. If possible:
- filter to the study area first
- keep only necessary columns
- process in smaller subsets
Missing values in union output
With how="union", some output polygons exist only in one input layer. Those features will have null values for fields from the other layer. This is expected.
Internal links
For a broader overview of related tools, see GeoPandas Basics: Working with Spatial Data in Python.
If you need background on GIS workflows in Python, see Python for GIS: What It Is and When to Use It.
If you need to review projection handling before overlay, see Coordinate Reference Systems (CRS) Explained for Python GIS.
If your overlay fails because of broken polygons, see How to Fix Invalid Geometries Before GeoPandas Analysis.
FAQ
What is the difference between overlay() and sjoin() in GeoPandas?
overlay() creates new geometries by cutting and combining boundaries. sjoin() keeps the original geometries and only transfers attributes based on spatial relationships.
When should I use intersection instead of clip?
Use intersection when you want attributes from both layers and need the true overlap polygons. Use clip when you only want to trim one layer to a mask boundary.
Why does difference return unexpected results?
The most common reasons are:
- the inputs are in different CRS
- geometries are invalid
- input order is reversed
Remember that difference keeps only the parts of the first layer outside the second.
Do both layers need the same CRS for GeoPandas overlay operations?
Yes. In practice, both layers should be in the same CRS before running overlay. Reproject one layer with .to_crs() if needed.
Related articles
Keep exploring with more guides in this category.
Coordinate Reference Systems (CRS) Explained for Python GIS
Understand coordinate reference systems in Python GIS and learn how to check, assign, and convert them with GeoPandas.
Read article →
GeoPandas Basics: Working with Spatial Data in Python
A practical introduction to GeoPandas for reading, inspecting, filtering, and exporting vector spatial data in Python.
Read article →
Python for GIS: What It Is and When to Use It
Learn what Python for GIS means, when to use it, and which tools to start with for automating spatial workflows.
Read article →