How to Fix Invalid Geometries in Python (GeoPandas)

How to find and fix invalid geometries in GeoPandas using buffer(0), make_valid, and geometry validation checks.

Problem statement

A common GIS data problem is a vector layer that loads successfully but fails during later processing because some features have invalid geometry.

This usually shows up when you try to run:

  • overlays
  • spatial joins
  • dissolves
  • clipping
  • exports to file
  • imports into PostGIS or another database

Typical signs include:

  • TopologyException errors
  • self-intersecting polygons
  • empty or broken output
  • failed writes to shapefile or GeoJSON
  • unexpected spatial operation results

If you are working with a shapefile or GeoJSON in GeoPandas, the practical fix is to detect invalid features, inspect the reason, repair them, and verify the result before continuing.

Quick answer

The shortest reliable workflow to fix invalid geometries in GeoPandas is:

import geopandas as gpd
from shapely import make_valid

gdf = gpd.read_file("data/parcels.shp")

invalid_mask = ~gdf.geometry.is_valid
gdf.loc[invalid_mask, "geometry"] = gdf.loc[invalid_mask, "geometry"].apply(make_valid)

gdf = gdf[gdf.geometry.notna()]
gdf = gdf[~gdf.geometry.is_empty]
gdf = gdf[gdf.geometry.is_valid]

gdf.to_file("data/parcels_cleaned.gpkg", driver="GPKG")

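Use make_valid() first, and keep buffer(0) as a fallback for simple polygon issues. The three filter lines drop any feature that is null, empty, or still invalid after repair, so compare feature counts before and after. Always re-check validity before saving.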

Step-by-step solution

Load the vector dataset into GeoPandas

Start by reading the source file into a GeoDataFrame.

import geopandas as gpd

gdf = gpd.read_file("data/parcels.shp")

print(gdf.crs)
print(gdf.geometry.name)
print(len(gdf))

This works the same way for GeoJSON:

gdf = gpd.read_file("data/parcels.geojson")

If you plan to compare or combine this layer with others later, keep the CRS unchanged or reproject explicitly with to_crs().

Find invalid geometries

Use .is_valid on the geometry column to identify invalid features.

gdf["is_valid"] = gdf.geometry.is_valid

invalid_gdf = gdf[~gdf["is_valid"]]

print("Total features:", len(gdf))
print("Invalid features:", len(invalid_gdf))

If you only want the bad rows for review:

print(invalid_gdf[["is_valid", "geometry"]].head())

This is usually the first check to run before overlay, dissolve, or export operations.

Check why each geometry is invalid

Finding invalid rows is useful, but you often also need the reason. Shapely provides explain_validity() for this.

from shapely.validation import explain_validity

invalid_gdf = invalid_gdf.copy()
invalid_gdf["validity_reason"] = invalid_gdf.geometry.apply(explain_validity)

print(invalid_gdf[["validity_reason"]].head(10))

Typical messages, each followed by the coordinates where the problem was found, include:

  • Self-intersection
  • Ring Self-intersection
  • Too few points in geometry component

This helps you understand whether you are dealing with a few isolated errors or a broader data-quality problem.
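To see at a glance whether the problems are isolated or systematic, one option is to strip the coordinates from each message and count the remaining categories. The sketch below assumes the messages follow the usual "Reason[x y]" pattern. The helper name reason_category and the sample geometries are illustrative, not part of Shapely or GeoPandas.

```python
import re
from collections import Counter

from shapely.geometry import Polygon
from shapely.validation import explain_validity


def reason_category(message):
    """Drop the trailing "[x y]" coordinates from a validity message."""
    return re.sub(r"\s*\[.*\]\s*$", "", message)


# Hypothetical sample data: two bow-tie polygons and one valid square.
sample_geoms = [
    Polygon([(0, 0), (2, 2), (2, 0), (0, 2)]),
    Polygon([(5, 5), (7, 7), (7, 5), (5, 7)]),
    Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
]

# Count how often each category of message occurs.
reason_counts = Counter(reason_category(explain_validity(g)) for g in sample_geoms)
print(reason_counts)
```

On a GeoDataFrame, the same helper could be applied with invalid_gdf["validity_reason"].map(reason_category).value_counts().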

Repair geometries with make_valid()

The preferred repair method is make_valid() from Shapely.

from shapely import make_valid

The top-level import shown here requires Shapely 2.0 or later. Shapely 1.8 provides the same function as shapely.validation.make_valid. In older environments, use buffer(0) only as a limited fallback and consider upgrading Shapely.
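If the same script has to run on both Shapely 1.8 and Shapely 2.x, a guarded import is one way to handle the two import locations. This is a minimal sketch; the bow-tie polygon is illustrative test data.

```python
# Prefer the top-level function (Shapely 2.x) and fall back to the
# shapely.validation location used by Shapely 1.8.
try:
    from shapely import make_valid
except ImportError:
    from shapely.validation import make_valid

from shapely.geometry import Polygon

# Illustrative bow-tie polygon that crosses itself at (1, 1).
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
repaired = make_valid(bowtie)

print(bowtie.is_valid, repaired.is_valid, repaired.geom_type)
```

If both imports fail, the installed Shapely predates make_valid() and upgrading is the practical fix.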

Apply it only to invalid rows:

gdf_repaired = gdf.copy()

invalid_mask = ~gdf_repaired.geometry.is_valid

gdf_repaired.loc[invalid_mask, "geometry"] = (
    gdf_repaired.loc[invalid_mask, "geometry"].apply(make_valid)
)

Then check the result:

gdf_repaired["is_valid_after"] = gdf_repaired.geometry.is_valid

print("Still invalid:", (~gdf_repaired["is_valid_after"]).sum())
print(gdf_repaired.geometry.geom_type.value_counts())

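This is the recommended default in current GeoPandas and Shapely workflows because make_valid() is designed for repair: it returns valid input unchanged and splits invalid input into valid parts where needed.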

Use buffer(0) as a fallback for simple polygon issues

If make_valid() is not available in your environment, or if you are testing a simple self-intersection fix, buffer(0) can help.

gdf_buffered = gdf.copy()

invalid_mask = ~gdf_buffered.geometry.is_valid

gdf_buffered.loc[invalid_mask, "geometry"] = (
    gdf_buffered.loc[invalid_mask, "geometry"].buffer(0)
)

Re-check validity:

print("Still invalid after buffer(0):", (~gdf_buffered.geometry.is_valid).sum())

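Use this carefully. buffer(0) rebuilds the polygon rather than repairing it, so it can change the shape, remove small artifacts, or discard part of a self-intersecting polygon. It is also only meaningful for polygons: buffering a point or line by zero returns an empty polygon. It is not the best default repair method.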
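One way to judge how much buffer(0) altered a feature is to compare its area with the make_valid() result for the same input. The sketch below assumes Shapely 2.x for the top-level import. The bow-tie polygon and the 1% tolerance are illustrative choices.

```python
from shapely import make_valid
from shapely.geometry import Polygon

# Illustrative bow-tie polygon whose two lobes each cover 1 square unit.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])

via_make_valid = make_valid(bowtie)
via_buffer = bowtie.buffer(0)

# Both results are valid, but they are not guaranteed to cover the same area.
print("make_valid area:", via_make_valid.area)
print("buffer(0) area: ", via_buffer.area)

# Flag the feature for review if the two repairs disagree noticeably.
# The 1% tolerance is an arbitrary choice for this sketch.
area_gap = abs(via_make_valid.area - via_buffer.area)
largest = max(via_make_valid.area, via_buffer.area, 1e-12)
needs_review = area_gap > 0.01 * largest
print("Needs review:", needs_review)
```

A large gap between the two areas suggests that one of the methods dropped part of the original shape, so the feature is worth inspecting by eye.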

Remove empty or still-invalid results

Some repaired geometries may become empty, null, or remain invalid.

cleaned = gdf_repaired.copy()

cleaned = cleaned[cleaned.geometry.notna()]
cleaned = cleaned[~cleaned.geometry.is_empty]
cleaned = cleaned[cleaned.geometry.is_valid]

print("Remaining features:", len(cleaned))

If downstream tools require polygon-only output, inspect geometry types after repair:

print(cleaned.geometry.geom_type.value_counts())

A simple polygon-only filter is:

polygon_types = ["Polygon", "MultiPolygon"]
cleaned_polygons = cleaned[cleaned.geometry.geom_type.isin(polygon_types)].copy()

print(cleaned_polygons.geometry.geom_type.value_counts())

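This is useful when make_valid() returns GeometryCollection or other non-polygon results. Note that the filter drops the whole row in those cases, including any polygon parts inside a GeometryCollection, so check how many features it removes.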
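If you would rather keep the polygon parts of a GeometryCollection than drop the whole row, a small helper can extract them. This is a sketch under simple assumptions: keep_polygon_parts is a hypothetical name, and it does not descend into nested collections.

```python
from shapely.geometry import GeometryCollection, LineString, MultiPolygon, Polygon


def keep_polygon_parts(geom):
    """Return only the polygonal parts of a geometry, or None if there are none."""
    if geom is None or geom.is_empty:
        return None
    if geom.geom_type in ("Polygon", "MultiPolygon"):
        return geom
    if geom.geom_type == "GeometryCollection":
        polygons = []
        for part in geom.geoms:
            if part.geom_type == "Polygon":
                polygons.append(part)
            elif part.geom_type == "MultiPolygon":
                polygons.extend(part.geoms)
        if not polygons:
            return None
        return polygons[0] if len(polygons) == 1 else MultiPolygon(polygons)
    return None  # points, lines, and other non-polygon results


# Illustrative input: a polygon plus a stray line, as make_valid() can produce.
mixed = GeometryCollection(
    [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), LineString([(1, 1), (2, 2)])]
)
print(keep_polygon_parts(mixed).geom_type)
```

On a GeoDataFrame this could be applied with cleaned.geometry.apply(keep_polygon_parts), followed by dropping rows where the result is null.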

Verify repaired geometries before continuing

Before running overlay, dissolve, or export steps, confirm the repair worked.

print("Original feature count:", len(gdf))
print("Cleaned feature count:", len(cleaned))
print("Invalid after repair:", (~cleaned.geometry.is_valid).sum())
print(cleaned.geometry.geom_type.value_counts())

This matters because make_valid() may split one invalid polygon into multiple valid parts or return a different geometry type.
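A cross-tabulation of geometry types before and after repair is one compact way to see which features changed type. The sketch below builds a tiny illustrative layer so it can run on its own, and it assumes Shapely 2.x for the top-level make_valid import.

```python
import geopandas as gpd
import pandas as pd
from shapely import make_valid
from shapely.geometry import Polygon

# Illustrative layer: one valid square and one invalid bow-tie.
gdf = gpd.GeoDataFrame(
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(0, 0), (2, 2), (2, 0), (0, 2)]),
    ]
)

repaired = gdf.copy()
repaired["geometry"] = repaired.geometry.apply(make_valid)

# Rows are geometry types before repair, columns are types after repair.
type_changes = pd.crosstab(
    gdf.geometry.geom_type.rename("before"),
    repaired.geometry.geom_type.rename("after"),
)
print(type_changes)
```

Any count outside the matching before/after cell marks features whose type changed during repair.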

Save the cleaned dataset

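Write the cleaned layer to an output format after checking the geometry types you now have. Drop the helper columns added during checking first, otherwise they are written to the output as attributes.

cleaned = cleaned.drop(columns=["is_valid", "is_valid_after"], errors="ignore")
cleaned.to_file("data/parcels_cleaned.gpkg", driver="GPKG")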

You can also save to GeoJSON:

cleaned.to_file("data/parcels_cleaned.geojson", driver="GeoJSON")

If you save to shapefile, be aware that the format is more restrictive (one geometry type per file, field names limited to 10 characters):

cleaned.to_file("data/parcels_cleaned.shp")

GeoPackage is often the safer choice than shapefile when repaired features become multipart. If repair produces mixed geometry types such as GeometryCollection, inspect and filter geometry types before saving.
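If a downstream tool expects a single geometry type per layer, one common approach is to promote every Polygon to a MultiPolygon before writing. The helper name to_multipolygon is hypothetical and the square is illustrative test data.

```python
from shapely.geometry import MultiPolygon, Polygon


def to_multipolygon(geom):
    """Wrap a Polygon in a MultiPolygon; leave other geometries unchanged."""
    if geom is not None and geom.geom_type == "Polygon":
        return MultiPolygon([geom])
    return geom


square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
promoted = to_multipolygon(square)
print(promoted.geom_type)
```

Applied with cleaned.geometry.apply(to_multipolygon), this leaves the shapes unchanged and only makes the declared type uniform.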

Code examples

Example 1: Detect invalid geometries in a shapefile

import geopandas as gpd

gdf = gpd.read_file("data/buildings.shp")
gdf["is_valid"] = gdf.geometry.is_valid

invalid_gdf = gdf[~gdf["is_valid"]]

print("Invalid features:", len(invalid_gdf))
print(invalid_gdf[["is_valid"]].head())

Example 2: Print validation reasons for bad features

from shapely.validation import explain_validity

invalid_gdf = invalid_gdf.copy()
invalid_gdf["reason"] = invalid_gdf.geometry.apply(explain_validity)

for idx, row in invalid_gdf[["reason"]].head(10).iterrows():
    print(idx, row["reason"])

Example 3: Repair geometries with make_valid()

from shapely import make_valid

gdf_fixed = gdf.copy()
mask = ~gdf_fixed.geometry.is_valid

gdf_fixed.loc[mask, "geometry"] = gdf_fixed.loc[mask, "geometry"].apply(make_valid)

print("Invalid after repair:", (~gdf_fixed.geometry.is_valid).sum())
print(gdf_fixed.geometry.geom_type.value_counts())

Example 4: Keep only polygon results after repair

polygon_types = ["Polygon", "MultiPolygon"]

gdf_polygons = gdf_fixed[gdf_fixed.geometry.geom_type.isin(polygon_types)].copy()

print(gdf_polygons.geometry.geom_type.value_counts())

Example 5: Drop empty geometries and save cleaned output

cleaned = gdf_fixed.copy()
cleaned = cleaned[cleaned.geometry.notna()]
cleaned = cleaned[~cleaned.geometry.is_empty]
cleaned = cleaned[cleaned.geometry.is_valid]

cleaned.to_file("data/buildings_cleaned.gpkg", driver="GPKG")

Explanation

In practical GIS work, an invalid geometry is a feature that breaks the structural rules that geometry libraries expect, such as the OGC Simple Features validity rules enforced by GEOS. For polygons, common problems include self-intersections, rings that touch or cross themselves, and rings with too few points.

These problems matter because invalid features often break spatial processing before export, especially during:

  • overlays
  • clipping
  • dissolves
  • spatial joins

The repair workflow has three separate steps:

  1. Check validity with .is_valid
  2. Diagnose the reason with explain_validity()
  3. Repair with make_valid() or, if necessary, buffer(0)

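GeoPandas delegates these geometry operations to Shapely, which in turn wraps the GEOS library, so the repair behavior depends on the installed Shapely and GEOS versions rather than on GeoPandas itself.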

A key point is that repair can change the geometry structure. One invalid polygon may become:

  • a MultiPolygon
  • a GeometryCollection
  • a different valid shape than the original

That is why checking validity is not enough. You should also inspect geom_type after repair and filter the output if your workflow expects only polygons, only lines, or a single geometry type for export.

Edge cases or notes

make_valid() may return different geometry types

A repaired polygon layer may no longer contain only polygons. If your downstream workflow expects polygon-only features, inspect geom_type and filter before saving.

Shapefiles are restrictive after repair

A shapefile holds one geometry type per file. Multipart geometries are fine if they match the layer type (a MultiPolygon in a polygon layer, for example), but a mix of polygons, lines, and collections after repair can fail to write. Save to GeoPackage first and inspect geom_type before exporting to stricter formats.

Some invalid geometries will not repair cleanly

A few features may remain invalid or become empty after repair. In those cases, inspect them manually or remove them from the dataset.
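Rather than dropping the failed rows silently, you can split them into a separate GeoDataFrame for review. The layer below is illustrative, and parcel_id is a hypothetical attribute name.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Illustrative layer: one usable polygon, one empty geometry, one missing geometry.
repaired = gpd.GeoDataFrame(
    {"parcel_id": [1, 2, 3]},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Polygon(), None],
)

geoms = repaired.geometry
usable = geoms.notna() & ~geoms.is_empty & geoms.is_valid

cleaned = repaired[usable].copy()
rejected = repaired[~usable].copy()  # keep these for manual review

print("Kept:", len(cleaned), "Rejected:", len(rejected))
```

The rejected frame keeps the original attributes, so the affected features can be traced back to the source data.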

buffer(0) is not a universal fix

It can work for simple polygon self-intersections, but it can also alter geometry shape. Do not use it as the default method if make_valid() is available.

CRS still matters

CRS does not determine whether a geometry is valid, but it still matters for the rest of your workflow. Before overlay or spatial join operations, make sure all layers use the same CRS.
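A short check before overlay or spatial join might look like the sketch below. The two layers and their EPSG codes are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative layers in different coordinate reference systems.
parcels = gpd.GeoDataFrame(geometry=[Point(0, 0)], crs="EPSG:4326")
zones = gpd.GeoDataFrame(geometry=[Point(0, 0)], crs="EPSG:3857")

if parcels.crs != zones.crs:
    # Reproject one layer so both share the same CRS before overlay or sjoin.
    zones = zones.to_crs(parcels.crs)

print(parcels.crs == zones.crs)
```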

Valid does not always mean correct

A geometry can be technically valid and still be wrong for the dataset. Geometry repair fixes structural issues, not mapping mistakes or attribute problems.

For background, see "Geometry validity in GIS: what makes a feature invalid?"

Related task pages:

  • If your processing fails before repair, see "Why GeoPandas Overlay Fails with TopologyException"

FAQ

What is the best way to fix invalid geometries in GeoPandas?

Use make_valid() from Shapely when available. It is the preferred method for most cases. Use buffer(0) only as a fallback for simple polygon issues.

Why does make_valid() change my polygon into multiple parts?

The original shape is structurally broken, and repairing it can split overlapping or self-intersecting areas into separate valid parts. The result is then returned as a MultiPolygon, or as a GeometryCollection when the parts have different geometry types.

Can I use buffer(0) to repair all invalid geometries?

No. It sometimes works for simple self-intersections, but it can change geometry shape and does not fix every invalid case reliably.

What should I do if repaired geometries become empty or still invalid?

Filter out null, empty, or still-invalid rows and inspect those features manually if they matter to the project. Save the cleaned output in a flexible format such as GeoPackage.