Decide + document how layercake handles invalid geometries #9

@ianthetechie

Relations are broken sometimes; it's just a fact of life in this data model. We should decide what (if anything) we want to do about that and document it.

I don't know precisely what constraints shapely, WKB encoders, our geoparquet encoder, etc. impose, but I know that the PyOsmium reader will throw an exception in most cases when encountering an invalid geometry. I don't think anyone wants the geoparquet files to contain invalid geometries (e.g. non-closed or self-intersecting rings; I think WKB would theoretically allow these), but there are a few ways we could proceed, I guess:

  • Store null geometries (as in a proper null value; not a null island point), but keep all the other tags?
  • Output a second set of files for rejects (e.g. boundaries-rejected.parquet) which has the same schema but doesn't have a geometry column (and is thus not actually "geo" parquet anymore)? We could optionally include a reason here too.
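
To make the "reject file" option concrete, here is a rough sketch of how rows could be split. It is not layercake's actual code: the record layout (`id`, `tags`, `ring`), the function names, and the reason strings are all made up for illustration, and the check only covers the simplest failures (too few points, non-closed ring). Detecting self-intersections would need a real geometry library.

```python
from typing import Optional

# Hypothetical record layout: {"id": int, "tags": dict, "ring": list of (x, y)}.
Ring = list[tuple[float, float]]


def ring_problem(ring: Ring) -> Optional[str]:
    """Return a short reason if the ring is obviously unusable, else None.

    Only checks point count and closure; self-intersection is NOT detected.
    """
    if len(ring) < 4:
        return "too few points for a closed ring"
    if ring[0] != ring[-1]:
        return "ring is not closed"
    return None


def split_rejects(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (accepted, rejected).

    Rejected rows keep id and tags, drop the geometry, and gain a reason.
    """
    accepted: list[dict] = []
    rejected: list[dict] = []
    for rec in records:
        reason = ring_problem(rec["ring"])
        if reason is None:
            accepted.append(rec)
        else:
            rejected.append({"id": rec["id"], "tags": rec["tags"], "reason": reason})
    return accepted, rejected


records = [
    {"id": 1, "tags": {"boundary": "administrative"},
     "ring": [(0, 0), (1, 0), (1, 1), (0, 0)]},
    {"id": 2, "tags": {"boundary": "administrative"},
     "ring": [(0, 0), (1, 0), (1, 1), (0, 1)]},  # first != last point
]
accepted, rejected = split_rejects(records)
```

The null-geometry option would be the same loop, except a failing row stays in the main output with its geometry set to a proper null instead of moving to a second list. In a real pipeline, something like shapely's `is_valid` would presumably replace `ring_problem`.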

I think right now we probably just drop them, which is fine for some use cases, but I think other consumers would also like to know from a quality perspective what is currently broken. This could actually have other uses too in the community ;)

I lean toward the "reject file" option but happy to hear other ideas.
