Description
Relations are broken sometimes; it's just a fact of life in this data model. We should decide what we want to do about that (if anything) and document it.
I don't know precisely what constraints shapely, WKB encoders, our geoparquet encoder, etc. impose, but I know that the PyOsmium reader will throw an exception in most cases when encountering an invalid geometry. I don't think anyone wants the geoparquet files to contain invalid geometries (e.g. non-closed or self-intersecting ones; I think WKB would theoretically allow these), but there are a few ways we could proceed:
- Store null geometries (as in a proper null value; not a null island point), but keep all the other tags?
- Output a second set of files for rejects (e.g. boundaries-rejected.parquet) which has the same schema but doesn't have a geometry column (and is thus not actually "geo" parquet anymore)? We could optionally include a reason here too.
I think right now we probably just drop them, which is fine for some use cases, but I think other consumers would also like to know from a quality perspective what is currently broken. This could actually have other uses too in the community ;)
I lean toward the "reject file" option but happy to hear other ideas.
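
To make the "reject file" idea concrete, here is a rough sketch of the routing only. It assumes nothing about our actual pipeline: `build_ring` is a hypothetical stand-in for whatever geometry assembly step raises on invalid input (PyOsmium in our case), the `id`/`tags`/`coords` record layout is invented for illustration, and writing the two lists out as parquet is left out entirely.

```python
def build_ring(coords):
    """Hypothetical stand-in for geometry assembly.

    Raises ValueError for the invalid cases this sketch knows about;
    a real reader would detect many more (e.g. self-intersections).
    """
    if len(coords) < 4:
        raise ValueError("ring has fewer than 4 points")
    if coords[0] != coords[-1]:
        raise ValueError("ring is not closed")
    return list(coords)


def split_features(features):
    """Route each feature to 'accepted' (with geometry) or 'rejected' (with a reason)."""
    accepted, rejected = [], []
    for feature in features:
        try:
            geometry = build_ring(feature["coords"])
        except ValueError as exc:
            # Same attributes as an accepted row, but no geometry column;
            # the reason is the optional extra field mentioned above.
            rejected.append(
                {"id": feature["id"], "tags": feature["tags"], "reason": str(exc)}
            )
        else:
            accepted.append(
                {"id": feature["id"], "tags": feature["tags"], "geometry": geometry}
            )
    return accepted, rejected


# Invented sample data: one closed ring and one ring that is not closed.
features = [
    {"id": 1, "tags": {"boundary": "administrative"},
     "coords": [(0, 0), (1, 0), (1, 1), (0, 0)]},
    {"id": 2, "tags": {"boundary": "administrative"},
     "coords": [(0, 0), (1, 0), (1, 1), (0, 1)]},
]
accepted, rejected = split_features(features)
```

With this shape, the accepted rows would go to the normal geoparquet output and the rejected rows to something like boundaries-rejected.parquet, so consumers who care about data quality get the tags plus a reason rather than a silent drop.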