@gadomski
Member

@gadomski gadomski commented Mar 6, 2025

Some words to implement #88. Opening as a draft while we resolve a couple of issues:

  • We don't have versioning on the stac-geoparquet spec ... we probably should? Maybe just a note in the README and a tag?
  • If we throw a version on, what do we pick? While I wrote v1.1 in this PR, I could be talked into a pre-1 version as well.
  • In general, it's a little awkward to have the spec coupled to the code here ... but I'm not sure there's an easy path to resolve that in a non-confusing way for the community.

cc @bitner

@gadomski gadomski requested review from TomAugspurger and m-mohr March 6, 2025 13:35
@gadomski gadomski linked an issue Mar 6, 2025 that may be closed by this pull request
Collaborator

@TomAugspurger TomAugspurger left a comment

> We don't have versioning on the stac-geoparquet spec ... we probably should? Maybe just a note in the README and a tag?

Whoops. What do you think about putting a stac-geoparquet version identifier in the parquet file metadata? Maybe discuss that separately.

> what do we pick

1.1 seems fine.

> In general, it's a little awkward to have the spec coupled to the code here

Happy to move stuff (code or spec) as desired.
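
To make the "version identifier in the parquet file metadata" idea above concrete, here is a minimal sketch. It models Parquet key-value metadata as a bytes-to-bytes dict, which is how pyarrow exposes schema metadata. The key name `stac-geoparquet:version` and both helper functions are hypothetical; the thread does not settle on a key or a value format.

```python
from typing import Dict, Optional

# Hypothetical key name: the discussion above does not settle on one.
VERSION_KEY = b"stac-geoparquet:version"


def with_spec_version(file_metadata: Dict[bytes, bytes], version: str) -> Dict[bytes, bytes]:
    """Return a copy of the key-value metadata with the spec version added."""
    updated = dict(file_metadata)
    updated[VERSION_KEY] = version.encode("utf-8")
    return updated


def read_spec_version(file_metadata: Dict[bytes, bytes]) -> Optional[str]:
    """Return the spec version, or None for files written without one."""
    raw = file_metadata.get(VERSION_KEY)
    return raw.decode("utf-8") if raw is not None else None


# Placeholder for metadata that already exists on the file (contents elided).
existing = {b"geo": b"{}"}
tagged = with_spec_version(existing, "1.1")
```

A reader could then branch on `read_spec_version(...)`, treating `None` as "written before versioning existed".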


- Must be valid GeoParquet, with proper metadata. Ideally the geometry types are defined and as narrow as possible.
- Strongly recommend to only have one GeoParquet per STAC 'Collection'. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data
- Recommend to only have one GeoParquet per STAC 'Collection'. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data

Maybe rephrase this as strongly recommending that the records be somewhat uniform? That gets to the core of the issue (avoiding a bloated schema and leaning on the strengths of Parquet); whether the records come from one or many collections is secondary. So maybe something like:

  • Strongly recommend that the records be mostly uniform, either because they came from a single STAC collection or multiple STAC collections whose items have similar fields. Apache Parquet is a columnar format, and is most efficient and convenient for users when all the records have the same fields. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data.


And does it make sense to mention the fields extension here?
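
The schema-union effect described in the suggested wording can be illustrated with a small sketch. It uses simplified flat dicts as stand-ins for item rows; the field names and the two helper functions are illustrative assumptions, not part of the spec or the library.

```python
def union_schema(records):
    """Return the sorted union of all field names across the records."""
    fields = set()
    for record in records:
        fields.update(record)
    return sorted(fields)


def null_fraction(records):
    """Fraction of cells that would be null in a table using the union schema."""
    columns = union_schema(records)
    total = len(columns) * len(records)
    if total == 0:
        return 0.0
    filled = sum(len(record) for record in records)
    return 1 - filled / total


# Records with identical fields: no padding is needed.
uniform = [
    {"id": "a1", "datetime": "2025-01-01", "eo:cloud_cover": 10},
    {"id": "a2", "datetime": "2025-01-02", "eo:cloud_cover": 30},
]

# Adding records with a different field widens the schema for every row.
mixed = uniform + [
    {"id": "b1", "datetime": "2025-01-03", "sar:polarizations": ["VV"]},
    {"id": "b2", "datetime": "2025-01-04", "sar:polarizations": ["VH"]},
]
```

Here `null_fraction(uniform)` is 0, while `null_fraction(mixed)` is 0.25: each record leaves one of the four union columns empty. The more the field sets diverge, the larger that share becomes.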


```json
{
  "stac:collections": "{\"collection-a\":{\"id\":\"collection-a\",...}}"
}
```

I've confused myself, but do we need the escaping here? The text does say that the value of `stac:collections` is a JSON string, so I guess this does look correct.
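
A short sketch of why the escaping appears, assuming (as the comment above reads the spec text) that the value of `stac:collections` is a JSON string rather than a nested object. The one-field collection is a stand-in for the elided content.

```python
import json

# Stand-in for a real collection; the full content is elided in the example above.
collections = {"collection-a": {"id": "collection-a"}}

# The metadata value is itself a JSON-encoded string...
value = json.dumps(collections, separators=(",", ":"))

# ...so showing the metadata as a JSON document escapes the inner quotes.
document = json.dumps({"stac:collections": value})

# Reading it back therefore takes two decoding steps.
decoded = json.loads(json.loads(document)["stac:collections"])
```

The backslashes exist only in the JSON rendering of the metadata. The stored value is the unescaped string held in `value`.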

TomAugspurger added a commit to TomAugspurger/stac-geoparquet that referenced this pull request May 2, 2025
This updates the file metadata to

1. Add a version (currently 1.0)
2. Add `stac:collections`
3. Deprecate `stac:collection`
4. Add a jsonschema file

Supersedes stac-utils#89
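
As a rough illustration of what deprecating `stac:collection` in favor of `stac:collections` could mean for a reader, here is a hedged sketch. The function name is hypothetical, and the assumption that the singular key holds the JSON of a single collection is mine, not confirmed by this thread.

```python
import json
import warnings


def load_collections(file_metadata):
    """Return {collection_id: collection} from bytes-to-bytes key-value metadata."""
    raw = file_metadata.get(b"stac:collections")
    if raw is not None:
        return json.loads(raw)

    # Assumption: the deprecated key holds the JSON of one collection.
    legacy = file_metadata.get(b"stac:collection")
    if legacy is not None:
        warnings.warn(
            "stac:collection is deprecated; use stac:collections",
            DeprecationWarning,
            stacklevel=2,
        )
        collection = json.loads(legacy)
        return {collection["id"]: collection}

    return {}
```

Preferring the plural key and warning on the singular one lets older files keep working while nudging writers toward the new layout.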
@m-mohr m-mohr removed their request for review June 22, 2025 10:50
Successfully merging this pull request may close these issues.

[RFC] Multiple collections in metadata
