feat: add stac:collections to spec #89
base: main
We don't have versioning on the stac-geoparquet spec ... we probably should? Maybe just a note in the README and a tag?
Whoops. What do you think about putting a stac-geoparquet version identifier in the parquet file metadata? Maybe discuss that separately.
What do we pick?
1.1 seems fine.
In general, it's a little awkward to have the spec coupled to the code here.
Happy to move stuff (code or spec) as desired.
| - Must be valid GeoParquet, with proper metadata. Ideally the geometry types are defined and as narrow as possible. | ||
| - Strongly recommend to only have one GeoParquet per STAC 'Collection'. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data | ||
| - Recommend to only have one GeoParquet per STAC 'Collection'. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data | 
Maybe rephrase this as strongly recommending that the records be somewhat uniform? That gets to the core of the issue (avoiding a bloated schema and leaning on the strengths of Parquet); whether the records come from one or many collections is secondary. So maybe something like:
- Strongly recommend that the records be mostly uniform, either because they came from a single STAC collection or multiple STAC collections whose items have similar fields. Apache Parquet is a columnar format, and is most efficient and convenient for users when all the records have the same fields. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data.
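To make the "union of all the schemas" point concrete, here is a minimal sketch in plain Python. The item dictionaries and the helper names (`union_columns`, `null_fraction`) are made up for illustration and are not part of stac-geoparquet; the sketch only counts how many cells would be empty if every record were padded to the merged schema.

```python
# Hypothetical, simplified items from two dissimilar STAC collections.
optical_items = [
    {"id": "a-1", "datetime": "2024-01-01T00:00:00Z", "eo:cloud_cover": 12.0},
    {"id": "a-2", "datetime": "2024-01-02T00:00:00Z", "eo:cloud_cover": 40.5},
]
sar_items = [
    {"id": "b-1", "datetime": "2024-01-01T00:00:00Z", "sar:polarizations": ["VV"]},
    {"id": "b-2", "datetime": "2024-01-02T00:00:00Z", "sar:polarizations": ["VV", "VH"]},
]


def union_columns(items):
    """Return the sorted union of field names, i.e. the columns of a merged table."""
    return sorted({key for item in items for key in item})


def null_fraction(items):
    """Fraction of cells left empty when every item is padded to the union schema."""
    columns = union_columns(items)
    total_cells = len(columns) * len(items)
    missing = sum(1 for item in items for col in columns if col not in item)
    return missing / total_cells if total_cells else 0.0


# Uniform records: every column is filled for every row.
print(null_fraction(optical_items))
# Mixed records: each row is missing the other collection's column.
print(null_fraction(optical_items + sar_items))
```

With only one differing field per collection the waste is modest; collections with many extension-specific fields would presumably widen the gap.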
And does it make sense to mention the fields extension here?
| ```json | ||
| { | ||
| "stac:collections": "{\"collection-a\":{\"id\":\"collection-a\",...}}" | 
I've confused myself, but do we need the escaping here? I guess the text does say that the value of stac:collections is a JSON string, so I guess this does look correct.
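For what it's worth, a small sketch of where the escaping comes from, using only the standard `json` module. Parquet key-value metadata values are strings, so the collections mapping is JSON-encoded once; showing that metadata as a JSON document encodes the string a second time, which is what produces the `\"`. The sample collection below is made up for illustration.

```python
import json

# Hypothetical collection mapping, keyed by collection id.
collections = {"collection-a": {"id": "collection-a", "type": "Collection"}}

# First encoding: the metadata value itself must be a string.
metadata_value = json.dumps(collections, separators=(",", ":"))
file_metadata = {"stac:collections": metadata_value}

# Second encoding: rendering the metadata as a JSON document escapes
# the quotes inside the already-encoded string.
rendered = json.dumps(file_metadata)
print(rendered)

# Reading it back therefore takes two decodes.
decoded = json.loads(json.loads(rendered)["stac:collections"])
```

So the escaped form in the example is what a reader would see when the metadata is displayed as JSON, while the stored value is the unescaped JSON text.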
This updates the file metadata to:
1. Add a version (currently 1.0)
2. Add `stac:collections`
3. Deprecate `stac:collection`
4. Add a jsonschema file

Supersedes stac-utils#89
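A rough sketch of how a reader might handle the metadata changes listed above, again in plain Python over a dictionary of string metadata. Two things here are assumptions, not taken from this thread: the name of the version key (`VERSION_KEY` is a placeholder) and the shape of the deprecated `stac:collection` value (assumed to be a single JSON-encoded collection).

```python
import json

# Placeholder key name for the version entry; the thread does not name it.
VERSION_KEY = "stac-geoparquet:version"


def build_file_metadata(collections, version="1.0"):
    """Build string-valued file metadata with a version and stac:collections."""
    return {
        VERSION_KEY: version,
        "stac:collections": json.dumps(collections),
    }


def read_collections(file_metadata):
    """Return {collection_id: collection}, preferring the new key."""
    if "stac:collections" in file_metadata:
        return json.loads(file_metadata["stac:collections"])
    if "stac:collection" in file_metadata:
        # Assumption: the deprecated key holds one JSON-encoded collection.
        collection = json.loads(file_metadata["stac:collection"])
        return {collection["id"]: collection}
    return {}


new_style = build_file_metadata({"collection-a": {"id": "collection-a"}})
old_style = {"stac:collection": json.dumps({"id": "collection-b"})}
print(read_collections(new_style))
print(read_collections(old_style))
```

Normalizing both keys to the same mapping would let callers ignore which generation of file they were handed while `stac:collection` is phased out.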
Some words to implement #88. Opening as a draft while we resolve a couple of issues:
cc @bitner