Draft
30 changes: 24 additions & 6 deletions spec/stac-geoparquet-spec.md
@@ -31,11 +31,11 @@ most of the fields should be the same in STAC and in GeoParquet.
| _property columns_ | _varies_ | - | Each property should use the relevant Parquet type, and be pulled out of the properties object to be a top-level Parquet field |

- Must be valid GeoParquet, with proper metadata. Ideally the geometry types are defined and as narrow as possible.
- Strongly recommend to only have one GeoParquet per STAC 'Collection'. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data
- Recommend having only one GeoParquet file per STAC 'Collection'. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data.
Collaborator

Maybe rephrase this as strongly recommending that the records be somewhat uniform? That gets to the core of the issue (avoiding a bloated schema, leaning on the strengths of Parquet), and whether this comes from one or many collections is secondary. So maybe something like:

- Strongly recommend that the records be mostly uniform, either because they came from a single STAC collection or multiple STAC collections whose items have similar fields. Apache Parquet is a columnar format, and is most efficient and convenient for users when all the records have the same fields. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data.

Collaborator

And does it make sense to mention the fields extension here?

- Any field in 'properties' of the STAC item should be moved up to be a top-level field in the GeoParquet.
- STAC GeoParquet does not support properties that are named such that they collide with a top-level key.
- datetime columns should be stored as a [native timestamp][timestamp], not as a string.
- The Collection JSON should be included in the Parquet metadata. See [Collection JSON](#including-a-stac-collection-json-in-a-stac-geoparquet-collection) below.
- The Collection(s) JSON should be included in the Parquet metadata. See [Collection JSON](#including-one-or-more-stac-collection-json-in-a-stac-geoparquet-collection) below.
- Any other properties that would be stored as GeoJSON in a STAC JSON Item (e.g. `proj:geometry`) should be stored as a binary column with WKB encoding. This simplifies the handling of collections with multiple geometry types.
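The property-flattening rules in the list above can be sketched in Python with only the standard library. This is a minimal illustration, not a reference implementation: `flatten_item` is a hypothetical helper, it assumes the input is an already-parsed STAC Item dict, and the set of datetime fields it converts (`DATETIME_KEYS`) is an assumption drawn from STAC common metadata. A real writer would hand the resulting `datetime` values to its Parquet library so they land in a native timestamp column.

```python
from datetime import datetime
from typing import Any

# Assumption: these STAC common-metadata fields hold RFC 3339 datetime strings.
DATETIME_KEYS = {"datetime", "start_datetime", "end_datetime", "created", "updated"}


def _parse_datetime(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" on Python 3.11+,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def flatten_item(item: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: turn a STAC Item dict into one flat row."""
    # Copy every top-level key except the nested properties object.
    row = {key: value for key, value in item.items() if key != "properties"}
    for key, value in item.get("properties", {}).items():
        # A property that shares a name with a top-level key is unsupported.
        if key in row:
            raise ValueError(f"property {key!r} collides with a top-level key")
        # Store datetimes as native datetime objects, not strings.
        if key in DATETIME_KEYS and isinstance(value, str):
            value = _parse_datetime(value)
        row[key] = value
    return row
```

Collision handling is deliberately strict here, raising instead of silently overwriting or renaming, since such properties are not supported.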

### Link Struct
@@ -69,12 +69,30 @@ To take advantage of Parquet's columnar nature and compression, the assets shoul

See [Asset Object][asset] for more.

## Including a STAC Collection JSON in a STAC Geoparquet Collection
## Including one or more STAC Collection JSON in a STAC Geoparquet Collection

To make a stac-geoparquet file a fully self-contained representation, you can
include the Collection JSON in the Parquet metadata. If present in the [Parquet
file metadata][parquet-metadata], the key must be `stac:collection` and the
value must be a JSON string with the Collection JSON.
include one or more Collection JSON objects in the Parquet metadata. If present in the [Parquet
file metadata][parquet-metadata], the key must be `stac:collections` and the
value must be a JSON string encoding an object that maps each collection ID to its Collection JSON.

Abbreviated example in JSON form:

```json
{
  "stac:collections": "{\"collection-a\":{\"id\":\"collection-a\",...}}"
}
```

Collaborator

I've confused myself, but do we need the escaping here? The text does say that the value of `stac:collections` is a JSON string, so I guess this is correct.
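The encoding rule above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a reference implementation: `encode_collections` and `decode_collections` are hypothetical helpers, and the metadata is modelled as a `dict[bytes, bytes]` because that is how pyarrow exposes Parquet key-value metadata (for example via `pyarrow.parquet.read_schema(path).metadata`). Attaching the entry to a real file is left out; a writer would need to merge it with the existing GeoParquet `geo` metadata rather than replace it.

```python
import json
from typing import Any

COLLECTIONS_KEY = b"stac:collections"


def encode_collections(collections: list[dict[str, Any]]) -> dict[bytes, bytes]:
    """Hypothetical helper: build the metadata entry for one or more collections."""
    # The value is a single JSON object keyed by each collection's id.
    by_id = {collection["id"]: collection for collection in collections}
    if len(by_id) != len(collections):
        raise ValueError("collection ids must be unique")
    # Parquet metadata values are strings, so the object is serialised to JSON text.
    return {COLLECTIONS_KEY: json.dumps(by_id).encode("utf-8")}


def decode_collections(metadata: dict[bytes, bytes]) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: parse the entry back into {collection id: Collection JSON}."""
    return json.loads(metadata[COLLECTIONS_KEY])
```

Because the value is itself a JSON string, serialising the whole metadata mapping as JSON (for example `json.dumps({"stac:collections": value.decode("utf-8")})`) escapes the inner quotes, which is where the backslashes in the abbreviated example come from.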

### Deprecations in v1.1

Prior to stac-geoparquet v1.1, this specification recommended storing a single STAC collection in a `stac:collection` metadata field.
If possible, clients should continue to support this field (with a warning) for backwards compatibility.
**If both `stac:collection` and `stac:collections` are present in the stac-geoparquet metadata, it is an error.**

`stac:collection` will be removed from this specification in the next breaking release.

See [this RFC](https://github.com/stac-utils/stac-geoparquet/issues/88) for details.
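A reader that follows these deprecation rules might look like the Python sketch below (standard library only). `read_collections` is a hypothetical helper. It assumes the Parquet key-value metadata has already been loaded as a `dict[bytes, bytes]`, or `None` when the file has no key-value metadata (which is what pyarrow's `Schema.metadata` returns in that case), and it assumes the legacy `stac:collection` value is a single Collection JSON string carrying an `id`. It emits a `FutureWarning` rather than a `DeprecationWarning` so the warning is visible to end users by default.

```python
import json
import warnings
from typing import Any, Optional

NEW_KEY = b"stac:collections"
LEGACY_KEY = b"stac:collection"  # deprecated in stac-geoparquet v1.1


def read_collections(metadata: Optional[dict[bytes, bytes]]) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: return {collection id: Collection JSON}."""
    metadata = metadata or {}
    has_new = NEW_KEY in metadata
    has_legacy = LEGACY_KEY in metadata
    if has_new and has_legacy:
        # Having both keys is an error under the v1.1 rules.
        raise ValueError("both 'stac:collection' and 'stac:collections' are present")
    if has_new:
        return json.loads(metadata[NEW_KEY])
    if has_legacy:
        # Still supported for backwards compatibility, but with a warning.
        warnings.warn(
            "'stac:collection' is deprecated; use 'stac:collections'",
            FutureWarning,
            stacklevel=2,
        )
        # The legacy value is a single Collection JSON, so key it by its id.
        collection = json.loads(metadata[LEGACY_KEY])
        return {collection["id"]: collection}
    return {}
```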

## Referencing a STAC Geoparquet Collections in a STAC Collection JSON
