Clarify terminology across specification #89
Changes from 2 commits
5f8c6fb
bb05f3f
561edd9
f99d742
b8c988b
8cac80c
08caa63
4db26fb
1500a6d
201ae3e
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,26 @@ | |
|
||
The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic unified data model, the specification of its encoding into Zarr Version 2 and Version 3, and the establishment of extension points to support interoperability with external metadata and tiling standards. | ||
|
||
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models, such as the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, with operational encoding formats suitable for cloud-native storage and analysis. | ||
These capabilities are necessary because Zarr does not provide semantic constructs for geospatial data interpretation. Applications need to understand not just array shapes and values, but coordinate meanings, projection parameters, and scientific metadata. GeoZarr fills this gap without compromising Zarr's performance characteristics. | ||
|
||
Typical use cases include the storage, transformation, discovery, and processing of raster and gridded data, data cubes with temporal or vertical dimensions, and catalogue-enabled datasets integrated with metadata standards such as STAC and OGC Tile Matrix Sets. | ||
=== Why GeoZarr Exists | ||
I think we may be missing an important clarification to justify the purpose of GeoZarr: there are already existing conventions for geospatial data in Zarr, as implemented in Xarray, NCZarr, and GDAL. Those conventions primarily translate aspects of the CF/NetCDF data model into Zarr encoding. However:
|
||
|
||
Zarr, by design, is a low-level container for storing n-dimensional arrays and metadata. While this simplicity is a strength for performance and interoperability, it means Zarr lacks higher-level concepts that geospatial applications require: | ||
|
||
* *Coordinate Systems:* No native way to associate spatial or temporal meaning with array dimensions | ||
* *Grid Mappings:* No standard mechanism for projection and coordinate reference system metadata | ||
* *Semantic Metadata:* No conventions for units, standard names, or scientific attributes | ||
* *Variable Relationships:* No formal distinction between coordinate variables and data variables | ||
|
||
These concepts are essential for geospatial workflows but must be layered on top of Zarr's array storage. GeoZarr provides this semantic layer through proven standards (Common Data Model and CF conventions) while preserving Zarr's cloud-native advantages. | ||
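As a rough illustration of the semantic layer described above, the sketch below uses plain Python dictionaries to stand in for the attributes of four arrays in one Zarr group. The attribute names `standard_name`, `units`, `grid_mapping`, and `grid_mapping_name` come from the CF Conventions, and `_ARRAY_DIMENSIONS` is the dimension-naming convention Xarray uses for Zarr Version 2. The array names, the `classify` helper, and the rule it applies are assumptions made for this example, not part of the specification.

```python
# Hypothetical attribute payloads for four arrays in a single Zarr group.
# Plain Zarr would store only shapes and values; the keys below add meaning.
attrs = {
    "x": {
        "_ARRAY_DIMENSIONS": ["x"],
        "standard_name": "projection_x_coordinate",
        "units": "m",
    },
    "y": {
        "_ARRAY_DIMENSIONS": ["y"],
        "standard_name": "projection_y_coordinate",
        "units": "m",
    },
    # Scalar array that only carries projection metadata (name is illustrative).
    "spatial_ref": {
        "_ARRAY_DIMENSIONS": [],
        "grid_mapping_name": "transverse_mercator",
    },
    "air_temperature": {
        "_ARRAY_DIMENSIONS": ["y", "x"],
        "standard_name": "air_temperature",
        "units": "K",
        "grid_mapping": "spatial_ref",
    },
}


def classify(attrs):
    """Assign each array a role using CF-style rules (illustrative only).

    - an array carrying `grid_mapping_name` is a grid-mapping variable
    - a 1-D array whose only dimension shares its name is a coordinate variable
    - everything else is treated as a data variable
    """
    roles = {}
    for name, a in attrs.items():
        dims = a.get("_ARRAY_DIMENSIONS", [])
        if "grid_mapping_name" in a:
            roles[name] = "grid_mapping"
        elif dims == [name]:
            roles[name] = "coordinate"
        else:
            roles[name] = "data"
    return roles


roles = classify(attrs)
```

None of these roles can be derived from array shapes alone, which is the gap the four bullet points describe.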
|
||
=== Use Cases and Applications | ||
|
||
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models with operational encoding formats suitable for cloud-native storage and analysis. | ||
|
||
Typical use cases include: | ||
* Storage and processing of raster and gridded data | ||
* Management of data cubes with temporal or vertical dimensions | ||
* Integration with catalogue systems through standardised metadata | ||
* Multi-resolution tiling for efficient visualization and analysis | ||
* Cloud-optimized access to large geospatial datasets |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,9 @@ | |
|
||
=== Terms and definitions | ||
|
||
The GeoZarr specification inherits https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#concepts-and-terminology[concepts and terminology from the Zarr core specification]. | ||
The following terms add GeoZarr-specific meaning to the existing Zarr terminology. | ||
|
||
==== array | ||
|
||
A multidimensional, regularly spaced collection of values (e.g., raster data or gridded measurements), typically indexed by dimensions such as time, latitude, longitude, or spectral band. | ||
|
@@ -22,17 +25,17 @@ An array containing the primary geospatial or scientific measurements of interes | |
|
||
An index axis along which arrays are organised. Dimensions provide a naming and ordering scheme for accessing data in multidimensional arrays (e.g., `time`, `x`, `y`, `band`). | ||
|
||
==== group | ||
==== dataset | ||
|
||
A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections). | ||
A group that contains one or more data variables along with their associated coordinate variables, having a consistent relationship between these components. A dataset represents a coherent set of related data arrays and follows the unified data model. | ||
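One possible reading of "consistent relationship" can be sketched as a check over a group's arrays. The sketch assumes each array is described by hypothetical `dims` and `shape` fields, and it applies a deliberately strict rule: every dimension used by a data variable must have a coordinate variable of the same name, and every dimension must have the same length wherever it appears. The specification text does not state this rule; it is an assumption made for illustration.

```python
def is_dataset(arrays):
    """Return True if `arrays` looks like a dataset under the assumed rule.

    `arrays` maps an array name to {"dims": [...], "shape": [...]}.
    """
    # Coordinate variables: 1-D arrays named after their own dimension.
    coords = {name for name, a in arrays.items() if a["dims"] == [name]}
    data_vars = [name for name in arrays if name not in coords]
    if not data_vars:
        return False  # a dataset needs at least one data variable

    # Each dimension must have one length across all arrays that use it.
    sizes = {}
    for a in arrays.values():
        if len(a["dims"]) != len(a["shape"]):
            return False
        for dim, size in zip(a["dims"], a["shape"]):
            if sizes.setdefault(dim, size) != size:
                return False

    # Every dimension of every data variable needs a coordinate variable.
    return all(dim in coords for name in data_vars for dim in arrays[name]["dims"])


group = {
    "time": {"dims": ["time"], "shape": [3]},
    "y": {"dims": ["y"], "shape": [4]},
    "x": {"dims": ["x"], "shape": [5]},
    "temperature": {"dims": ["time", "y", "x"], "shape": [3, 4, 5]},
}
ok = is_dataset(group)
```

A looser rule, for example one that allows dimensions without coordinate variables, would drop the final check.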
|
||
|
||
==== metadata | ||
|
||
Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable. | ||
|
||
==== multiscale dataset | ||
==== multiscale group | ||
|
||
A dataset that includes multiple representations of the same data variable at varying spatial resolutions. Each resolution level is associated with a tile matrix from an OGC Tile Matrix Set. | ||
A group that contains two or more child groups representing the same data at different resolutions, where each child group is a <<term-dataset,dataset>>. The multiscale group includes metadata describing the relationship between resolution levels. | ||
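The structural part of this definition can be sketched as follows. The sketch assumes a multiscale group is modelled as a mapping from hypothetical child group names to the shapes of their data variables, and it treats "the same data" as "the same set of variable names". The metadata that relates the resolution levels is not modelled here, since its layout is not given in this definition.

```python
import math


def is_multiscale(children):
    """Check the structural rule: at least two child groups, same variables.

    `children` maps a child group name to {variable name: shape tuple}.
    """
    if len(children) < 2:
        return False
    var_sets = [set(child) for child in children.values()]
    first = var_sets[0]
    return bool(first) and all(v == first for v in var_sets)


def finest_first(children, variable):
    """Order child group names from most to fewest elements of `variable`."""
    return sorted(
        children,
        key=lambda name: math.prod(children[name][variable]),
        reverse=True,
    )


# Hypothetical resolution levels; the names "0", "1", "2" are illustrative.
levels = {
    "1": {"reflectance": (512, 512)},
    "0": {"reflectance": (1024, 1024)},
    "2": {"reflectance": (256, 256)},
}
valid = is_multiscale(levels)
order = finest_first(levels, "reflectance")
```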
emmanuelmathot marked this conversation as resolved.
|
||
|
||
==== tile matrix set | ||
|
||
|
I like this introduction.