11 changes: 7 additions & 4 deletions standard/template/sections/clause_4_terms_and_definitions.adoc
@@ -2,6 +2,9 @@

=== Terms and definitions

GeoZarr specification inherits https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#concepts-and-terminology[concepts and terminology from the Zarr core specification].
The following terms add GeoZarr-specific meaning to the existing Zarr terminology.

==== array

A multidimensional, regularly spaced collection of values (e.g., raster data or gridded measurements), typically indexed by dimensions such as time, latitude, longitude, or spectral band.
@@ -22,17 +25,17 @@ An array containing the primary geospatial or scientific measurements of interes

An index axis along which arrays are organised. Dimensions provide a naming and ordering scheme for accessing data in multidimensional arrays (e.g., `time`, `x`, `y`, `band`).

==== group
==== dataset

A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections).
A group that contains one or more data variables along with their associated coordinate variables, with a consistent relationship between these components. A dataset represents a coherent set of related data arrays and follows the Unified Data Model.

can we use Unified Data Model in capitals wherever it is a formal reference to the clause 7 definition?

does it say what the group is first? group is probably still a container for datasets that can be nested.

Author

We inherit from Zarr for the group terminology. the section starts with:

GeoZarr specification inherits https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#concepts-and-terminology[concepts and terminology from the Zarr core specification].
The following terms adds Geozarr specificity to the existing Zarr terminology

I would like to avoid repeating the Zarr terminology in order to limit the maintenance if they evolve.

Author

capitals solved


==== metadata

Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable.

==== multiscale dataset
==== multiscale group

A dataset that includes multiple representations of the same data variable at varying spatial resolutions. Each resolution level is associated with a tile matrix from an OGC Tile Matrix Set.
A group that contains 2 or more child groups representing the same data at different resolutions, where each child group is a <<term-dataset,dataset>>. The multiscale group includes metadata describing the relationship between resolution levels.
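The `dataset` and `multiscale group` definitions can be checked mechanically. The sketch below is illustrative only: it uses a hypothetical in-memory tree in which a Python `dict` is a group and the string `"array"` marks an array node, and it approximates a dataset as a group holding at least one array, which is weaker than the definition above (coordinate variables are not checked). The helper names are hypothetical.

```python
from typing import Union

# Hypothetical in-memory node: a dict is a group (child name -> node),
# the string "array" stands in for an array node.
Node = Union[dict, str]


def is_dataset(node: Node) -> bool:
    """Simplified check: a group holding at least one array.

    The GeoZarr definition additionally requires data variables with
    consistent coordinate variables; that is not modelled here.
    """
    return isinstance(node, dict) and any(child == "array" for child in node.values())


def is_multiscale_group(node: Node) -> bool:
    """A group with two or more child groups, each of which is a dataset."""
    if not isinstance(node, dict):
        return False
    child_groups = [child for child in node.values() if isinstance(child, dict)]
    return len(child_groups) >= 2 and all(is_dataset(g) for g in child_groups)


pyramid = {
    "0": {"x": "array", "y": "array", "reflectance": "array"},
    "1": {"x": "array", "y": "array", "reflectance": "array"},
}
print(is_multiscale_group(pyramid))              # True
print(is_multiscale_group({"0": pyramid["0"]}))  # False: only one child group
```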

==== tile matrix set

58 changes: 27 additions & 31 deletions standard/template/sections/clause_7_unified_data_model.adoc
@@ -87,11 +87,11 @@ To enable discovery of resources within the hierarchical structure of the data m

A STAC extension consists of embedding or referencing STAC Collection and Item metadata within the data model:

* Each dataset resource MAY reference a corresponding STAC `Collection` or `Item` using an identifier or embedded object.
* Each store resource MAY reference a corresponding STAC `Collection` or `Item` using an identifier or embedded object.
* STAC properties such as `datetime`, `bbox`, and `eo:bands` MAY be included in the metadata to enable spatial, temporal, and spectral filtering.
* The structure is compatible with external STAC APIs and metadata harvesting systems.

STAC integration is non-intrusive and modular. It does not impose changes on the internal organisation of datasets and MAY be adopted incrementally by implementations requiring catalogue-based discovery capabilities.
STAC integration is non-intrusive and modular. It does not impose changes on the internal organisation of the store and MAY be adopted incrementally by implementations requiring catalogue-based discovery capabilities.
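As a rough illustration of the STAC bullets above, the sketch below attaches a STAC reference plus filterable `bbox` and `datetime` properties to a group's attributes, then runs a spatial overlap test of the kind a catalogue would use. This clause does not fix the attribute layout, so the `stac`, `href` and `properties` keys, the placeholder URL and the helper names are all hypothetical.

```python
import json


def add_stac_reference(attributes: dict, href: str, bbox: list, datetime_utc: str) -> dict:
    """Return a copy of group attributes with a hypothetical STAC reference added."""
    attrs = dict(attributes)  # leave the caller's dictionary untouched
    attrs["stac"] = {  # "stac", "href" and "properties" are illustrative key names
        "href": href,  # identifier or URL of a STAC Collection or Item
        "properties": {"bbox": bbox, "datetime": datetime_utc},
    }
    return attrs


def bbox_intersects(a: list, b: list) -> bool:
    """True if two [min_x, min_y, max_x, max_y] boxes overlap (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


root_attrs = add_stac_reference(
    {"title": "Example store"},
    href="https://example.com/stac/items/example-item",  # placeholder URL
    bbox=[5.0, 45.0, 6.0, 46.0],
    datetime_utc="2024-01-01T00:00:00Z",
)
query = [5.5, 45.5, 7.0, 47.0]
print(bbox_intersects(root_attrs["stac"]["properties"]["bbox"], query))  # True
print(json.dumps(root_attrs, indent=2))
```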


==== Modularity and Interoperability
@@ -101,22 +101,22 @@ Each extension point is specified independently. Implementations may advertise s

=== Unified Model Structure

This clause defines the structural organisation of datasets conforming to the unified data model (UDM). It consolidates the foundational elements and optional extensions into a coherent architecture suitable for Zarr encoding, while remaining format-agnostic. The model establishes a modular and extensible framework that supports structured representation of multidimensional, geospatially-referenced resources.
This clause defines the structural organisation of stores conforming to the Unified Data Model (UDM). It consolidates the foundational elements and optional extensions into a coherent architecture suitable for Zarr encoding, while remaining format-agnostic. The model establishes a modular and extensible framework that supports structured representation of multidimensional, geospatially-referenced resources.

The model represents datasets as abstract compositions of dimensions, coordinate variables, data variables, and associated metadata. This abstraction ensures that applications and services can reason about the content and semantics of a dataset without reliance on storage layout or specific serialisation.

==== Dataset Structure
==== Store Structure

A dataset conforming to the Unified Data Model (UDM) is structured as a hierarchy rooted at a top-level dataset entity. This design enables modularity and facilitates the representation of complex, multi-resolution, or thematically partitioned data collections.
A store conforming to the Unified Data Model (UDM) is structured as a hierarchy rooted at a top-level group. This design enables modularity and facilitates the representation of complex, multi-resolution, or thematically partitioned data collections.

Each dataset node comprises the following core components, aligned with the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions:
Each <<term-dataset,dataset>> comprises the following core components, aligned with the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions:

- **Dimensions** – Named, integer-valued axes defining the extent of data variables. Examples include `time`, `x`, `y`, and `band`.
- **Coordinate Variables** – Arrays that supply coordinate values along dimensions, providing spatial, temporal, or contextual referencing. These may be scalar or higher-dimensional, depending on the referencing scheme.
- **Data Variables** – Multidimensional arrays representing physical measurements or derived products. Defined over one or more dimensions, these variables are associated with coordinate variables and annotated with metadata.
- **Attributes** – Key-value pairs attached to variables or dataset components. Attributes convey semantic information such as units, standard names, and geospatial metadata.

The hierarchy is implemented through **groups**, which function as containers for variables, dimensions, and metadata. Groups may define local context while inheriting attributes from parent nodes. This supports the logical subdivision of datasets by theme, resolution, or processing stage, and enhances the clarity and reusability of complex geospatial structures.
A Zarr hierarchy is a tree structure in which each node is either a group or an array. Group nodes may have children, but array nodes may not. This structure supports the logical subdivision of data by theme, resolution, or processing stage, and enhances the clarity and reusability of complex geospatial structures.

The diagram below represents the structural layer of the Unified Data Model, derived from the Unidata Common Data Model, which serves as the foundational framework for all overlying model layers.

@@ -129,18 +129,17 @@ The diagram below represents the structural layer of the unified data model, der
....
@startuml CDM_DAL_Object_Model

class Dataset {
class Store {
+ String location
+ open()
+ close()
}

class Group {
+ String name
+ List<Group> subgroups
+ List<Variable> variables
+ List<Dimension> dimensions
+ List<Attribute> attributes
}

class Dataset {
}

class Dimension {
@@ -152,9 +151,6 @@ class Dimension {

class Variable {
+ String name
+ DataType dataType
+ List<Dimension> shape
+ List<Attribute> attributes
+ read()
}

@@ -169,19 +165,20 @@ class Attribute {
+ List<String> values
}

Dataset --> Group : rootGroup
Group --> Group : contains >
Group --> Variable : contains >
Group --> Dimension : defines >
Group --> Attribute : has >
Variable --> Dimension : uses >
Variable --> DataType : has >
Variable --> Attribute : has >
Store "1" --> "*" Group : rootGroup
Group "1" --> "*" Group : contains
Dataset -up-|> Group
Dataset --> "*" Variable : contains
Dataset --> "*" Dimension : defines
Group --> "*" Attribute : has
Variable --> "*" Dimension : uses
Variable --> "1" DataType : has
Variable --> "*" Attribute : has
@enduml
....
//endif::never-shown[]
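The new diagram can be read as the Python sketch below, which mirrors its classes as dataclasses. It is an illustration, not a normative binding: the `open()`, `close()` and `read()` operations are omitted, `DataType` is reduced to a plain string, the `length` field of `Dimension` is an assumption (that class body is collapsed in this diff), the store holds a single root group as described under Store Structure, and `iter_datasets` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Attribute:
    name: str
    values: List[str]


@dataclass
class Dimension:
    name: str
    length: int  # assumed field; the Dimension body is collapsed in this diff


@dataclass
class Variable:
    name: str
    data_type: str  # plain string standing in for the DataType class
    dimensions: List[Dimension]
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Group:
    name: str
    subgroups: List[Group] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Dataset(Group):
    # Dataset specialises Group (Dataset -up-|> Group) and adds variables and dimensions.
    variables: List[Variable] = field(default_factory=list)
    dimensions: List[Dimension] = field(default_factory=list)


@dataclass
class Store:
    location: str
    root_group: Group  # a single root group


def iter_datasets(group: Group) -> Iterator[Dataset]:
    """Depth-first walk yielding every Dataset at or below the given group."""
    if isinstance(group, Dataset):
        yield group
    for child in group.subgroups:
        yield from iter_datasets(child)


# Usage with hypothetical names and sizes.
y, x = Dimension("y", 4), Dimension("x", 4)
level0 = Dataset(
    "0",
    variables=[Variable("reflectance", "uint16", [y, x])],
    dimensions=[y, x],
)
store = Store("example.zarr", Group("root", subgroups=[level0]))
print([d.name for d in iter_datasets(store.root_group)])  # ['0']
```

Because `Dataset` inherits from `Group`, a dataset can sit anywhere a group can, which is what lets a multiscale group hold datasets as its children.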

Note that, conceptually, node within this hierarchy might be treated as a self-contained dataset.
Note that, conceptually, any node within this hierarchy may be treated as a self-contained store.

==== Coordinate Referencing

@@ -196,7 +193,7 @@ The model accommodates both standard CF-compatible definitions and extended refe

Metadata may be declared at various levels within the model structure:

- **Global Metadata** – Attributes describing the dataset as a whole, including elements such as `title`, `summary`, and `license`.
- **Global Metadata** – Attributes describing the store as a whole, including elements such as `title`, `summary`, and `license`.
- **Variable Metadata** – Attributes associated with individual data or coordinate variables, conveying descriptive or semantic information.
- **Extension Metadata** – Structured metadata linked to optional model extensions (e.g., multiscale tiling, catalogue references, geotransform properties).

@@ -218,15 +215,15 @@ Overviews enable:

===== Conceptual Structure

An *Overviews* construct is defined as a *hierarchical set of multiscale representations* of one or more data variables. It comprises the following components:
A <<term-multiscale-group,multiscale group>> contains child groups representing the data at different resolutions, where each child group is a <<term-dataset,dataset>> following the Unified Data Model. It comprises the following components:

[horizontal]
*Base Variable*:: The original, highest-resolution variable to which the overview hierarchy is anchored. It is defined using the standard `DataVariable` structure in the model.
*Overview Levels*:: A sequence of variables representing the same logical quantity as the base variable, but sampled at coarser spatial resolutions.
*Base Dataset*:: The original, highest-resolution dataset to which the multiscale hierarchy is anchored.
*Zoom Level Datasets*:: A sequence of datasets representing the same data as the base dataset, but sampled at coarser spatial resolutions.
*Zoom Level Identifier*:: A unique identifier associated with each level, ordered from finest (e.g. `"0"`) to coarsest resolution (e.g. `"N"`).
*Tile Grid Definition*:: A mapping that associates each zoom level with a spatial tiling layout, defined in alignment with a `TileMatrixSet`.
*Spatial Alignment*:: Each overview variable MUST be spatially aligned with the base variable using a consistent coordinate reference system and compatible axis orientation.
*Resampling Method*:: A declared method indicating the technique used to derive coarser levels from the base variable (e.g. `nearest`, `average`, `cubic`).
*Spatial Alignment*:: Each zoom-level dataset MUST be spatially aligned with the base dataset using a consistent coordinate reference system and compatible axis orientation.
*Resampling Method*:: A declared method indicating the technique used to derive coarser levels from the base dataset (e.g. `nearest`, `average`, `cubic`).
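Several of these components translate into simple consistency checks. The sketch below uses a hypothetical `ZoomLevel` descriptor (its field names are illustrative, not normative) and checks that identifiers run `"0"` to `"N"` in order, that every level shares the base CRS, and that each level is coarser than the previous one. Treating identifiers as consecutive integers is an assumption taken from the examples above; axis orientation and the tile grid are not checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ZoomLevel:
    # Hypothetical descriptor: the field names are illustrative, not normative.
    identifier: str
    crs: str
    cell_size: float  # size of one pixel in CRS units


def check_levels(levels: List[ZoomLevel]) -> List[str]:
    """Return the problems found; an empty list means every check passed."""
    if not levels:
        return ["no zoom levels"]
    base = levels[0]  # finest level, anchoring the hierarchy
    problems = []
    for index, level in enumerate(levels):
        # Assumes identifiers run "0", "1", ..., "N" from finest to coarsest.
        if level.identifier != str(index):
            problems.append(f"position {index} has identifier {level.identifier!r}")
        # Spatial alignment: every level shares the base CRS.
        if level.crs != base.crs:
            problems.append(f"level {level.identifier} uses {level.crs}, base uses {base.crs}")
        # Each level must be coarser than the one before it.
        if index > 0 and level.cell_size <= levels[index - 1].cell_size:
            problems.append(f"level {level.identifier} is not coarser than its predecessor")
    return problems


levels = [
    ZoomLevel("0", "EPSG:32633", 10.0),
    ZoomLevel("1", "EPSG:32633", 20.0),
    ZoomLevel("2", "EPSG:32633", 40.0),
]
print(check_levels(levels))  # []
bad = levels[:2] + [ZoomLevel("2", "EPSG:4326", 20.0)]
print(len(check_levels(bad)))  # 2
```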

===== Model Components

@@ -351,4 +348,3 @@ The unified data model facilitates interoperability with tools and libraries acr
- *Cloud-native infrastructure*: support for parallel access, chunked storage, and hierarchical grouping compatible with object storage.

Tooling support is expected to grow via standard-conformant implementations, easing adoption across domains and infrastructures.

7 changes: 3 additions & 4 deletions standard/template/sections/clause_9_zarr_encoding_core.adoc
@@ -1,7 +1,7 @@

=== Hierarchical Structure

A dataset conforming to the unified data model is represented as a hierarchical structure of groups, variables (arrays), dimensions, and metadata. The dataset is rooted in a *top-level group*, which may contain:
A store conforming to the Unified Data Model is structured as a hierarchy of groups, variables (arrays), dimensions, and metadata. Following Zarr conventions, this hierarchy is rooted in a group, which may contain:

- Arrays representing coordinate or data variables
- Child groups for modular organisation, including logical sub-collections or resolution levels
@@ -14,7 +14,7 @@ Each group adheres to a consistent structure, allowing recursive composition. Th
|===
|Model Element |Zarr v2 Encoding |Zarr v3 Encoding

|Root Dataset | Directory with `.zgroup` and `.zattrs` | Directory with `zarr.json`, with `node_type: group`
|Root Group | Directory with `.zgroup` and `.zattrs` | Directory with `zarr.json`, with `node_type: group`

|Child Group | Subdirectory with `.zgroup` and `.zattrs` | Subdirectory with `zarr.json`, with `node_type: group`

@@ -115,7 +115,7 @@ Example:

=== Global Metadata

Metadata associated with the dataset as a whole is stored at the root group level.
Metadata describing the store as a whole is stored in the attributes of the root group.
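For Zarr v3, store-level metadata therefore ends up in the `attributes` member of the root group's `zarr.json`. The sketch below writes such a file with the Python standard library only; the store name and attribute values are placeholders, `title`, `summary` and `license` are the global attributes named in the unified data model clause, and `write_root_group` is a hypothetical helper rather than a Zarr library call.

```python
import json
import tempfile
from pathlib import Path


def write_root_group(store_path: Path, attributes: dict) -> Path:
    """Write a Zarr v3 root group whose attributes hold store-level metadata."""
    store_path.mkdir(parents=True, exist_ok=True)
    metadata = {
        "zarr_format": 3,
        "node_type": "group",
        "attributes": attributes,  # global metadata for the whole store
    }
    target = store_path / "zarr.json"
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = write_root_group(
        Path(tmp) / "example.zarr",  # hypothetical store name
        {"title": "Example store", "summary": "Illustrative only", "license": "CC-BY-4.0"},
    )
    written = json.loads(path.read_text(encoding="utf-8"))
    print(written["attributes"]["title"])  # Example store
```

In practice a Zarr library would normally write this file; the point here is only where the global attributes land.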


[cols="1,2,2"]
@@ -157,4 +157,3 @@ In all cases:

- Attribute names are case-sensitive and encoded as UTF-8 strings
- Values shall conform to JSON-compatible types (string, number, boolean, array)

17 changes: 8 additions & 9 deletions standard/template/sections/clause_9_zarr_encoding_overviews.adoc
@@ -1,30 +1,30 @@

=== Encoding of Multiscale Overviews in Zarr

This clause specifies how multiscale tiling (also known as overviews or pyramids) is encoded in Zarr-based datasets conforming to the unified data model. The encoding supports both Zarr Version 2 and Version 3 and is aligned with the OGC Two Dimensional Tile Matrix Set Standard.
This clause specifies how multiscale tiling (also known as overviews or pyramids) is encoded in Zarr stores conforming to the Unified Data Model. The encoding supports both Zarr Version 2 and Version 3 and is aligned with the OGC Two Dimensional Tile Matrix Set Standard.

Multiscale datasets are composed of a set of Zarr groups representing multiple zoom levels. Each level stores coarser-resolution resampled versions of the original data variables.
A multiscale group contains child groups, where each child group is a <<term-dataset,dataset>> representing a zoom level that stores a coarser-resolution resampled version of the original data variables.

For multiscale support, I think we need a consensus that gives flexibility to data producers:

Suggested change
The multiscale group may include or exclude the native data, and the child zoom-level groups may likewise include or exclude the native level (0). This flexibility allows producers to handle different scenarios, such as adding overviews later to an existing archive.

==== Hierarchical Layout

Each zoom level SHALL be represented as a Zarr group, identified by the Tile Matrix identifier (e.g., `"0"`, `"1"`, `"2"`). These groups SHALL be organised hierarchically under a common multiscale root group. Each zoom-level group SHALL contain the complete set of variables (Zarr arrays) corresponding to that resolution.
Each zoom level SHALL be represented as a child group, identified by the Tile Matrix identifier (e.g., `"0"`, `"1"`, `"2"`). These child groups SHALL be organised hierarchically under a common multiscale group, and each SHALL be a <<term-dataset,dataset>> containing the complete set of variables (arrays) corresponding to that resolution. All zoom-level datasets MUST maintain a consistent structure.

[cols="1,2,2"]
|===
|Structure |Zarr v2 |Zarr v3

|Zoom level groups | Subdirectories with `.zgroup` and `.zattrs` | Subdirectories with `zarr.json`, `node_type: group`
|Zoom level datasets | Subdirectories with `.zgroup` and `.zattrs` | Subdirectories with `zarr.json`, `node_type: group`

|Variables at each level | Zarr arrays (`.zarray`, `.zattrs`) in each group | Zarr arrays (`zarr.json`, `node_type: array`) in each group
|Variables at each level | Arrays (`.zarray`, `.zattrs`) in each dataset | Arrays (`zarr.json`, `node_type: array`) in each dataset

|Global metadata | `multiscales` defined in parent `.zattrs` | `multiscales` defined in parent group `zarr.json` under `attributes`
|Multiscale metadata | `multiscales` defined in multiscale group `.zattrs` | `multiscales` defined in multiscale group `zarr.json` under `attributes`
|===
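A minimal sketch of the Zarr v3 column of this table, using only the Python standard library: it writes the `zarr.json` files for a hypothetical multiscale group named `pyramid` with zoom levels `"0"` to `"2"`. Array metadata is omitted to keep it short, and only the two `multiscales` members visible in this diff are filled in, with illustrative values (`WebMercatorQuad` is used as an example OGC tile matrix set identifier). The helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def write_group(path: Path, attributes: Optional[dict] = None) -> None:
    """Create a directory containing a Zarr v3 group zarr.json."""
    path.mkdir(parents=True, exist_ok=True)
    metadata = {"zarr_format": 3, "node_type": "group", "attributes": attributes or {}}
    (path / "zarr.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")


def write_multiscale_skeleton(root: Path, level_ids: List[str]) -> None:
    # The multiscales attribute sits on the multiscale group itself.
    write_group(root, {"multiscales": {
        "tile_matrix_set": "WebMercatorQuad",  # example OGC tile matrix set identifier
        "resampling_method": "average",
    }})
    for level_id in level_ids:
        # Each zoom level is a child group (a dataset); its arrays are omitted here.
        write_group(root / level_id)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pyramid"
    write_multiscale_skeleton(root, ["0", "1", "2"])
    layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("zarr.json"))
    root_meta = json.loads((root / "zarr.json").read_text(encoding="utf-8"))
    print(layout)  # ['0/zarr.json', '1/zarr.json', '2/zarr.json', 'zarr.json']
```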

Each multiscale group MUST define chunking (tiling) along the spatial dimensions (`X`, `Y`, or `lon`, `lat`). Recommended chunk sizes are 256×256 or 512×512.
Each zoom-level dataset MUST define chunking (tiling) along the spatial dimensions (`X`, `Y`, or `lon`, `lat`). Recommended chunk sizes are 256×256 or 512×512.
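To show what the chunking requirement means level by level, the sketch below computes each zoom level's size and chunk grid for 256×256 chunks. It assumes a factor-2 pyramid (each level halves the width and height, rounding up), which this clause does not state, and a hypothetical 10980 × 10980 base raster; `level_chunk_grids` is a hypothetical helper.

```python
from typing import List, Tuple


def ceil_div(value: int, divisor: int) -> int:
    """Integer ceiling division, avoiding floating point."""
    return (value + divisor - 1) // divisor


def level_chunk_grids(height: int, width: int, chunk: int, levels: int) -> List[Tuple[int, int, int, int]]:
    """Return (height, width, chunk_rows, chunk_cols) for each zoom level.

    Assumes a factor-2 pyramid; the real factor follows the tile matrix set.
    """
    grids = []
    for _ in range(levels):
        grids.append((height, width, ceil_div(height, chunk), ceil_div(width, chunk)))
        height, width = ceil_div(height, 2), ceil_div(width, 2)
    return grids


grids = level_chunk_grids(10980, 10980, 256, 4)
for h, w, rows, cols in grids:
    print(f"{h}x{w} pixels -> {rows}x{cols} chunks of 256x256")
```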

==== Metadata Encoding

Multiscale metadata SHALL be defined using a `multiscales` attribute located in the parent group of the zoom levels. This attribute SHALL be a JSON object with the following members:
Multiscale metadata SHALL be defined using a `multiscales` attribute located in the multiscale group. This attribute SHALL be a JSON object with the following members:

- `tile_matrix_set` – Identifier, URI, or inline JSON object compliant with OGC TileMatrixSet v2
- `resampling_method` – One of the standard string values (e.g., `"nearest"`, `"average"`)
@@ -98,4 +98,3 @@ The `resampling_method` MUST indicate the method used for downsampling across zo
`nearest`, `average`, `bilinear`, `cubic`, `cubic_spline`, `lanczos`, `mode`, `max`, `min`, `med`, `sum`, `q1`, `q3`, `rms`, `gauss`

The same method MUST apply across all levels.
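The allowed values above translate directly into a validation check. This sketch assumes the `multiscales` object has already been parsed from the multiscale group's attributes into a Python dict; because `resampling_method` is a single member of that object, one value covers every level. The function names are hypothetical.

```python
ALLOWED_RESAMPLING = {
    "nearest", "average", "bilinear", "cubic", "cubic_spline", "lanczos",
    "mode", "max", "min", "med", "sum", "q1", "q3", "rms", "gauss",
}


def is_valid_resampling(multiscales: dict) -> bool:
    """True if the multiscales object declares one of the allowed methods."""
    method = multiscales.get("resampling_method")
    return isinstance(method, str) and method in ALLOWED_RESAMPLING


def check_resampling(multiscales: dict) -> None:
    """Raise ValueError when resampling_method is missing or not allowed."""
    if not is_valid_resampling(multiscales):
        raise ValueError(f"unsupported resampling_method: {multiscales.get('resampling_method')!r}")


check_resampling({"tile_matrix_set": "WebMercatorQuad", "resampling_method": "average"})
try:
    check_resampling({"resampling_method": "bicubic"})
except ValueError as err:
    print(err)  # unsupported resampling_method: 'bicubic'
```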