@@ -27,7 +27,7 @@ An index axis along which arrays are organised. Dimensions provide a naming and

==== dataset

A group that contains one or more data variables along with their associated coordinate variables, having a consistent relationship between these components. A dataset represents a coherent set of related data arrays and follows the unified data model.
A group that contains one or more data variables along with their associated coordinate variables, having a consistent relationship between these components. A dataset represents a coherent set of related data arrays and follows the Unified Data Model.

==== metadata

@@ -45,9 +45,9 @@ A spatial tiling scheme defined by a hierarchy of zoom levels and consistent gri

An affine transformation used to convert between grid coordinates and geospatial coordinates, typically defined using the GDAL GeoTransform convention.
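To make the GDAL GeoTransform convention concrete, here is a minimal Python sketch. The six-coefficient order follows GDAL's documented convention, where grid position (0, 0) is the top-left corner of the top-left cell; the helper names `grid_to_geo` and `geo_to_grid` are hypothetical and are not part of GeoZarr or GDAL.

```python
from typing import Tuple

# GDAL GeoTransform order: (x_origin, pixel_width, row_rotation,
#                           y_origin, column_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def grid_to_geo(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Map a fractional (col, row) grid position to geospatial (x, y).

    Pass col + 0.5 and row + 0.5 to obtain the centre of a cell.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def geo_to_grid(gt: GeoTransform, x: float, y: float) -> Tuple[float, float]:
    """Invert the 2x2 affine part to recover the fractional (col, row)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("GeoTransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row


# North-up grid with 10-unit cells; pixel_height is negative because rows go down.
gt = (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)
print(grid_to_geo(gt, 0.5, 0.5))  # (500005.0, 4599995.0)
```

For north-up grids the rotation terms are zero and the inverse reduces to two independent divisions; the general form is kept so rotated or sheared grids are handled too.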

==== unified data model (UDM)
==== Unified Data Model (UDM)

A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations.
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations. The Unified Data Model provides a standardized framework for expressing spatial relationships, coordinate systems, and scientific metadata.

I believe the current definition is not ideal, since an abstract model should not be defined for a specific format. Instead, it should stand independently and be applicable across formats, with Zarr being one possible encoding of that model (as for CDM, CF abstract model, UDM, etc.).

Suggested change
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations. The Unified Data Model provides a standardized framework for expressing spatial relationships, coordinate systems, and scientific metadata.
A conceptual model for structuring geospatial data using CDM-based constructs. It enables consistent representation of coordinate referencing, metadata integration, and multiscale data. The Unified Data Model provides a standard framework for describing spatial relationships, coordinate systems, and scientific metadata, which can then be encoded in formats such as Zarr.


=== Abbreviated Terms

30 changes: 15 additions & 15 deletions standard/template/sections/clause_7_unified_data_model.adoc
@@ -4,9 +4,9 @@

=== Scope and Purpose

This Standard defines a unified data model (UDM) that provides a conceptual framework for representing geospatial and scientific data in Zarr. The purpose of this model is to support standards-based interoperability across Earth observation systems and analytical environments, while preserving compatibility with existing data models and software ecosystems.
This Standard defines the Unified Data Model (UDM), which provides a conceptual framework for representing geospatial and scientific data in Zarr. The purpose of this model is to support standards-based interoperability across Earth observation systems and analytical environments, while preserving compatibility with existing data models and software ecosystems.

The unified data model incorporates and extends the following established specifications and community standards:
The Unified Data Model incorporates and extends the following established specifications and community standards:

- **Unidata Common Data Model (CDM)** – Provides the foundational resource structure for scientific datasets, encompassing dimensions, coordinate systems, variables, and associated metadata elements.
- **CF (Climate and Forecast) Conventions** – Defines a widely adopted metadata profile for describing spatiotemporal semantics in CDM-based datasets.
@@ -15,25 +15,25 @@ The unified data model incorporates and extends the following established specif
- **GDAL geotransform metadata**, used to express affine transformations and interpolation characteristics.
- **SpatioTemporal Asset Catalog (STAC)** metadata elements for resource discovery and cataloguing (Collection and Item constructs).

The unified model is format-agnostic and describes the abstract structure of resources independently of the physical encoding. It does not redefine the semantics of the CDM or CF conventions, but introduces integration and extension points required to support tiled multiscale data, geospatial referencing, and metadata for discovery.
The Unified Data Model is format-agnostic and describes the abstract structure of resources independently of the physical encoding. It does not redefine the semantics of the CDM or CF conventions, but introduces integration and extension points required to support tiled multiscale data, geospatial referencing, and metadata for discovery.

This clause specifies the logical composition of the unified model, the external standards it leverages, and the conformance points that facilitate harmonised implementation within the GeoZarr framework.
This clause specifies the logical composition of the Unified Data Model, the external standards it leverages, and the conformance points that facilitate harmonised implementation within the GeoZarr framework.

=== Foundational Model and Standards Reuse

GeoZarr adopts established data model concepts because Zarr itself provides only array storage without semantic interpretation. The Unidata Common Data Model (CDM) provides the conceptual framework for understanding dimensions, variables, and attributes, while CF Conventions provide standardized metadata semantics. This reuse ensures compatibility with existing scientific software while avoiding reinvention of proven concepts.

==== Common Data Model (CDM)

The CDM defines a generalised schema for representing array-based scientific datasets. The following constructs are reused directly within the unified model:
The CDM defines a generalised schema for representing array-based scientific datasets. The following constructs are reused directly within the Unified Data Model:

- **Dimensions** – Named axes, each with an integer length, that define the extents of data variables.
- **Coordinate Variables** – Variables that supply coordinate values along dimensions, establishing spatial or temporal context.
- **Data Variables** – Multidimensional arrays representing observed or simulated phenomena, associated with dimensions and coordinate variables.
- **Attributes** – Key-value metadata elements used to describe variables and datasets semantically.
The CDM uses a particular type system for attributes that is not a 1:1 match for Zarr's attribute type system.

Author

Do you have a suggestion for describing that here?

Author

That's what I added so far:

  • Clarified that CDM concepts are adapted for Zarr's JSON type system
  • Acknowledged differences while preserving semantic compatibility

- **Groups** – Optional hierarchical containers enabling logical organisation of resources and metadata.

The unified data model adopts these CDM components without modification; CDM user-defined types are excluded. Semantic interpretation remains consistent with the original CDM specification. GeoZarr structures are mapped to CDM constructs to ensure compatibility and clarity.
The Unified Data Model adopts these CDM components without modification; CDM user-defined types are excluded. Semantic interpretation remains consistent with the original CDM specification. GeoZarr structures are mapped to CDM constructs to ensure compatibility and clarity.
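The CDM constructs listed above can be pictured as a small in-memory model. The following is a minimal Python sketch, not a GeoZarr API: the classes `Variable` and `Group` and the helper `undeclared_dims` are hypothetical. It encodes two CDM rules: a coordinate variable is a one-dimensional variable sharing its dimension's name, and dimensions declared in a group are visible to its descendant groups.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Variable:
    name: str
    dims: Tuple[str, ...]  # ordered dimension names
    attrs: Dict[str, Any] = field(default_factory=dict)

    @property
    def is_coordinate(self) -> bool:
        # netCDF/CDM convention: a 1-D variable that shares its dimension's name
        return len(self.dims) == 1 and self.dims[0] == self.name


@dataclass
class Group:
    name: str
    dimensions: Dict[str, int] = field(default_factory=dict)  # name -> length
    variables: List[Variable] = field(default_factory=list)
    attrs: Dict[str, Any] = field(default_factory=dict)
    groups: List["Group"] = field(default_factory=list)


def undeclared_dims(group: Group, inherited: Optional[Dict[str, int]] = None) -> List[str]:
    """List 'group/variable:dim' entries that use a dimension not in scope.

    Dimensions declared in a group are also visible to its descendant groups.
    """
    scope = dict(inherited or {})
    scope.update(group.dimensions)
    problems = [
        f"{group.name}/{var.name}:{dim}"
        for var in group.variables
        for dim in var.dims
        if dim not in scope
    ]
    for child in group.groups:
        problems.extend(undeclared_dims(child, scope))
    return problems


dataset = Group(
    name="root",
    dimensions={"time": 2, "y": 3, "x": 4},
    variables=[
        Variable("time", ("time",)),
        Variable("y", ("y",)),
        Variable("x", ("x",)),
        Variable("temperature", ("time", "y", "x"), {"units": "K"}),
    ],
)
print(undeclared_dims(dataset))  # []
print([v.name for v in dataset.variables if v.is_coordinate])  # ['time', 'y', 'x']
```

Attribute values are left as arbitrary Python objects here; a Zarr encoding would additionally need to restrict them to JSON-representable values.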

==== CF Conventions

@@ -44,7 +44,7 @@ The CF Conventions specify standardised metadata attributes and practices to des
- Physical units
- Standard variable naming

The unified data model supports CF-compliant metadata, including attributes such as `standard_name`, `units`, and `grid_mapping`. The unified data model does not prescribe CF compliance but enables it through permissive design. Partial adoption of CF attributes is supported, and non-compliant datasets may selectively adopt CF metadata as needed.
The Unified Data Model supports CF-compliant metadata, including attributes such as `standard_name`, `units`, and `grid_mapping`. The Unified Data Model does not prescribe CF compliance but enables it through permissive design. Partial adoption of CF attributes is supported, and non-compliant datasets may selectively adopt CF metadata as needed.
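As an illustration of this permissive CF support, the sketch below shows hypothetical per-array attributes in which only some arrays carry CF metadata, plus a checker that validates only the CF attributes that are present. The array names (`air_temperature`, `spatial_ref`, `quality_flag`) and the helper `cf_issues` are assumptions for illustration; the checker handles only the simple form of `grid_mapping` (a single variable name), not the extended form CF also permits.

```python
import json
from typing import Any, Dict, List

# Hypothetical per-array attributes, keyed by array name.
attributes: Dict[str, Dict[str, Any]] = {
    "air_temperature": {
        "standard_name": "air_temperature",
        "units": "K",
        "grid_mapping": "spatial_ref",  # names the grid mapping variable
    },
    "spatial_ref": {"grid_mapping_name": "latitude_longitude"},
    "quality_flag": {},  # no CF metadata at all; still acceptable
}


def cf_issues(attrs_by_var: Dict[str, Dict[str, Any]]) -> List[str]:
    """Report inconsistencies only in CF attributes that are present."""
    issues = []
    for name, attrs in attrs_by_var.items():
        target = attrs.get("grid_mapping")
        if target is None:
            continue  # absence of grid_mapping is not treated as an error
        mapping = attrs_by_var.get(target)
        if mapping is None:
            issues.append(f"{name}: grid_mapping variable '{target}' not found")
        elif "grid_mapping_name" not in mapping:
            issues.append(f"{name}: '{target}' lacks grid_mapping_name")
    return issues


# The attributes as they might appear, serialised, in an array's metadata.
print(json.dumps(attributes["air_temperature"], indent=2))
print(cf_issues(attributes))  # []
```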

==== Standards-Based Extensions

@@ -58,7 +58,7 @@ These extensions are integrated in a modular fashion and do not alter the core s

=== Model Extension Points

The unified data model specifies a series of optional, standards-aligned extension points to support functionality beyond the base CDM and CF constructs. These extensions enhance applicability to Earth observation and spatial analysis use cases without imposing additional mandatory requirements.
The Unified Data Model specifies a series of optional, standards-aligned extension points to support functionality beyond the base CDM and CF constructs. These extensions enhance applicability to Earth observation and spatial analysis use cases without imposing additional mandatory requirements.

Each extension is defined as an independent module. Implementation of any given extension does not necessitate support for others.

@@ -99,9 +99,9 @@ STAC integration is non-intrusive and modular. It does not impose changes on the
Each extension point is specified independently. Implementations may advertise support for one or more extensions by declaring conformance to corresponding extension modules. This modularity facilitates incremental adoption, promotes reuse, and enhances interoperability across varied implementation environments.


=== Unified Model Structure
=== Unified Data Model Structure

This clause defines the structural organisation of stores conforming to the unified data model (UDM). It consolidates the foundational elements and optional extensions into a coherent architecture suitable for Zarr encoding, while remaining format-agnostic. The model establishes a modular and extensible framework that supports structured representation of multidimensional, geospatially referenced resources.
This clause defines the structural organisation of stores conforming to the Unified Data Model (UDM). It consolidates the foundational elements and optional extensions into a coherent architecture suitable for Zarr encoding, while remaining format-agnostic. The model establishes a modular and extensible framework that supports structured representation of multidimensional, geospatially referenced resources.
The term "store" is already defined in the Zarr spec: see https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#id25. So the use of the same term with a different meaning in GeoZarr will likely become a point of confusion.

Author

@emmanuelmathot, Sep 4, 2025


I actually intend to use the term "store" as it is defined in Zarr, so I will clarify this and reference the Zarr terminology.


The model represents datasets as abstract compositions of dimensions, coordinate variables, data variables, and associated metadata. This abstraction ensures that applications and services can reason about the content and semantics of a dataset without reliance on storage layout or specific serialisation.

@@ -118,7 +118,7 @@ Each <<term-dataset, dataset>> comprises the following core components, aligned

A Zarr hierarchy is a tree structure, where each node in the tree is either a group or an array. Group nodes may have children but array nodes may not. This supports the logical subdivision by theme, resolution, or processing stage, and enhances the clarity and reusability of complex geospatial structures.
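The group/array tree rule can be checked mechanically. In Zarr version 3 each node stores a `zarr.json` document whose `node_type` is `"group"` or `"array"`; the minimal Python sketch below assumes those values have already been read into a mapping from node path to node type. The node paths (`reflectance`, `b04`, and so on) and the helpers `parent`, `invalid_nodes`, and `children` are hypothetical. The sketch treats a node as invalid when its parent is an array (arrays are leaves) or when its parent is missing.

```python
from typing import Dict, List

# Hypothetical hierarchy: node path -> node_type read from each node's zarr.json.
nodes: Dict[str, str] = {
    "": "group",  # root group
    "reflectance": "group",
    "reflectance/x": "array",
    "reflectance/y": "array",
    "reflectance/b04": "array",
}


def parent(path: str) -> str:
    """Parent path of a node; direct children of the root have parent ''."""
    return path.rsplit("/", 1)[0] if "/" in path else ""


def invalid_nodes(tree: Dict[str, str]) -> List[str]:
    """Paths whose parent is absent or is an array rather than a group."""
    return sorted(
        path for path in tree
        if path != "" and tree.get(parent(path)) != "group"
    )


def children(tree: Dict[str, str], path: str) -> List[str]:
    """Direct children of the group at `path`."""
    return sorted(p for p in tree if p != "" and parent(p) == path)


print(invalid_nodes(nodes))  # []
print(children(nodes, "reflectance"))  # ['reflectance/b04', 'reflectance/x', 'reflectance/y']
```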

The diagram below represents the structural layer of the unified data model, derived from the Unidata Common Data Model, which serves as the foundational framework for supporting all overlying model layers.
The diagram below represents the structural layer of the Unified Data Model, derived from the Unidata Common Data Model, which serves as the foundational framework for supporting all overlying model layers.

//image::udm-core.png[]

@@ -215,7 +215,7 @@ Overviews enable:

===== Conceptual Structure

A <<term-multiscale-group,multiscale group>> contains child groups representing the data at different resolutions, where each child group is a <<term-dataset, dataset>> following the unified data model. It comprises the following components:
A <<term-multiscale-group,multiscale group>> contains child groups representing the data at different resolutions, where each child group is a <<term-dataset, dataset>> following the Unified Data Model. It comprises the following components:

[horizontal]
*Base Dataset*:: The original, highest-resolution dataset to which the multiscale hierarchy is anchored.
@@ -227,7 +227,7 @@ A <<term-multiscale-group,multiscale group>> contains child groups representing

===== Model Components

The *Overviews* construct is represented in the unified data model using the following logical elements:
The *Overviews* construct is represented in the Unified Data Model using the following logical elements:

[cols="1,3"]
|===
@@ -307,7 +307,7 @@ This extensibility framework supports both minimum-viable use and high-fidelity

=== Interoperability Considerations

Interoperability is a core objective of the GeoZarr unified data model. The model is designed to bridge diverse Earth observation and scientific data ecosystems by enabling structural and semantic compatibility with established formats and standards, while providing a forward-looking foundation for scalable, cloud-native workflows.
Interoperability is a core objective of the GeoZarr Unified Data Model. The model is designed to bridge diverse Earth observation and scientific data ecosystems by enabling structural and semantic compatibility with established formats and standards, while providing a forward-looking foundation for scalable, cloud-native workflows.

This section outlines the principles and mechanisms supporting interoperability across formats, tools, and communities.

@@ -341,7 +341,7 @@ This approach enables seamless integration into modern data catalogues and platf

==== Tool and Ecosystem Support

The unified data model facilitates interoperability with tools and libraries across the following domains:
The Unified Data Model facilitates interoperability with tools and libraries across the following domains:

- *Scientific computing*: NetCDF-based libraries (e.g., xarray, netCDF4), Zarr-compatible clients.
- *Geospatial processing*: GDAL, rasterio, QGIS (via Zarr driver extensions or translations).
@@ -1,7 +1,7 @@

=== Encoding of Multiscale Overviews in Zarr

This clause specifies how multiscale tiling (also known as overviews or pyramids) is encoded in Zarr stores conforming to the unified data model. The encoding supports both Zarr Version 2 and Version 3 and is aligned with the OGC Two Dimensional Tile Matrix Set Standard.
This clause specifies how multiscale tiling (also known as overviews or pyramids) is encoded in Zarr stores conforming to the Unified Data Model. The encoding supports both Zarr Version 2 and Version 3 and is aligned with the OGC Two Dimensional Tile Matrix Set Standard.

A <<term-multiscale-group,multiscale group>> contains one or more child groups, where each child group is a <<term-dataset,dataset>> representing a zoom level of the data. Additional resolution levels can be added over time, with each new level storing a coarser-resolution resampled version of the original data variables.
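Building successive zoom levels, each a coarser resampled copy of the level before, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the encoding rule itself: the downsampling factor of 2 between levels, block-mean resampling, and a 256-cell tile size are all assumed values that the text above does not mandate, and the alignment with OGC tile matrix sets is not modelled. The helpers `level_shapes` and `downsample_mean` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def level_shapes(height: int, width: int, tile: int = 256, factor: int = 2) -> List[Tuple[int, int]]:
    """(height, width) of each level, coarsening until one tile covers the grid."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    shapes = [(height, width)]
    while height > tile or width > tile:
        # Integer ceiling division keeps partial edge blocks.
        height = (height + factor - 1) // factor
        width = (width + factor - 1) // factor
        shapes.append((height, width))
    return shapes


def downsample_mean(grid: Grid, factor: int = 2) -> Grid:
    """Block-mean resampling; edge blocks average only the cells they contain."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, factor):
        row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, rows))
                     for c in range(c0, min(c0 + factor, cols))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


print(level_shapes(10980, 10980))
print(downsample_mean([[1, 2, 3], [3, 4, 5]]))  # [[2.5, 4.0]]
```

Because each level is derived only from the level before it, a coarser level can be appended later without rewriting existing ones, which matches the note that resolution levels can be added over time.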

For multiscale support, I think we need a consensus that gives flexibility to data producers:

Suggested change
The multiscale group may include or exclude the native data, and the child zoom-level groups may likewise include or exclude the native level (0). This flexibility allows producers to handle different scenarios, such as adding overviews later to an existing archive.
