Changes from 23 commits
39 commits
60fddbc
add: create specifications for Projection Attribute Extension and its…
emmanuelmathot Sep 8, 2025
f09f67a
fix: update attribute naming from `proj` to `geo` in Projection Attri…
emmanuelmathot Sep 9, 2025
83a5f93
fix: remove `shape` field from Projection Attribute Extension schema …
emmanuelmathot Sep 9, 2025
77bae32
fix: rename `spatial_dims` to `spatial_dimensions` in README and sche…
emmanuelmathot Sep 9, 2025
caee06b
fix: enhance spatial dimension identification section in README with …
emmanuelmathot Sep 9, 2025
9daf0d3
fix: clarify inheritance model for `geo:proj` attribute in README
emmanuelmathot Sep 9, 2025
7e77fa1
fix: clarify spatial dimension identification section in README with …
emmanuelmathot Sep 9, 2025
01537ec
fix: add version field to geo:proj schema and update README for compa…
emmanuelmathot Sep 11, 2025
17a0700
fix: update geo:proj schema to include version field and restructure …
emmanuelmathot Sep 14, 2025
289f774
Update attributes/geo:proj/README.md
emmanuelmathot Sep 14, 2025
cd14f53
Update attributes/README.md
emmanuelmathot Sep 14, 2025
0932243
Update attributes/README.md
emmanuelmathot Sep 14, 2025
cd58d28
Update attributes/geo:proj/README.md
emmanuelmathot Sep 14, 2025
c317a4a
Update attributes/geo:proj/README.md
emmanuelmathot Sep 14, 2025
8a60696
Remove redundant examples for irregular grid and update WKT2 represen…
emmanuelmathot Sep 14, 2025
96b5db1
Refine spatial dimension identification section to focus on regular g…
emmanuelmathot Sep 14, 2025
e9e754e
Refactor inheritance model section for clarity and remove redundant r…
emmanuelmathot Sep 14, 2025
1893b7b
Update examples in geo:proj README.md to enhance clarity and provide …
emmanuelmathot Sep 14, 2025
a244d90
Clarify spatial dimension interpretation in geo:proj README.md to emp…
emmanuelmathot Sep 14, 2025
1cffe00
Clarify terminology and improve consistency in geo:proj README.md reg…
emmanuelmathot Sep 14, 2025
d903aa7
Remove the Registered Extensions section from attributes README.md to…
emmanuelmathot Sep 14, 2025
f4364e4
Update attributes/geo:proj/README.md
emmanuelmathot Sep 14, 2025
cc9f913
Enhance README.md and schema.json for geo:proj extension by adding de…
emmanuelmathot Sep 14, 2025
21b0d1c
Refine validation rules in geo:proj README.md to clarify shape infere…
emmanuelmathot Sep 14, 2025
1bde1bc
Update attributes/README.md
emmanuelmathot Sep 17, 2025
190ad60
Update versioning in geo:proj README.md and schema.json to 0.1.0
emmanuelmathot Sep 17, 2025
a4b2bef
Update attributes/geo:proj/README.md
emmanuelmathot Sep 17, 2025
544687d
Update attributes/geo:proj/README.md
emmanuelmathot Sep 17, 2025
4582d2a
Enhance README.md with projection authority examples and clarify CRS …
emmanuelmathot Sep 17, 2025
1c0134f
Remove redundant emphasis on spatial dimensions interpretation in geo…
emmanuelmathot Sep 17, 2025
ac841d8
Add algorithm for resolving spatial dimensions in group-level geo:proj
emmanuelmathot Sep 17, 2025
803f269
Clarify flexibility in spatial dimension ordering in geo:proj README.md
emmanuelmathot Sep 17, 2025
27a1123
Refactor Geo Projection Attribute Extension
emmanuelmathot Sep 21, 2025
eeb64ae
Update attributes/geo/proj/README.md
emmanuelmathot Sep 22, 2025
22b277d
Merge branch 'main' into proj-crs
emmanuelmathot Sep 28, 2025
c84a05e
Update README.md for Geo Projection Attribute Extension and remove sc…
emmanuelmathot Sep 28, 2025
5132326
Remove link to Attributes section in README.md
emmanuelmathot Sep 28, 2025
b13da42
Update attributes/geo/proj/README.md
emmanuelmathot Sep 29, 2025
f9ffc2f
Update attributes/geo/proj/README.md
emmanuelmathot Sep 29, 2025
1 change: 1 addition & 0 deletions README.md
@@ -8,6 +8,7 @@ It is the normative source for registering names of Zarr v3 extensions.

To register an extension, open a new PR with a new extension directory under the relevant extension point:

* [Attributes](./attributes/README.md)
* [Codecs](./codecs/README.md)
* [Data Types](./data-types/README.md)
* [Chunk Key Encoding](./chunk-key-encodings/README.md)
25 changes: 25 additions & 0 deletions attributes/README.md
@@ -0,0 +1,25 @@
# Attributes Extensions

This directory contains specifications for Zarr v3 attribute extensions.

## What are Attribute Extensions?

Attribute extensions define standardized schemas and semantics for metadata stored in the attributes of Zarr arrays and groups. These extensions enable interoperability by establishing common conventions for domain-specific metadata.


## Creating an Attribute Extension

When creating an attribute extension, consider:

1. **Namespace**: Use a unique prefix to avoid conflicts (e.g., `proj:` for projection)
2. **Schema**: Provide a JSON schema for validation
3. **Inheritance**: Define behavior when attributes are set at group vs array level
4. **Compatibility**: Consider interoperability with existing tools and standards
5. **Example data**: Where possible, consider including a complete Zarr hierarchy that implements the extension.
## Extension Requirements

Each attribute extension MUST:
- Define the attribute key(s) and structure
- Provide a JSON schema for validation
- Include examples of usage
- Document any inheritance or precedence rules
236 changes: 236 additions & 0 deletions attributes/geo:proj/README.md


If more than 1 of the 3 is provided, they have to be semantically identical (i.e. describe the same CRS).

@@ -0,0 +1,236 @@
# Projection Attribute Extension for Zarr

- **Extension Name**: Projection Attribute Extension
- **Version**: 1.0.0
- **Extension Type**: Attribute
- **Status**: Proposed
- **Owners**: @emmanuelmathot

## Description

This specification defines a JSON object that encodes coordinate reference system (CRS) information for geospatial data. Additionally, this specification defines a convention in which this object is stored under the `"geo:proj"` key in the attributes of Zarr groups or arrays.
Member

FWIW, I prefer GeoCRS or GeogCRS over any proj-related names for the key since "crs" is more generic than "proj"


**Recommended usage**: Define `geo:proj` at the **group level** to apply CRS information to all arrays within that group. This matches the common geospatial pattern of storing multiple arrays with the same coordinates in a single group. Array-level definitions are supported for override cases but are less common.

## Motivation

- Provides simple, standardized CRS encoding without complex nested structures
- Compatible with existing geospatial tools (GDAL, rasterio, pyproj)
Member

Suggested change
- Compatible with existing geospatial tools (GDAL, rasterio, pyproj)
- Future cross-compatibility with existing geospatial tools (GDAL, rasterio, pyproj)

I think this better reflects the current status and goals

Contributor Author

We know the data model is compatible. We can add a section about the tooling implementation status, but I'd avoid putting assumptions in the Motivation section.

- Based on the proven STAC Projection Extension model

## Inheritance Model

The `geo:proj` attribute follows a simple group-to-array inheritance model that should be understood first:

### Inheritance Rules

1. **Group-level definition** (recommended): When `geo:proj` is defined at the group level, it applies to all arrays within that group
2. **Array-level override**: An array can completely override the group's `geo:proj` attribute with its own definition
Member

What is the use-case for array-level overrides?

3. **Complete replacement only**: Partial inheritance (overriding only some fields while inheriting others) is not allowed
4. **No cascading**: Inheritance only applies from a group directly to its immediate array members, not through nested groups

Most use cases will use group-level definitions without array overrides.
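
The four inheritance rules can be sketched in Python as follows. This is a minimal illustration, not part of the proposal: the function name `effective_geo_proj` and the plain-dict representation of attributes are assumptions made for the example.

```python
from typing import Any, Optional

GEO_PROJ_KEY = "geo:proj"


def effective_geo_proj(
    array_attrs: dict[str, Any],
    parent_group_attrs: dict[str, Any],
) -> Optional[dict[str, Any]]:
    """Return the geo:proj object that applies to an array, or None.

    Hypothetical helper: both arguments are the decoded "attributes"
    objects of the array and of its immediate parent group.
    """
    # Rules 2 and 3: an array-level definition replaces the group's
    # definition as a whole; individual fields are never merged.
    if GEO_PROJ_KEY in array_attrs:
        return array_attrs[GEO_PROJ_KEY]
    # Rules 1 and 4: otherwise use the immediate parent group only.
    # The caller must not pass the attributes of a more distant ancestor.
    return parent_group_attrs.get(GEO_PROJ_KEY)


group = {"geo:proj": {"version": "1.0", "code": "EPSG:32633", "bbox": [0, 0, 1, 1]}}
inherited = effective_geo_proj({}, group)
overridden = effective_geo_proj(
    {"geo:proj": {"version": "1.0", "code": "EPSG:4326"}}, group
)
```

Here `overridden` carries no `bbox`, because the array-level object replaces the group-level one completely.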

## Specification

The `geo:proj` attribute can be added to Zarr arrays or groups to define projection information.

<!-- GENERATED_SCHEMA_DOCS_START -->
**`geo:proj` Properties**

| |Type|Description|Required|
|---|---|---|---|
|**version**|`string`|Projection metadata version| &#10003; Yes|
|**code**|`["string", "null"]`|Authority:code identifier (e.g., EPSG:4326)|No|
|**wkt2**|`["string", "null"]`|WKT2 (ISO 19162) CRS representation|No|
|**projjson**|`any`|PROJJSON CRS representation|No|
|**bbox**|`number` `[]`|Bounding box in CRS coordinates|No|
|**transform**|`number` `[]`|Affine transformation coefficients|No|
Member

Suggested change
|**transform**|`number` `[]`|Affine transformation coefficients|No|
|**affine**|`number` `[]`|Affine transformation coefficients|No|

Could we use affine instead of transform since that's more specific and leaves an option for other transforms to be added?


Alternatively something like

"transform": {"name": "affine", "configuration": {...}}

so it is clear that only one transform is allowed?

Contributor Author

Transform is the correct general mathematical term here. According to standard mathematical definitions, 'geometric transformation' is the broader concept, while 'affine transformation' is a specific subset. Using 'transform' maintains consistency with established geospatial standards (GDAL's GetGeoTransform, rasterio's Transform) and leaves room for potential future extensions to support other transformation types.
This follows the principle of "optimize for the common case" - use terminology that works for both the 80% majority and the 20% edge cases, rather than terminology that's precise for 80% but excludes future possibilities.

|**spatial_dimensions**|`string` `[2]`|Names of spatial dimensions [y_name, x_name]|No|

### Field Details

Additional properties are allowed.

#### geo:proj.version

Projection metadata version

* **Type**: `string`
* **Required**: &#10003; Yes
* **Allowed values**:
* `"1.0"`

#### geo:proj.code

Authority:code identifier (e.g., EPSG:4326)

* **Type**: `["string", "null"]`
* **Required**: No
* **Pattern**: `^[A-Z]+:[0-9]+$`
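
As an illustration of what this pattern accepts, here is a small Python check. The helper name `is_valid_code` is hypothetical, and `re.fullmatch` is used so that the whole string must match, as the `^...$` anchors intend.

```python
import re

# Same expression as the schema pattern, without the anchors because
# fullmatch already requires the entire string to match.
CODE_PATTERN = re.compile(r"[A-Z]+:[0-9]+")


def is_valid_code(code):
    """Return True if `code` is null or matches the authority:code pattern."""
    # The schema type is ["string", "null"], so None is accepted.
    if code is None:
        return True
    return CODE_PATTERN.fullmatch(code) is not None
```

Under this pattern `EPSG:4326` and `ESRI:102100` are accepted, while lowercase authorities (`epsg:4326`) and authorities containing digits or underscores (`IAU_2015:49900`) are rejected.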

#### geo:proj.wkt2

WKT2 (ISO 19162) CRS representation

* **Type**: `["string", "null"]`
* **Required**: No

#### geo:proj.projjson

PROJJSON CRS representation

* **Type**: `any`
* **Required**: No

#### geo:proj.bbox

Bounding box in CRS coordinates

* **Type**: `number` `[]`
* **Required**: No

#### geo:proj.transform

Affine transformation coefficients

* **Type**: `number` `[]`
* **Required**: No

#### geo:proj.spatial_dimensions

Names of spatial dimensions [y_name, x_name]


Does order matter here? If so should it be enforced that all arrays within the group have this order if they have both y and x as dimensions?

Contributor Author

You can have spatial dimensions with unconventional names (e.g. azimuth_time, ground_range). It is therefore better to list the names here in the specified Y, X order.


* **Type**: `string` `[2]`
* **Required**: No
<!-- GENERATED_SCHEMA_DOCS_END -->

Note: The shape of spatial dimensions is obtained directly from the Zarr array metadata once the spatial dimensions are identified.

### Spatial Dimension Identification

In this extension, "spatial dimensions" refers to the dimension names of 2D/3D arrays within this group to which the projection definition applies. This extension is designed for regular grids where dimensions directly correspond to spatial axes.

The extension identifies these array dimensions through:

1. **Explicit Declaration** (recommended): Use `spatial_dimensions` to specify dimension names
2. **Pattern-Based Detection** (fallback): Automatically detect spatial dimensions using patterns defined by this extension

#### Explicit Declaration

```json
{
"geo:proj": {
"spatial_dimensions": ["latitude", "longitude"]
}
}
```

#### Pattern-Based Detection

If `spatial_dimensions` is not provided, implementations should scan `dimension_names` for these patterns defined by this extension (in order):


dimension_names are specified at the array level whereas this proposal is primarily targeted at the group-level so you would need to scan all the arrays within a group. If you were to look through all the arrays in a group you would encounter some that are the coordinate arrays themselves. Does there maybe need to be a clause that when scanning if you encounter an array where its name matches dimension_names you should ignore it?

Contributor Author

Coordinate arrays are excluded from this specification. The geo:proj attribute and its spatial dimension detection apply only to data arrays and their shapes (basic Zarr concepts). So when scanning dimension_names for spatial patterns, implementations would examine only the data arrays and their shapes within the group.


How are coordinate arrays and data arrays differentiated in Zarr? My understanding was that all arrays are the same and it is the matching of dimension_names on one array with the name of another array that creates the coordinates in xarray.


- ["y", "x"] or ["Y", "X"]
- ["lat", "lon"] or ["latitude", "longitude"]
- ["northing", "easting"]
- ["row", "col"] or ["line", "sample"]


I think this could be broadened a bit to allow any dimension_names that includes these patterns. That would catch cases where there is also time or something.

Contributor Author

The geo:proj extension specifically scopes to only the spatial dimensions that the CRS applies to (typically 2D: y/x, lat/lon, etc.). Non-spatial dimensions like time, band, or depth are outside the scope of this extension. The pattern matching is designed to identify exactly the spatial dimension pair that corresponds to the CRS, not to handle additional dimensions in the array.

Contributor Author

Sorry, I only got your comments afterwards. This is not a static pattern that dimension_names must match exactly. The pattern-matching rule is a set of possible names that must be found together in dimension_names to match a possible combination.


The first matching pair determines the spatial dimensions. **Important**: When dimensions like "X" and "Y" are found, they are always interpreted as [Y, X] (following lat/lon convention), regardless of their actual order in the Zarr array's `dimension_names`.
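
One possible reading of the detection rule, sketched in Python. The function name `find_spatial_dimensions` is hypothetical, and the sketch assumes that a pattern matches when both of its names occur anywhere in `dimension_names`, so extra dimensions such as `time` or `band` are ignored.

```python
from typing import Optional, Sequence

# (y_name, x_name) pairs, in the order the extension lists them.
SPATIAL_PATTERNS = [
    ("y", "x"), ("Y", "X"),
    ("lat", "lon"), ("latitude", "longitude"),
    ("northing", "easting"),
    ("row", "col"), ("line", "sample"),
]


def find_spatial_dimensions(
    dimension_names: Sequence[Optional[str]],
    geo_proj: dict,
) -> tuple[str, str]:
    """Return (y_name, x_name) for an array, or raise ValueError."""
    # Explicit declaration takes priority over pattern-based detection.
    declared = geo_proj.get("spatial_dimensions")
    if declared is not None:
        y_name, x_name = declared
        return y_name, x_name
    # Fallback: the first pattern whose two names are both present wins.
    # The result is always ordered (y, x), whatever the array's own order.
    names = set(dimension_names)
    for y_name, x_name in SPATIAL_PATTERNS:
        if y_name in names and x_name in names:
            return y_name, x_name
    raise ValueError(f"no spatial dimensions found in {list(dimension_names)}")
```

For example, `find_spatial_dimensions(["band", "y", "x"], {})` and `find_spatial_dimensions(["x", "y"], {})` both return `("y", "x")`.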
Member

Important: When dimensions like "X" and "Y" are found, they are always interpreted as [Y, X] (following lat/lon convention), regardless of their actual order in the Zarr array's dimension_names.

This doesn't make sense to me. Can you please clarify the intention of this statement?

Contributor Author

I will remove the confusing statement and just specify the patterns in the correct order.


### Validation Rules

- Once spatial dimensions are identified (either explicitly through `spatial_dimensions` or through pattern-based detection), their sizes are obtained from the Zarr array's shape metadata
- The spatial dimension order is always [y/lat/northing, x/lon/easting]
- If spatial dimensions cannot be identified through either method, implementations MUST raise an error
- When multiple CRS representations are provided, precedence is: `projjson` > `wkt2` > `code`
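
The precedence rule could be applied as in the sketch below. The helper name `select_crs` is hypothetical, and treating a `null` value the same as an absent field is an assumption of this example.

```python
# Highest precedence first, as stated in the validation rules.
CRS_FIELDS = ("projjson", "wkt2", "code")


def select_crs(geo_proj: dict):
    """Return (field_name, value) for the CRS representation to use, or None."""
    for field in CRS_FIELDS:
        value = geo_proj.get(field)
        # Assumption: a null value counts as "not provided".
        if value is not None:
            return field, value
    return None


chosen = select_crs({"version": "1.0", "code": "EPSG:32633", "wkt2": "PROJCRS[...]"})
```

In this call `wkt2` is chosen over `code`, because no `projjson` is present.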

### Shape Reconciliation

The shape of spatial dimensions is determined by:
1. Identifying the spatial dimensions using either `spatial_dimensions` or pattern-based detection
2. Looking up these dimension names in the Zarr array's `dimension_names`
3. Using the corresponding sizes from the array's `shape` attribute
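
The three steps can be sketched as follows, assuming the array metadata is available as a decoded `zarr.json` dictionary. The helper name `spatial_shape` is hypothetical.

```python
def spatial_shape(array_metadata: dict, spatial_dimensions) -> tuple:
    """Return (height, width) for the given (y_name, x_name) pair."""
    names = list(array_metadata["dimension_names"])
    shape = array_metadata["shape"]
    y_name, x_name = spatial_dimensions
    # list.index raises ValueError if a spatial dimension is missing
    # from this array's dimension_names.
    return shape[names.index(y_name)], shape[names.index(x_name)]


metadata = {"shape": [4, 100, 200], "dimension_names": ["band", "y", "x"]}
height, width = spatial_shape(metadata, ("y", "x"))
```

The lookup is by name, so an array stored as `["x", "y"]` yields the same `(height, width)` ordering.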


When talking about "the array" I think it's necessary to separate out the "dimension array" from the "data variable array". My understanding is here the proposal is to compare the shape of all data variable arrays with the shape of any listed dimension arrays

Contributor Author

Yes indeed, but the non-matching dimension arrays would be skipped if they do not match the pattern of the spatial_dimensions definition.


This approach avoids redundancy and ensures consistency by using the array's own metadata rather than duplicating shape information.

## Examples

### Example 1: Simple Web Mercator Raster (Group Level)

```json
{
"zarr_format": 3,
"node_type": "group",
"attributes": {
"geo:proj": {
"code": "EPSG:3857",
"transform": [156543.03392804097, 0.0, -20037508.342789244, 0.0, -156543.03392804097, 20037508.342789244],
"bbox": [-20037508.342789244, -20037508.342789244, 20037508.342789244, 20037508.342789244]
Member

One point of confusion for me in STAC is whether the bbox refers to the outer bounds of the cells or the coordinate values for raster data? From this, it looks like it's the coordinate values which will always be narrower than the outer bounds of the cells for raster data. Should we specify that in this document?

}
}
}
```

### Example 2: Multi-band Satellite Image

```json
{
"zarr_format": 3,
"shape": [4, 2048, 2048],
"dimension_names": ["band", "y", "x"],
"attributes": {
"geo:proj": {
"code": "EPSG:32633",
"spatial_dimensions": ["y", "x"],
"transform": [30.0, 0.0, 500000.0, 0.0, -30.0, 5000000.0],
"bbox": [500000.0, 4900000.0, 561440.0, 4961440.0]
Member

Should transform and bbox be given a type as part of this document (e.g., double precision float)?

}
}
}
```

### Example 3: Geographic Coordinates with Transform

```json
{
"zarr_format": 3,
"shape": [1800, 3600],
"dimension_names": ["lat", "lon"],
"attributes": {
"geo:proj": {
"code": "EPSG:4326",
"transform": [0.1, 0.0, -180.0, 0.0, -0.1, 90.0],
"bbox": [-180.0, -90.0, 180.0, 90.0]
}
}
}
```
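
The relation between `transform`, the array shape, and `bbox` in Example 3 can be checked with a short Python sketch. It rests on two assumptions that the text does not state outright: the six coefficients are ordered `[a, b, c, d, e, f]` with `x = a*col + b*row + c` and `y = d*col + e*row + f` (the layout the examples appear to use, matching the STAC projection extension), and `bbox` describes the outer edges of the cells. GDAL's `GetGeoTransform` returns the same coefficients in the order `(c, a, b, f, d, e)`. The function name `bbox_from_transform` is hypothetical.

```python
def bbox_from_transform(transform, height, width):
    """Return [xmin, ymin, xmax, ymax] of the outer cell edges."""
    a, b, c, d, e, f = transform
    # Pixel-space corners of the grid as (col, row) pairs.
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs = [a * col + b * row + c for col, row in corners]
    ys = [d * col + e * row + f for col, row in corners]
    return [min(xs), min(ys), max(xs), max(ys)]


# Values taken from Example 3: shape [1800, 3600] is (lat, lon) = (rows, cols).
bbox = bbox_from_transform(
    [0.1, 0.0, -180.0, 0.0, -0.1, 90.0], height=1800, width=3600
)
```

Under these assumptions the result agrees with the `bbox` given in Example 3, up to floating-point rounding.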

### Example 4: WKT2 Representation

```json
{
"zarr_format": 3,
"shape": [1000, 1000],
"dimension_names": ["northing", "easting"],
"attributes": {
"geo:proj": {
"wkt2": "PROJCRS[\"WGS 84 / UTM zone 33N\",BASEGEOGCRS[\"WGS 84\",DATUM[\"World Geodetic System 1984\",ELLIPSOID[\"WGS 84\",6378137,298.257223563,LENGTHUNIT[\"metre\",1]]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433]]],CONVERSION[\"UTM zone 33N\",METHOD[\"Transverse Mercator\",ID[\"EPSG\",9807]],PARAMETER[\"Latitude of natural origin\",0,ANGLEUNIT[\"degree\",0.0174532925199433]],PARAMETER[\"Longitude of natural origin\",15,ANGLEUNIT[\"degree\",0.0174532925199433]],PARAMETER[\"Scale factor at natural origin\",0.9996,SCALEUNIT[\"unity\",1]],PARAMETER[\"False easting\",500000,LENGTHUNIT[\"metre\",1]],PARAMETER[\"False northing\",0,LENGTHUNIT[\"metre\",1]]],CS[Cartesian,2],AXIS[\"easting\",east,ORDER[1],LENGTHUNIT[\"metre\",1]],AXIS[\"northing\",north,ORDER[2],LENGTHUNIT[\"metre\",1]]]",
"transform": [30.0, 0.0, 500000.0, 0.0, -30.0, 5000000.0]
}
}
}
```


Most of these examples are for arrays even though it is recommended that this be defined at the group level. Are we sure we want to allow inheritance from the group level?

I very much like @benbovy's suggestion (#21 (comment)) that the "geo:proj" blobs be defined at that group level with an id and then the arrays reference a specific id

Contributor Author

I am still a bit confused with the "dimension array" term you often use. Are they coordinates? In that case, this is out of scope. We want to keep the spec on top of the base Zarr concepts (arrays and shapes).

Contributor Author

@jsignell I think I finally understood your point here and I updated the readme to better describe how the spec should interpret at array-level.


## Compatibility Notes

- The `version` field allows tracking of changes and ensures compatibility with future updates
- The `code` field follows the "authority:code" format used by the PROJ library
- The `wkt2` field should conform to the OGC WKT2 (ISO 19162) standard
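- The `transform` field follows the coefficient ordering of STAC's projection extension, `[a, b, c, d, e, f]`; GDAL's GeoTransform holds the same six coefficients in a different order, `(c, a, b, f, d, e)`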

## References

- [STAC Projection Extension v2.0.0](https://github.com/stac-extensions/projection)
- [PROJJSON Specification](https://proj.org/specifications/projjson.html)
- [OGC WKT2 Standard](https://www.ogc.org/standards/wkt-crs)