Proposal: Projection Attribute Extension for Zarr v3 #21
Changes from 23 commits
@@ -0,0 +1,25 @@
# Attributes Extensions

This directory contains specifications for Zarr v3 attribute extensions.

## What are Attribute Extensions?

Attribute extensions define standardized schemas and semantics for metadata stored in the attributes of Zarr arrays and groups. These extensions enable interoperability by establishing common conventions for domain-specific metadata.

## Creating an Attribute Extension

When creating an attribute extension, consider:

1. **Namespace**: Use a unique prefix to avoid conflicts (e.g., `proj:` for projection)
2. **Schema**: Provide a JSON schema for validation
3. **Inheritance**: Define behavior when attributes are set at group vs array level
4. **Compatibility**: Consider interoperability with existing tools and standards
5. **Example data**: Where possible, consider including a complete Zarr hierarchy that implements the extension.

## Extension Requirements

Each attribute extension MUST:
- Define the attribute key(s) and structure
- Provide a JSON schema for validation
- Include examples of usage
- Document any inheritance or precedence rules
@@ -0,0 +1,236 @@
# Projection Attribute Extension for Zarr

- **Extension Name**: Projection Attribute Extension
- **Version**: 1.0.0
- **Extension Type**: Attribute
- **Status**: Proposed
- **Owners**: @emmanuelmathot

## Description

This specification defines a JSON object that encodes coordinate reference system (CRS) information for geospatial data. Additionally, this specification defines a convention in which this object is stored under the `"geo:proj"` key in the attributes of Zarr groups or arrays.

**Recommended usage**: Define `geo:proj` at the **group level** to apply CRS information to all arrays within that group. This matches the common geospatial pattern of storing multiple arrays with the same coordinates in a single group. Array-level definitions are supported for override cases but are less common.
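
As a rough illustration of the group-level default with an array-level override, the sketch below resolves the effective `geo:proj` object for one array. The helper name `resolve_geo_proj`, the sample `code` field, and the choice that an array-level object replaces the group-level one wholesale (rather than merging key by key) are assumptions made for illustration; the text above does not specify them.

```python
from typing import Any, Optional

GEO_PROJ_KEY = "geo:proj"


def resolve_geo_proj(
    group_attrs: dict[str, Any], array_attrs: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Return the effective geo:proj object for one array, or None."""
    # Assumption: an array-level definition overrides the group-level
    # definition entirely; no key-by-key merge is attempted.
    if GEO_PROJ_KEY in array_attrs:
        return array_attrs[GEO_PROJ_KEY]
    # Otherwise fall back to whatever the parent group declares, if anything.
    return group_attrs.get(GEO_PROJ_KEY)


# Hypothetical attribute contents; the "code" field is illustrative only.
group_attrs = {"geo:proj": {"code": "EPSG:32633"}}
inherits = resolve_geo_proj(group_attrs, {})
overrides = resolve_geo_proj(group_attrs, {"geo:proj": {"code": "EPSG:4326"}})
```

Here `inherits` picks up the group-level object, while `overrides` uses the array's own definition.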

## Motivation

- Provides simple, standardized CRS encoding without complex nested structures
- Compatible with existing geospatial tools (GDAL, rasterio, pyproj)

Suggested change: replace `- Compatible with existing geospatial tools (GDAL, rasterio, pyproj)` with `- Future cross-compatibility with existing geospatial tools (GDAL, rasterio, pyproj)`.
I think this better reflects the current status and goals
We know the data model is compatible. We can add a section about the tooling implementation status, but I'd avoid putting assumptions in the motivation section.
What is the use-case for array-level overrides?
Suggested change: replace the row
|**transform**|`number` `[]`|Affine transformation coefficients|No|
with
|**affine**|`number` `[]`|Affine transformation coefficients|No|
Could we use affine instead of transform since that's more specific and leaves an option for other transforms to be added?
Alternatively, something like `"transform": {"name": "affine", "configuration": {...}}`, so it is clear that only one transform is allowed?
Transform is the correct general mathematical term here. According to standard mathematical definitions, 'geometric transformation' is the broader concept, while 'affine transformation' is a specific subset. Using 'transform' maintains consistency with established geospatial standards (GDAL's GetGeoTransform, rasterio's Transform) and leaves room for potential future extensions to support other transformation types.
This follows the principle of "optimize for the common case" - use terminology that works for both the 80% majority and the 20% edge cases, rather than terminology that's precise for 80% but excludes future possibilities.
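
For readers less familiar with the two conventions mentioned above, here is a minimal sketch of applying six affine coefficients to pixel indices. It assumes the rasterio/`affine` ordering `(a, b, c, d, e, f)`, where `x = a*col + b*row + c` and `y = d*col + e*row + f`; GDAL's `GetGeoTransform` carries the same six values in the order `(c, a, b, f, d, e)`. Which ordering the `transform` field uses is not stated in this thread, and the function names are hypothetical.

```python
def apply_affine(coeffs, col, row):
    """Map a (col, row) pixel index to (x, y) in CRS units."""
    # Assumed ordering: (a, b, c, d, e, f) as in rasterio / the affine package.
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


def to_gdal_order(coeffs):
    """Reorder (a, b, c, d, e, f) into GDAL's geotransform order."""
    a, b, c, d, e, f = coeffs
    return (c, a, b, f, d, e)


# Illustrative values: 10-unit pixels, north-up, origin at (500000, 4100000).
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
upper_left = apply_affine(coeffs, 0, 0)
cell_2_3 = apply_affine(coeffs, 2, 3)  # col=2, row=3
gdal_coeffs = to_gdal_order(coeffs)
```

With these values the origin maps to `(500000.0, 4100000.0)`, and moving two columns right and three rows down gives `(500020.0, 4099970.0)`.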
Does order matter here? If so should it be enforced that all arrays within the group have this order if they have both y and x as dimensions?
You can have spatial dimensions with unconventional names (e.g. `azimuth_time`, `ground_range`). It is better, then, to have the names specified here in Y, X order.
`dimension_names` are specified at the array level whereas this proposal is primarily targeted at the group level, so you would need to scan all the arrays within a group. If you were to look through all the arrays in a group you would encounter some that are the coordinate arrays themselves. Does there maybe need to be a clause that when scanning, if you encounter an array where its name matches `dimension_names`, you should ignore it?
Coordinate arrays are excluded from this specification. The `geo:proj` attribute and its spatial dimension detection only apply to data arrays and their shapes (basic Zarr concepts). So when scanning `dimension_names` for spatial patterns, implementations would only examine data arrays and their shapes within the group.
How are coordinate arrays and data arrays differentiated in Zarr? My understanding was that all arrays are the same and it is the matching of `dimension_names` on one array with the name of another array that creates the coordinates in xarray.
I think this could be broadened a bit to allow any `dimension_names` that includes these patterns. That would catch cases where there is also `time` or something.
The geo:proj extension specifically scopes to only the spatial dimensions that the CRS applies to (typically 2D: y/x, lat/lon, etc.). Non-spatial dimensions like time, band, or depth are outside the scope of this extension. The pattern matching is designed to identify exactly the spatial dimension pair that corresponds to the CRS, not to handle additional dimensions in the array.
Sorry, I just got your comments afterwards. This is not a static pattern that `dimension_names` must exactly match. The pattern-matching rule is a set of possible names that must be found together in `dimension_names` to match a possible combination.
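
To make the matching rule concrete, here is a minimal sketch of one possible reading: each candidate pattern is a pair of names written in (Y, X) order, and a pattern matches when both names occur somewhere in an array's `dimension_names`, whatever other dimensions are present. The pattern list and the function name `find_spatial_dimensions` are hypothetical and not taken from the spec text.

```python
from typing import Optional, Sequence

# Hypothetical candidate patterns, each written in (Y, X) order.
SPATIAL_PATTERNS = [
    ("y", "x"),
    ("Y", "X"),
    ("lat", "lon"),
    ("latitude", "longitude"),
]


def find_spatial_dimensions(
    dimension_names: Sequence[Optional[str]],
) -> Optional[tuple[str, str]]:
    """Return the first (y_name, x_name) pattern found in dimension_names."""
    # Zarr v3 allows null entries in dimension_names, so skip None values.
    names = {name for name in dimension_names if name is not None}
    for y_name, x_name in SPATIAL_PATTERNS:
        # Both names must be present together; extra dimensions are ignored.
        if y_name in names and x_name in names:
            return (y_name, x_name)
    return None


with_time = find_spatial_dimensions(["time", "y", "x"])
swapped = find_spatial_dimensions(["band", "lon", "lat"])
unconventional = find_spatial_dimensions(["azimuth_time", "ground_range"])
```

Under this reading, `with_time` matches despite the extra `time` dimension, `swapped` is reported in pattern order rather than array order, and `unconventional` finds nothing, which is the case where the spatial dimension names would have to be declared explicitly.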
> Important: When dimensions like "X" and "Y" are found, they are always interpreted as [Y, X] (following lat/lon convention), regardless of their actual order in the Zarr array's `dimension_names`.

This doesn't make sense to me. Can you please clarify the intention of this statement?
I will remove the confusing statement and just specify the patterns in the correct order.
When talking about "the array" I think it's necessary to separate out the "dimension array" from the "data variable array". My understanding is here the proposal is to compare the shape of all data variable arrays with the shape of any listed dimension arrays
Yes indeed, but the "non-matching" dimension arrays would be skipped if they do not match the pattern of the `spatial_dimensions` definition.
One point of confusion for me in STAC is whether the bbox refers to the outer bounds of the cells or the coordinate values for raster data? From this, it looks like it's the coordinate values which will always be narrower than the outer bounds of the cells for raster data. Should we specify that in this document?
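
A small numeric sketch of the difference being asked about, assuming regularly spaced coordinate values that mark cell centers (an assumption; this thread does not settle which convention applies). The function names are hypothetical.

```python
def center_bounds(coords):
    """Min and max of the coordinate values themselves."""
    return (min(coords), max(coords))


def outer_bounds(coords):
    """Outer cell edges, assuming evenly spaced cell-center coordinates."""
    lo, hi = center_bounds(coords)
    # Needs at least two coordinates to infer the cell size.
    step = (hi - lo) / (len(coords) - 1)
    return (lo - step / 2, hi + step / 2)


# Four cells of width 10 whose centers sit at 5, 15, 25, 35.
x_centers = [5.0, 15.0, 25.0, 35.0]
bbox_from_centers = center_bounds(x_centers)
bbox_from_edges = outer_bounds(x_centers)
```

Along this axis the coordinate-value extent is `(5.0, 35.0)` while the outer cell extent is `(0.0, 40.0)`, so the two readings differ by half a cell on each side.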
Should transform and bbox be given a type as part of this document (e.g., double precision float)?
Most of these examples are for arrays even though it is recommended that this be defined at the group level. Are we sure we want to allow inheritance from the group level?
I very much like @benbovy's suggestion (#21 (comment)) that the "geo:proj" blobs be defined at that group level with an id and then the arrays reference a specific id
I am still a bit confused with the "dimension array" term you often use. Are they coordinates? In that case, this is out of scope. We want to keep the spec on top of the base Zarr concepts (arrays and shapes).
@jsignell I think I finally understood your point here and I updated the readme to better describe how the spec should interpret at array-level.