Proposal: Projection Attribute Extension for Zarr v3 #21
@@ -0,0 +1,25 @@
# Attributes Extensions

This directory contains specifications for Zarr v3 attribute extensions.

## What are Attribute Extensions?

Attribute extensions define standardized schemas and semantics for metadata stored in the attributes of Zarr arrays and groups. These extensions enable interoperability by establishing common conventions for domain-specific metadata.

## Creating an Attribute Extension

When creating an attribute extension, consider:

1. **Namespace**: Use a unique prefix to avoid conflicts (e.g., `proj` for projection). Choose namespace characters that are compatible with all operating systems by avoiding special characters like colons (:)
2. **Schema**: Provide a JSON schema for validation
3. **Inheritance**: Define behavior when attributes are set at group vs array level
4. **Compatibility**: Consider interoperability with existing tools and standards
5. **Example data**: Where possible, consider including a complete Zarr hierarchy that implements the extension.
## Extension Requirements

Each attribute extension MUST:
- Define the attribute key(s) and structure
- Provide a JSON schema for validation
- Include examples of usage
- Document any inheritance or precedence rules

@@ -0,0 +1,347 @@
# Projection Attribute Extension for Zarr

- **Extension Name**: Projection Attribute Extension
- **Version**: 0.1.0
- **Extension Type**: Attribute
- **Status**: Proposed
- **Owners**: @emmanuelmathot

## Description

This specification defines a JSON object that encodes datum and coordinate reference system (CRS) information for geospatial data. Additionally, this specification defines a convention for storing this object under the `"geo:proj"` key in the attributes of Zarr groups or arrays.

**Recommended usage**: Define `geo:proj` at the **group level** to apply CRS information to all arrays within that group. This matches the common geospatial pattern of storing multiple arrays with the same coordinates in a single group. Array-level definitions are supported for override cases but are less common.

## Motivation

- Provides simple, standardized CRS encoding without complex nested structures
- Compatible with existing geospatial tools (GDAL, rasterio, pyproj)

Suggested change: replace
- Compatible with existing geospatial tools (GDAL, rasterio, pyproj)
with
- Future cross-compatibility with existing geospatial tools (GDAL, rasterio, pyproj)
I think this better reflects the current status and goals
We know the data model is compatible. We can add a section about the tooling implementation status, but I'd avoid putting assumptions in the motivation section.
What is the use-case for array-level overrides?
emmanuelmathot marked this conversation as resolved.
Suggested change: replace
|**transform**|`number` `[]`|Affine transformation coefficients|No|
with
|**affine**|`number` `[]`|Affine transformation coefficients|No|
Could we use affine instead of transform since that's more specific and leaves an option for other transforms to be added?
Alternatively something like
"transform": {"name": "affine", "configuration": {...}}
so it is clear that only one transform is allowed?
Transform is the correct general mathematical term here. According to standard mathematical definitions, 'geometric transformation' is the broader concept, while 'affine transformation' is a specific subset. Using 'transform' maintains consistency with established geospatial standards (GDAL's GetGeoTransform, rasterio's Transform) and leaves room for potential future extensions to support other transformation types.
This follows the principle of "optimize for the common case" - use terminology that works for both the 80% majority and the 20% edge cases, rather than terminology that's precise for 80% but excludes future possibilities.
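To make the naming discussion concrete, here is a minimal sketch of what six affine coefficients do to a pixel position. The coefficient order `(a, b, c, d, e, f)` is an assumption borrowed from rasterio's `Affine`; this thread does not state which order the proposal uses. `apply_affine` and `from_gdal` are hypothetical helper names, not part of any library.

```python
from typing import Sequence, Tuple


def apply_affine(coeffs: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Map a (col, row) pixel position to (x, y) in CRS units.

    Assumes coeffs are ordered (a, b, c, d, e, f), so that:
        x = a * col + b * row + c
        y = d * col + e * row + f
    """
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


def from_gdal(geotransform: Sequence[float]) -> Tuple[float, ...]:
    """Reorder a GDAL-style geotransform (c, a, b, f, d, e) into (a, b, c, d, e, f)."""
    c, a, b, f, d, e = geotransform
    return (a, b, c, d, e, f)


# Illustrative values only: 10-unit pixels, north-up, origin at (500000, 4600000).
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 4600000.0)
origin = apply_affine(coeffs, 0, 0)
moved = apply_affine(coeffs, 2, 3)
```

Whatever the key ends up being called, the two orderings above describe the same six numbers, so a spec would need to pin down the order explicitly.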
emmanuelmathot marked this conversation as resolved.
emmanuelmathot marked this conversation as resolved.
Does order matter here? If so should it be enforced that all arrays within the group have this order if they have both y and x as dimensions?
You can have spatial dimensions with unconventional names (e.g. azimuth_time, ground_range). It is better, then, to have the names specified here in Y,X order.
`dimension_names` are specified at the array level whereas this proposal is primarily targeted at the group level, so you would need to scan all the arrays within a group. If you were to look through all the arrays in a group you would encounter some that are the coordinate arrays themselves. Does there maybe need to be a clause that when scanning, if you encounter an array whose name matches `dimension_names`, you should ignore it?
Coordinate arrays are excluded from this specification. The `geo:proj` attribute and its spatial dimension detection only apply to data arrays and their shapes (basic Zarr concepts). So when scanning `dimension_names` for spatial patterns, implementations would only examine data arrays and their shapes within the group.
How are coordinate arrays and data arrays differentiated in Zarr? My understanding was that all arrays are the same and it is the matching of `dimension_names` on one array with the name of another array that creates the coordinates in xarray.
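The name-matching heuristic described in this comment can be sketched as follows. This is an illustration of the xarray-style convention only, not something Zarr itself defines; `split_coordinate_arrays` is a hypothetical helper, and the group is modelled as a plain dict from array name to its `dimension_names`.

```python
from typing import Dict, List, Set, Tuple


def split_coordinate_arrays(arrays: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Split array names into (coordinate-like, data-like) sets.

    An array is treated as coordinate-like when its own name appears as a
    dimension name somewhere in the group. This is an assumed heuristic.
    """
    all_dims = {dim for names in arrays.values() for dim in names}
    coords = {name for name in arrays if name in all_dims}
    data = set(arrays) - coords
    return coords, data


# Illustrative group layout (names are made up for the example).
group = {
    "temperature": ["time", "lat", "lon"],
    "time": ["time"],
    "lat": ["lat"],
    "lon": ["lon"],
}
coords, data = split_coordinate_arrays(group)
```

Because the distinction only emerges from names, a spec that says "only examine data arrays" would need to state a rule like this one explicitly.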
I think this could be broadened a bit to allow any `dimension_names` that includes these patterns. That would catch cases where there is also `time` or something.
The geo:proj extension specifically scopes to only the spatial dimensions that the CRS applies to (typically 2D: y/x, lat/lon, etc.). Non-spatial dimensions like time, band, or depth are outside the scope of this extension. The pattern matching is designed to identify exactly the spatial dimension pair that corresponds to the CRS, not to handle additional dimensions in the array.
Sorry, I just got your comments afterwards. This is not a static pattern that `dimension_names` must exactly match. The pattern matching rule is a set of possible names that must be found together in `dimension_names` to match a possible combination.
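A rough sketch of that rule, assuming a small illustrative set of name pairs (the authoritative list lives in the proposal text, not here). `SPATIAL_PAIRS` and `find_spatial_dims` are hypothetical names. The sketch looks for both names of a pair anywhere in `dimension_names`, ignores extra dimensions such as `time`, and accepts either order.

```python
from typing import List, Tuple

# Assumed example pairs, each given as (y-like name, x-like name).
SPATIAL_PAIRS = (("y", "x"), ("lat", "lon"), ("latitude", "longitude"))


def find_spatial_dims(dimension_names: List[str]) -> Tuple[int, int]:
    """Return (y_index, x_index) for the first pair found together.

    Raises ValueError when no known pair is present, mirroring the
    "MUST raise an error" wording discussed in this review.
    """
    for y_name, x_name in SPATIAL_PAIRS:
        if y_name in dimension_names and x_name in dimension_names:
            return dimension_names.index(y_name), dimension_names.index(x_name)
    raise ValueError(f"no spatial dimension pair found in {dimension_names!r}")


usual = find_spatial_dims(["time", "lat", "lon"])
flipped = find_spatial_dims(["time", "lon", "lat"])
```

Returning indices rather than a boolean lets a caller recover the axis order of each array even when the names appear flipped.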
Thank you for adding this section! I think it makes things clearer. My questions around ordering essentially boil down to: Would it pass if "lon" and "lat" were flipped?
- `temperature/`: `dimension_names: ["time", "lon", "lat"]`
- `precipitation/`: `dimension_names: ["time", "lon", "lat"]`
Yes
Can you please explain what validation needs to happen using the array shape?
- **Error Handling**: If spatial dimensions cannot be identified through either method, implementations MUST raise an error
Can you be more specific here? E.g., "implementations MUST raise an error if using operations that rely on the `geo:proj` attribute".
Could also be useful to specify when implementations should raise the error. Ideally the error raises on lazy open (for instance `xr.open_dataset`).
What is the motivation for allowing more than one? It adds work for implementations to validate the consistency, so if not needed it seems best to only allow one. I am concerned about what will happen if people ask to add lossy additional options, such as PROJ.4.
For reference:
- GeoParquet uses PROJJSON only (see "add support wkt or wkt2 formats for crs" opengeospatial/geoparquet#221 and "Thoughts on PROJJSON for CRS encoding?" opengeospatial/geoparquet#90).
- GeoArrow allows different formats but recommends PROJJSON; however, it doesn't allow more than one CRS representation (it has two attributes, `crs` and `crs_type`): https://geoarrow.org/extension-types.html#extension-metadata.
The `crs_type` is also interesting; I guess it would simplify implementations.
One point of confusion for me in STAC is whether the bbox refers to the outer bounds of the cells or the coordinate values for raster data? From this, it looks like it's the coordinate values which will always be narrower than the outer bounds of the cells for raster data. Should we specify that in this document?
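The difference between the two readings can be worked through numerically. The sketch below assumes regularly spaced coordinates that label cell centers, which is an assumption for illustration and not something the proposal or STAC states here; `center_and_outer_bounds` is a hypothetical helper.

```python
from typing import Tuple

Bounds = Tuple[float, float]


def center_and_outer_bounds(first_center: float, spacing: float, count: int) -> Tuple[Bounds, Bounds]:
    """Return ((min, max) of cell centers, (min, max) of outer cell edges) along one axis.

    Assumes evenly spaced cell centers; spacing may be negative (e.g. a
    north-up y axis that decreases with row index).
    """
    last_center = first_center + spacing * (count - 1)
    low, high = sorted((first_center, last_center))
    half = abs(spacing) / 2
    return (low, high), (low - half, high + half)


# Four cells of width 10 with centers at 5, 15, 25, 35.
centers, edges = center_and_outer_bounds(5.0, 10.0, 4)
```

With these numbers the center-based extent is 30 units wide while the edge-based extent is 40, one full cell wider, which is why the document would benefit from saying which one `bbox` means.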
Should transform and bbox be given a type as part of this document (e.g., double precision float)?
Most of these examples are for arrays even though it is recommended that this be defined at the group level. Are we sure we want to allow inheritance from the group level?
I very much like @benbovy's suggestion (#21 (comment)) that the "geo:proj" blobs be defined at that group level with an id and then the arrays reference a specific id
I am still a bit confused with the "dimension array" term you often use. Are they coordinates? In that case, this is out of scope. We want to keep the spec on top of the base Zarr concepts (arrays and shapes).
@jsignell I think I finally understood your point here and I updated the readme to better describe how the spec should interpret at array-level.