Conversation

@joshmoore (Member) commented May 16, 2025

This is a follow-on to ZEP9 (#65): #66 limits the scope of ZEP9 solely to phase 1 so that it can be moved to "accepted" (since zarr-developers/zarr-specs#330 is merged and v3.1 is released). This ZEP is equivalent to phase 2 of the original ZEP9 draft and introduces a top-level generic extensions field.

This ZEP will follow the process laid out in ZEP0 and invites votes from the newly refreshed @zarr-developers/implementation-council. This PR may be proactively merged as a draft, but will not be moved to "accepted" until the related PR on zarr-specs is voted on, merged, and v3.2 released.

Please see zarr-developers/zarr-specs#344 for detailed changes.

@alimanfoo (Member)

Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

@jbms commented May 16, 2025

> Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

I know we've done that in the past for ZEPs, but then it is actually harder to comment on it --- I'd need to open a separate issue for each comment.

@joshmoore (Member, Author)

> @alimanfoo: a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

For merging in the "Draft", yes, that suffices. From https://zarr.dev/zeps/active/ZEP0000.html#submitting-a-zep

"...The Zarr Steering Council and the Zarr Implementations Council will not unreasonably deny publication of a ZEP. Reasons for denying ZEP include duplication of effort, being technically unsound, not providing proper motivation or addressing backwards compatibility, or not taking care of Zarr CODE OF CONDUCT."


> @jbms: I know we've done that in the past for ZEPs but then it is actually harder to comment on it --- I'd need to open a separate issue for each comment.

I'm certainly all for leaving it open for a bit, especially for the discussion of the material that is only here (as @jbms has done above). I can manage having it open and synchronizing with the specs PR. That being said, if possible, I'd like to get it merged as a "Draft" and then will also keep updating it as necessary to stay in step with discussions on zarr-developers/zarr-specs#344

@d-v-b commented May 16, 2025

> Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

seconding @jbms, I rate the ability to discuss the ZEP as a single PR much higher than seeing it listed on the ZEP web site, so I would rather we keep this PR open until it's clear that all the questions have been answered.

Comment on lines +106 to +108
Note that in this example the extension is ``must_understand=true``, meaning
an implementation which does not support the ``example.offset`` extension
should raise an error.
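
For context, the array metadata under discussion would look roughly like this (a sketch based on the draft in zarr-developers/zarr-specs#344, shown here as a Python dict; the field layout in that PR is authoritative):

array_metadata = {
    "zarr_format": 3,
    "node_type": "array",
    # ...shape, data_type, chunk_grid, codecs, etc....
    "extensions": [
        {
            "name": "example.offset",               # the extension named above
            "configuration": {"offset": [12, 15]},  # hypothetical configuration
            "must_understand": True,  # readers that don't know it must raise
        }
    ],
}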

When should that error be raised? When reading metadata, or when reading chunks?

Reply (Member):

If the impl doesn't know the example.offset extension, it must fail when parsing the metadata.
It may fail with an out-of-bounds error when reading/writing data outside the domain. But that would be up to the specification for this extension to define.

Reply:

> If the impl doesn't know the example.offset extension, it must fail when parsing the metadata.

It seems to me that a zarr-compatible application should be able to say, for example, "this is an array with shape <shape>, but I can't load chunks for you because of <unknown extension>". Your suggestion that the metadata document should be effectively unreadable prevents this.

Reply (Member):

> It seems to me that a zarr-compatible application should be able to say, for example, "this is an array with shape <shape>, but I can't load chunks for you because of <unknown extension>".

I think that would be a good implementation.

Reply:

> I think that would be a good implementation.

Since the behavior I described relies on reading the metadata without an error, this PR should clarify the distinction between reading metadata documents and other IO operations (e.g., reading chunks, in this example).

Reply:

If you are purely displaying information to a user and including a warning that an unknown extension was encountered, then displaying whatever information can be heuristically extracted from the metadata successfully may be reasonable.

In general though if there is an unknown extension, you can't really make any assumptions about the meaning of the metadata and any programmatic use is problematic.

For example, the offset extension may mean that the upper bound of the array is no longer indicated by shape but by offset + shape, and that the chunk grid starts at offset rather than (0, ...). Maybe there is some program that partitions zarr arrays according to the chunking and then hands off those zarr arrays to worker processes. If the partition program does not support the offset extension but the worker program does, then the partitioning will be performed incorrectly, and the worker processes may appear to succeed while operating on data that is not correctly aligned to the chunk grid.

Concretely, I'd say that if there is an unknown must_understand=true extension, zarr.open and similar interfaces should not appear to succeed and allow querying properties like the chunk grid, dtype, etc. unless the user explicitly opts into ignoring unknown extensions.
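
A minimal sketch of that opt-in behavior (the ignore_unknown_extensions flag and the KNOWN_EXTENSIONS set are hypothetical, not part of any existing implementation's API):

import json

KNOWN_EXTENSIONS = {"example.known"}  # extensions this implementation supports

def parse_array_metadata(raw: str, ignore_unknown_extensions: bool = False) -> dict:
    """Fail on unknown must_understand extensions unless the user opts out."""
    meta = json.loads(raw)
    for ext in meta.get("extensions", []):
        unknown = ext["name"] not in KNOWN_EXTENSIONS
        if unknown and ext.get("must_understand", True) and not ignore_unknown_extensions:
            raise ValueError(
                f"unknown must_understand extension {ext['name']!r}; "
                "pass ignore_unknown_extensions=True to read anyway"
            )
    return meta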

@d-v-b (May 22, 2025):

> In general though if there is an unknown extension, you can't really make any assumptions about the meaning of the metadata and any programmatic use is problematic.

I find this outcome concerning, as it amounts to fragmenting the zarr ecosystem.

@d-v-b commented May 16, 2025

I think this document should explain why the pre-existing attributes field is insufficient for the purposes of this ZEP.

@jbms commented May 16, 2025

> I think this document should explain why the pre-existing attributes field is insufficient for the purposes of this ZEP.

For must_understand: true extensions, like specifying the array content inline or transposing the array, an attribute would definitely not work. However, all of the examples given would work reasonably well as attributes.

@LDeakin (Member) commented Jul 8, 2025

> The GeoZarr spec is a domain-specific extension to Zarr

I've only just scanned over it, but GeoZarr looks to be a metadata/layout standard for Zarr (like OME-Zarr), rather than a "Zarr extension". Such standards require no support from implementations like zarr-python, tensorstore, zarrs, etc., and would fit under the banner of ZEP 4 — Metadata Conventions rather than this ZEP.

> Is ZEP10 not the appropriate mechanism to use for GeoZarr?

Indeed, it is not. Just use attributes.

@rabernat (Contributor) commented Jul 8, 2025

@LDeakin I'm really not sure it's so cut and dried.

Things like GeoZarr do require support from other, more domain specific implementations like GDAL, Xarray, etc. Having worked on both GeoZarr and ZEP4, I'm not convinced that just saying "put whatever you want in attributes" is the right path forward. There needs to be a way to declare more explicitly that a dataset is conforming to one of these models, and a way to register them centrally.

@geospatial-jeff

> Things like GeoZarr do require support from other, more domain specific implementations like GDAL, Xarray, etc. Having worked on both GeoZarr and ZEP4, I'm not convinced that just saying "put whatever you want in attributes" is the right path forward. There needs to be a way to declare more explicitly that a dataset is conforming to one of these models, and a way to register them centrally.

Could not agree more! STAC has a domain-agnostic and battle tested extension mechanism that could almost be directly copy/pasted into Zarr. If STAC took a similar approach of saying "just use properties" (the STAC equivalent of Zarr attributes) the spec wouldn't be half of what it is today.

Furthermore, the Zarr spec defines attributes as "Intended to allow storage of arbitrary user metadata." The metadata included in GeoZarr is beyond "arbitrary"; it represents essential pieces of information that have been driving interoperability in the geospatial community for ~45 years.

I'm not sure why this is so controversial. A robust extension mechanism that provides 3rd parties the ability to extend the Zarr spec (exactly what ZEP10 describes itself as) doesn't hurt the Zarr community. It's a win-win for everyone involved. A weak extension mechanism (just use attributes) is a lose-lose for everyone involved.

@LDeakin (Member) commented Jul 8, 2025

> Things like GeoZarr do require support from other, more domain specific implementations like GDAL, Xarray, etc.

Of course, just like OME-Zarr. But does GeoZarr need to change the core data model of Zarr? If not, it is well suited to attributes.

GeoZarr could use "extensions" or new top-level fields, but they should almost certainly be annotated with "must_understand": false. Otherwise, implementations like zarr-python, tensorstore, and zarrs will also need to fully understand GeoZarr. Why do they need to be even aware of its existence? Leave that for higher-level downstream implementations, such as Xarray, etc.

> There needs to be a way to declare more explicitly that a dataset is conforming to one of these models, and a way to register them centrally.

@jbms brought up the registration of attributes earlier in this thread. Is that not sufficient?

@LDeakin (Member) commented Jul 8, 2025

A recurring discussion around this ZEP is the ill-defined distinction between attributes and extensions. I am seeing your perspective, @geospatial-jeff and @rabernat, and maybe the path forward is to abandon ZEP4 entirely and interpret attributes literally to mean user metadata, as it is currently spec'd.

Anything standards-driven could be registered and part of extensions. But, I'd echo what I said in my last comment, GeoZarr extensions should be "must_understand": false if they are really just a metadata/layout standard.

@geospatial-jeff commented Jul 9, 2025

zarr-python, tensorstore, zarrs, and any other Zarr tooling don't need to be aware of GeoZarr or any other domain-specific / third-party extension. They only need to be aware of the extension mechanism. Several things would be needed to make attribute registration work:

  1. attributes is typed (in Python) as dict[str, Any]. Extensions that place their data in attributes need stronger typing. The most widely used standard here is JSON Schema. Zarr needs some way to include in each node a list of JSON Schemas that describe the contents of attributes, so that readers/writers can validate the contents of extensions contained within the attributes key (see the sketch after this list). I think all extensions should use JSON Schema, whether they live in extensions or attributes.
  2. As mentioned earlier, there probably needs to be some sort of namespacing of keys (e.g. geo:) within attributes. This is very reasonable; it's the approach adopted by STAC for similar reasons!
  3. There needs to be a central place to register these extensions / schemas, something like zarr-extensions.
  4. zarr-python, being the most widely used tool for reading and writing Zarr stores, needs to understand how to read, write, and validate these extensions against the provided JSON Schemas. It does not need to know how to interpret these extensions, or do anything knowledgeable with them.

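A sketch of what point 1 could look like in practice, using the third-party jsonschema package (the geo: keys and the schema content are hypothetical):

from jsonschema import validate  # pip install jsonschema

# Hypothetical registered schema for a namespaced geospatial extension.
GEO_SCHEMA = {
    "type": "object",
    "properties": {"geo:crs": {"type": "string"}},
    "required": ["geo:crs"],
}

attributes = {"geo:crs": "EPSG:4326", "note": "free-form user metadata"}
validate(instance=attributes, schema=GEO_SCHEMA)  # raises ValidationError on mismatch
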
Generally speaking, I worry about two things here:

  • As mentioned earlier there are backwards compatibility issues with repurposing attributes to hold extensions. Namespaces minimize the chances of collision, but don't remove it entirely.
  • I worry that Zarr is trending in a direction of supporting two separate ways of extending the spec. One that is implemented in a top-level extensions key, and another that is implemented in attributes. This is objectively more complex for tooling and more confusing for users than having a single extension mechanism. The result will be users putting things in extensions that should have gone in attributes and vice-versa. The folks participating in this thread can't even give a straight answer in terms of what goes where, so how could an average user? Why does it matter if an extension is "standards-driven" or "user-driven"? Isn't the purpose of the Zarr standard to help users/communities do interesting things with tensor-like data in the cloud? Why draw a line between the two?

@d-v-b commented Jul 9, 2025

> attributes is typed (in python) as dict[str, Any]. Extensions that place their data in attributes need stronger typing. The most widely used standard here is JSON Schema. Zarr needs some way to include in each node a list of JSON Schemas that describe the contents of attributes so that readers/writers can validate the contents of extensions contained within the attributes key. I think all extensions should use JSON Schema whether or not they are in extensions / attributes.

This is an implementation detail. The only real requirement is that the type of attributes be assignable to Mapping[str, object]; in principle anything more specific would work fine too. While I would like this kind of functionality in zarr-python, I've had more success implementing it over in pydantic-zarr, which is designed for exactly the use case you want. In pydantic-zarr, the array and group objects are defined like this:

from typing import Generic, Mapping, TypeVar

from pydantic import BaseModel

TAttr = TypeVar("TAttr")  # type of the attributes field
TItem = TypeVar("TItem")  # type of a group's members

class ArraySpec(BaseModel, Generic[TAttr]):
    attributes: TAttr
    ...

class GroupSpec(BaseModel, Generic[TAttr, TItem]):
    attributes: TAttr
    members: Mapping[str, TItem]

The array object takes the type of its attributes as a type parameter. The group object takes the type of its attributes, and the type of its members, as type parameters. This allows pydantic to do runtime type checking of the attributes field and group members, and is largely sufficient for "statically typed" zarr hierarchies. For example, we implemented OME-Zarr using this approach.

STAC could use pydantic-zarr to define an attributes model that performs runtime JSON Schema validation of attributes elements that conform to a certain structure. I'm new to STAC, so maybe there's an even better idea. But nothing needs to change about the spec for this to work.
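
For illustration, a sketch of that approach (GeoAttrs and its single field are hypothetical, and pydantic-zarr's real ArraySpec has more required fields than the skeleton above):

from pydantic import BaseModel

class GeoAttrs(BaseModel):
    crs: str  # hypothetical namespaced geospatial attribute

# Validation happens at construction time; a missing or mistyped
# crs raises a pydantic ValidationError.
spec = ArraySpec[GeoAttrs](attributes=GeoAttrs(crs="EPSG:4326"))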

@joshmoore (Member, Author)

I'll be disappearing for two weeks of wilderness shortly, and I apologize in advance that I'm very unlikely to check in on this thread after this weekend. (I wish everyone at least as much respite.) A few quick responses to the above conversation and from geospatial-jeff/zarr-python#1:

  • The solution using an external JSON Schema looks elegant, but there was a general requirement that the main extension mechanism not require remote loading. I'd ask @geospatial-jeff what that would do to the draft in his PR, which I think brings it quite close to the current proposal here.
  • I was never 100% happy with the naming from the ZEP9 work (v3.1), but I think some of the confusion comes from the overloading of the term from the previous version of the spec (v3.0) and even in our daily usage.
    • The delineation I tried to make was that there are "extension points" defined in the core and then "extensions" which you slot into them. (For @geospatial-jeff, we moved the list of implementations away from core since the distinction of "core" was being newly created. There will be a process to move them back into core but first we were trying to unblock the spec work.)
    • ZEP10 adds to the naming confusion because there's yet another "extensions". This was chosen because it was explicitly referenced in previous versions of the spec, but that is perhaps not sufficient.
    • But to clarify what was mentioned elsewhere in different words: ZEP10 is about creating another extension point which takes generic extensions, as opposed to the specific ones like "codecs", "chunk_grid", etc.

From my side, (a) I'm very happy for improved wording on both ZEP9 and ZEP10, and (b) as I mentioned previously, it might be easier to talk this through via voice on a community/spec/ZEP call. Alternatively, I'll try to piece apart the discussions above into separate issues and try to reach consensus on each in turn, including:

  • future of attributes and ZEP4
  • remote loading of schemas
  • top-level keys versus a single container field
  • naming of "extensions" / "extension points"
  • etc.

@jbms commented Jul 30, 2025

The current ZEP10 proposal indeed blurs the distinction between extensions that require changes to the zarr implementation, and extensions like OME-zarr that merely build on top of zarr, and may affect the higher-level interpretation of the array but require no change to the zarr implementation itself and do not affect how the array data itself is read or written.

I would propose instead that we define "registered attributes" also in the zarr-extensions repo, where registered attributes are included in attributes but have a defined schema and are suitably prefixed to avoid any ambiguity in practice with user-defined attributes not intended to conform to any particular spec. These would, like any other attribute, always be safe to ignore by implementations that do not understand them, both when reading and writing, because they must not affect the reading or writing of the array data itself.

For generic extensions that do require changes to the zarr implementation, I'd propose that they just be included as top-level attributes, but with a suitable prefix to avoid conflict with any future additions to the core zarr spec. The must_understand attribute would indicate if they are safe to ignore when reading, and would never be safe to ignore when writing.
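
A sketch of what such a prefixed top-level field might look like, shown as a Python dict (the ext: prefix and the example.offset extension are illustrative only):

array_metadata = {
    "zarr_format": 3,
    "node_type": "array",
    # ...core fields...
    "ext:example.offset": {
        "must_understand": True,  # not safe to ignore when reading
        "offset": [12, 15],
    },
}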

@normanrz (Member) commented Aug 1, 2025

Thanks @jbms!

I like the idea of registered attributes that get their space in the zarr-extensions repo. That might bring some backwards incompatibility issues, because we haven't reserved that namespace so far. However, I am not too concerned about that.

I am also onboard with the idea of moving this proposal from an extensions array to prefixed top-level attributes. Here, I wonder how we'll structure the prefixes. Would we have a shared prefix that all registered extensions use, e.g. ext:origin, or custom namespaces?

In any case, I think it would be useful to have a meeting to discuss this and potentially take a decision to move forward.

@joshmoore (Member, Author) commented Aug 6, 2025

This evening during the community meeting, @jbms had the impression that there's generally more interest in the attribute-based metadata as opposed to the extension-based metadata.

By way of an informal poll, for those who are generally looking to make use of the mechanisms discussed here, could you please add:

  • a 🐱 emoji if you have the desire/need to register your attribute-like metadata (non-must-understand)
  • a 🐶 emoji for extensions (whether prefix based or within the extensions object)
  • or both if that's the case.

Thanks.

Edit: There's no way to just add your own emojis. 🤦🏽

@d-v-b commented Aug 9, 2025

> The current ZEP10 proposal indeed blurs the distinction between extensions that require changes to the zarr implementation, and extensions like OME-zarr that merely build on top of zarr, and may affect the higher-level interpretation of the array but require no change to the zarr implementation itself and do not affect how the array data itself is read or written.

> I would propose instead that we define "registered attributes" also in the zarr-extensions repo, where registered attributes are included in attributes but have a defined schema and are suitably prefixed to avoid any ambiguity in practice with user-defined attributes not intended to conform to any particular spec. These would, like any other attribute, always be safe to ignore by implementations that do not understand them, both when reading and writing, because they must not affect the reading or writing of the array data itself.

> For generic extensions that do require changes to the zarr implementation, I'd propose that they just be included as top-level attributes, but with a suitable prefix to avoid conflict with any future additions to the core zarr spec. The must_understand attribute would indicate if they are safe to ignore when reading, and would never be safe to ignore when writing.

💯, I think taking this approach would resolve a lot of the concerns and questions I raised about this PR.

@rabernat (Contributor) commented Aug 11, 2025

Thanks everyone for weighing in with some productive suggestions. It sounds like we are converging on a path forward for this concept.

I think a synchronous meeting could be very useful to finalize a consensus. I'd love to include at least @joshmoore, @normanrz, @jbms, @d-v-b, @LDeakin, and myself in that meeting. The challenge is that we span the globe almost completely, and finding a comfortable time zone may be very difficult.

https://www.worldtimebuddy.com/?qm=1&lid=2172517,2950159,5128581,5391959&h=5128581&date=2025-8-11&sln=17-18&hf=1

In my opinion, the least painful option looks like this:

  • Berlin: 11pm
  • New York: 5pm
  • San Francisco: 2pm
  • Canberra: 7am (the next day)

(Obviously this is most uncomfortable for Berlin and Canberra.) So my question for the group: could we make this timing work sometime this week? I am free any day of this week (Aug 11 - Aug 14) at this time. Alternatively, feel free to propose a different time which you think could work better.

@d-v-b commented Aug 12, 2025

Thanks for kicking off the calendar-ing, @rabernat. I'm also free any day this week at 11:00 pm Berlin time.

@rabernat (Contributor)

The week has almost passed, so I think we're looking at next week instead.

Here's a Doodle poll with some options in this time range for the rest of this week and next week. Please fill it out if you're interested in attending: https://doodle.com/group-poll/participate/azlpXj7d

@maxrjones (Member)

> The week has almost passed, so I think we're looking at next week instead.

> Here's a Doodle poll with some options in this time range for the rest of this week and next week. Please fill it out if you're interested in attending: doodle.com/group-poll/participate/azlpXj7d

I filled out the poll but mostly to learn from the discussion so please don't put much weight on my votes.

@rabernat (Contributor) commented Aug 18, 2025

Right now the best options are looking like:

  • Aug 19 (Tue), 5:00-6:00 PM ET
  • Aug 20 (Wed), 5:00-6:00 PM ET

@jbms, @joshmoore, & @normanrz - would you be available to join at either time?

@joshmoore (Member, Author)

Preference for Wed but I'll make either work.

@mkitti commented Aug 19, 2025

I would like to observe.

@normanrz (Member)

I am traveling this week and likely won't be able to join.

@LDeakin (Member) commented Aug 20, 2025

If anyone is interested, I put together some slides that summarise the generic extension options I've seen discussed + my view on the pros/cons of each.

@jbms commented Aug 20, 2025

Thanks for that writeup, @LDeakin.

I have one new proposal for must_understand:

Supposing that we agree that "must understand for writing" is always implicitly true, and that we only need to represent the single bit "must understand for reading", we could say that properties prefixed with an underscore are not required for reading.

For example:

{
  "zarr_format": 3,
  ...
  "_ext:consolidated_metadata": ...
}

or for a codec:

{
  "name": "jpeg",
  "configuration": {
    "_quality": 80
  }
}

Pros:

  • Extremely concise
  • Works equally well for any JSON value type, not just objects
  • Works well for adding new properties to a codec or other existing metadata objects
  • Because it is so concise, we can apply it consistently to all new metadata objects that are defined, e.g. all properties of newly-defined codecs that are only relevant for encoding can be prefixed with an underscore.
  • Compatible with older zarr implementations, in that it will prevent both reading and writing by older zarr implementations, which is the best that we can do since older implementations don't know about the read vs write distinction.

Cons:

  • Introduces an inconsistency with existing gzip/blosc/zstd codec properties like level that, under this new scheme, should have been named _level, and with the top-level dimension_names property that could have been named _dimension_names. However, these would also be inconsistent with explicit must_understand or must_understand_for_reading properties.
  • Introduces an inconsistency with the existing must_understand: false which will still need to be supported for compatibility (at least theoretically, unclear how much breakage there would be in practice).
  • If, for a given extension or new property, the must_understand_for_reading bit is not fixed but may depend on the value of the property, then the underscore prefix, while still workable, is a bit more awkward because now the same property could have two different names. I don't have an example of such a use case, though.

@rabernat (Contributor) commented Aug 25, 2025

Last week a group of us met to discuss the concept of generic extensions. By focusing on specific use cases for this extension mechanism, we reached a somewhat surprising conclusion: we may not need ZEP10-style extensions right now. Instead, nearly all of the use cases we had in mind might conceivably be implemented using existing extension points OR user-level attributes (plus the “registered attributes” concept described above). In the latter case, the data are ultimately readable by simple Zarr implementations without any knowledge of the “extension.” Some example use cases we discussed were:

  • Unevenly chunked arrays: implement as a chunk_grid extension (see the sketch after this list)
  • Multiscale arrays: implement as a group with multiple arrays, plus suitable group-level attributes
  • Subarrays (one array that is related to another array, like a parent grid): implement as siblings within a group, plus suitable group-level attributes. This is what Xarray does today when using Zarr.
  • Chunk statistics (e.g. storing min, max, null-count per chunk): implement as a sidecar array.

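For instance, the unevenly chunked case might be expressed through the existing chunk_grid extension point roughly as follows (the grid name "rectilinear" and its configuration are hypothetical, not a registered extension):

chunk_grid = {
    "name": "rectilinear",  # hypothetical chunk grid extension
    "configuration": {
        # explicit chunk extents along each dimension, summing to the array shape
        "chunk_shapes": [[10, 10, 5], [20, 20]],
    },
}
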
This leaves the question of consolidated metadata, which is currently implemented in Zarr Python but completely out of spec for Zarr 3. Our proposal for this is:

  • Move consolidated metadata from a top-level metadata field to a [registered] group-level attribute (sketched below)

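A sketch of what that could look like in a group's zarr.json, shown as a Python dict (the attribute name and layout are hypothetical, pending a registered-attributes spec):

group_metadata = {
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {
        "consolidated_metadata": {  # hypothetical registered attribute
            "a/zarr.json": {"zarr_format": 3, "node_type": "array"},  # abbreviated
            "b/zarr.json": {"zarr_format": 3, "node_type": "group"},
        }
    },
}
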
Most of the remaining questions about how these extensions should work are ultimately about consistency. Examples of consistency-related questions are:

  • What if you update / delete one array of a multiscale array? What should happen to the other arrays?
  • What if you update an array with chunk statistics? Where / when do the statistics get updated?
  • For consolidated metadata, what happens if you change the hierarchy? Where / when does the consolidated metadata get updated?

Ultimately, we concluded that specs (including specs for registered attributes which enable behaviours such as described above) should simply try to describe the valid state of data at rest, and not attempt to resolve every consistency issue that might arise in getting into that state. This is consistent with usage today; in practice, most Zarr users today already assume there are relationships between multiple arrays in a group defined by node names and attributes (e.g. Xarray’s coordinate variables), without the spec having to explicitly spell out how updates should work. For applications that require strong consistency for updates across multiple nodes, there are already implementations which offer transactions, such as Icechunk and Tensorstore.

Of course there are many remaining questions about how these features should work, but we are determined to start experimenting with implementations which simply use attributes to define new behaviors, rather than introducing a new, hypothetical extension mechanism at this time.

We can leave this ZEP open for whenever someone comes along who absolutely does need this mechanism for their use case and is motivated to continue work on it.

@normanrz (Member)

Thanks for the summary @rabernat!

I guess the next steps would be to write up a (small) ZEP to define the "registered attributes" and, in parallel, prepare the zarr-extensions repo for registered attributes.

@maxrjones (Member) commented Aug 26, 2025

> Thanks for the summary @rabernat!

> I guess the next steps would be to write up a (small) ZEP to define the "registered attributes" and, in parallel, prepare the zarr-extensions repo for registered attributes.

Agreed, thanks @rabernat! IIRC one motivation for pivoting towards "registered attributes" was to promote an implementation-first approach, such that the registered-attributes ZEP could be accompanied by one or more concrete examples based on the proposed use cases. Would anyone from the steering council have time to lead the process for one of the example use cases, so that the broader community could follow that successful model for the others? FWIW I am most interested in the unevenly chunked arrays and multiscale arrays and would have time to contribute to that effort. I'm also glad to help design the Zarr summit to enable this work.

@d-v-b commented Aug 26, 2025

> FWIW I am most interested in the unevenly chunked arrays

I don't think unevenly chunked arrays would be best expressed as a registered attribute. Rather, the chunk_grid field is extensible by definition, and we already have a place for defining new chunk grid specs.

@normanrz (Member)

I think a natural first registered attribute would be ome, which is already in use today. I would imagine the process to be:

  • Create a folder for attributes on zarr-extensions
  • Create an entry for ome that essentially links to the existing spec

Other attributes might want to document their spec directly in the zarr-extensions repo.
The ome spec contains multiscale arrays, however, not in a general-purpose way. Extracting that would be a separate piece of work.
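
For reference, OME-Zarr 0.5 already follows this shape, nesting everything under a single ome key in attributes (abbreviated sketch; the OME-NGFF 0.5 spec defines the real layout):

group_metadata = {
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {
        "ome": {
            "version": "0.5",
            "multiscales": [],  # abbreviated; real entries describe each scale level
        }
    },
}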

@rabernat (Contributor)

> I think a natural first registered attribute would be ome, which is already in use today.

Big 👍 to this. I really like how OME puts all its metadata under a single top-level attribute. I sometimes wish we had decided to do that for Xarray.

> The ome spec contains multiscale arrays, however, not in a general-purpose way.

This is a big challenge with a potentially big payoff: refactoring stuff like multiscales, units, etc. to be composable rather than part of a big monolithic standard like CF or OME.

@rabernat (Contributor) commented Sep 5, 2025

> I guess the next steps would be to write up a (small) ZEP to define the "registered attributes" and, in parallel, prepare the zarr-extensions repo for registered attributes.

FWIW, I think we should not do any ZEP work until we have this actually working.

@joshmoore (Member, Author)

An addendum comment from the Zoom conversation: I'm not in favor of leaving this PR open indefinitely. Instead, I will offer (when time permits) to add some changes and then merge it, marking it "withdrawn" or "rejected". When and if we come back to top-level extensions, I'd then suggest that we start with a new ZEP incorporating the lessons learned from the attributes work.
