[v3] Hierarchy api #1912
@classmethod
def from_dict(
async def from_dict(
This is a notable change that was needed to get the hierarchy API to work. Previously, from_dict was not async, but it should be.
return Array.from_dict(store_path=store_path, data=self.to_dict())

@classmethod
def from_array(
This is a convenience method that makes it less painful to create ArrayModel instances, because it uses as many defaults and inferred values as possible.
codecs: Iterable[Codec | JSON],
attributes: None | dict[str, JSON],
dimension_names: None | Iterable[str],
codecs: Iterable[Codec | JSON] = (BytesCodec(),),
I added some default values here to make array creation easier. Happy to revert if this is controversial.
This PR adds a declarative API for defining Zarr arrays and groups independently of storage. Using this API, users and developers can create and manipulate Zarr hierarchies, adding nodes and modifying their attributes, and serialize the hierarchy to storage with a single method call.
Implementation
This PR adds a module called hierarchy.py that contains two classes, ArrayModel and GroupModel, which model Zarr arrays and groups, respectively. "Model" here is an important concept: ArrayModel has all the array metadata attributes like shape and dtype, but it has no connection to storage or chunks, so you can't use ArrayModel to read and write array data. The same goes for GroupModel: it has all the static attributes of a Zarr group, but no connection to storage, so you cannot access sub-groups or sub-arrays with a GroupModel. (You can, however, access sub-GroupModel and sub-ArrayModel instances, but these are just models.) The classes are pretty simple, so I will just paste the current code here:
Goals
- … zarr.json metadata documents in a large hierarchy, which should vastly speed up these interactions on high-latency storage
- … dict[str_that_obeys_path_semantics, ArrayModel | GroupModel]. This has been useful over in pydantic-zarr for a variety of things, and I think it would be useful here. It could also provide a serialization format for consolidated metadata in zarr v3, which so far has not been defined.
Process
Unlike a lot of other v3 efforts, this PR adds new functionality that was never in zarr-python before. I'm basing the design here on work I did over in pydantic-zarr, so there's some prior art, but I am happy to explore and experiment as needed. It might take a while before we have an API everyone is happy with.
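To make the dict[str_that_obeys_path_semantics, ArrayModel | GroupModel] form mentioned under Goals a bit more concrete, here is a rough sketch using plain dataclasses. This is not the code in hierarchy.py: the class names are borrowed from this PR, but the fields, the members attribute, the "" root key, and the to_flat / from_flat helpers are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ArrayModel:
    """Array metadata only: no store, no chunks (hypothetical fields)."""
    shape: tuple[int, ...]
    dtype: str
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class GroupModel:
    """Group attributes plus child models keyed by name (hypothetical fields)."""
    attributes: dict[str, Any] = field(default_factory=dict)
    members: dict[str, ArrayModel | GroupModel] = field(default_factory=dict)


def to_flat(node, root=""):
    """Key every node by its path; groups in the result carry no members."""
    if isinstance(node, ArrayModel):
        return {root: node}
    flat = {root: GroupModel(attributes=dict(node.attributes))}
    for name, child in node.members.items():
        flat.update(to_flat(child, f"{root}/{name}"))
    return flat


def from_flat(flat):
    """Rebuild the nested tree; assumes the root key is "" and parents are groups."""
    nodes = {}
    # Visit shallow paths first so each parent exists before its children.
    for path in sorted(flat, key=lambda p: p.count("/")):
        node = flat[path]
        if isinstance(node, GroupModel):
            node = GroupModel(attributes=dict(node.attributes))
        nodes[path] = node
        if path:  # every non-root node is attached to its parent group
            parent, _, name = path.rpartition("/")
            nodes[parent].members[name] = node
    return nodes[""]


# Build a small hierarchy purely in memory, then flatten it.
tree = GroupModel(
    attributes={"title": "demo"},
    members={
        "raw": ArrayModel(shape=(10, 10), dtype="uint8"),
        "labels": GroupModel(
            members={"mask": ArrayModel(shape=(10, 10), dtype="bool")}
        ),
    },
)
flat = to_flat(tree)
print(sorted(flat))  # ['', '/labels', '/labels/mask', '/raw']
```

Because every value in the flat dict is just metadata, the whole hierarchy could in principle be written out as one document, which is why this shape looks like a plausible candidate for consolidated metadata. The round trip from_flat(to_flat(tree)) gives back an equal tree in this sketch.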