---
layout: default
title: ZEP0006
description: Defining a Zarr Object Model (ZOM)
parent: draft ZEPs
nav_order: 1
---

# ZEP 6 - A Zarr Object Model

Authors:

* Davis Bennett ([@d-v-b](https://github.com/d-v-b)), HHMI / Janelia Research Campus

Status: Draft

Type: Specification

Created: 2023-07-20

## Abstract

This ZEP defines Zarr Object Models, or ZOMs. ZOMs are abstract representations of Zarr hierarchies. The core of a ZOM is a language-independent interface that describes an abstract hierarchy as a tree of nodes.

The base ZOM defines two types of nodes: arrays and groups. Both types of nodes have an `attrs` property, which is an object with string keys and arbitrary values. The base ZOM does not define the exact properties of arrays, as these properties vary across Zarr versions. Groups have a property called `members`, which is an object with string keys and values that are either arrays or groups. Applications can use a ZOM as the basis for a declarative, type-safe approach to managing Zarr hierarchies.

## Definition of hierarchy structure

This document distinguishes the *structure* of a Zarr hierarchy from the data stored in the hierarchy. The structure of a Zarr hierarchy is the layout of the tree of arrays and groups, together with the metadata of those arrays and groups. This definition excludes the data stored in the arrays and the particular storage backend used to store data and metadata. Under this definition, two distinct Zarr hierarchies can have the same structure even if their arrays contain different values and/or the hierarchies are stored using different storage backends.

Because the structure of a Zarr hierarchy is, by definition, decoupled from the data stored in the hierarchy, it should be possible to represent that structure with a compact data structure or interface. Such a representation would facilitate operations like evaluating whether two Zarr hierarchies are identically structured, evaluating whether a given Zarr hierarchy has a specific structure, or creating a Zarr hierarchy with a desired structure. This document formalizes the Zarr Object Model (ZOM), an abstract model of the structure of a Zarr hierarchy. The ZOM serves as a foundation for tools that create and manipulate Zarr hierarchies at a structural level.
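
Because a structure is plain data, checking whether two hierarchies are identically structured reduces to comparing two documents. The following is a minimal Python sketch, assuming ZOM documents are held as nested dicts shaped like the JSON example in this ZEP; `structural_diff` is a hypothetical helper, not part of any existing library.

```python
from typing import Any


def structural_diff(a: Any, b: Any, path: str = "") -> list[str]:
    """Return the paths at which two JSON-like ZOM documents differ.

    Dicts are compared key by key; any other values (lists, numbers,
    strings, None) are compared with ==. An empty result means the two
    documents describe identically structured hierarchies.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        diffs: list[str] = []
        for key in sorted(set(a) | set(b)):
            child = f"{path}/{key}"
            if key not in a or key not in b:
                # Key present on only one side: the structures differ here.
                diffs.append(child)
            else:
                diffs.extend(structural_diff(a[key], b[key], child))
        return diffs
    return [] if a == b else [path or "/"]


group_a = {"attrs": {}, "members": {"x": {"shape": [10, 10], "attrs": {}}}}
group_b = {"attrs": {}, "members": {"x": {"shape": [10, 20], "attrs": {}}}}
print(structural_diff(group_a, group_b))  # ['/members/x/shape']
print(structural_diff(group_a, group_a))  # []
```

Reporting paths rather than a single boolean makes it easy to see *where* two hierarchies diverge, which is useful when checking a hierarchy against an expected structure.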

## Specification of the base Zarr Object Model

A node is an object with a property called `attrs` (short for "attributes"), which is a key-value data structure holding the content that Zarr specifications describe as "arbitrary user metadata". As of Zarr versions 2 and 3, `attrs` must be a JSON-serializable object.
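
One way to check the `attrs` constraint is sketched below in Python. It assumes a strict reading of "JSON-serializable object": a dict that survives a JSON round trip unchanged. `is_valid_attrs` is a hypothetical helper, and other readings (for example, accepting tuples because they serialize as JSON arrays) are equally plausible.

```python
import json
from typing import Any


def is_valid_attrs(attrs: Any) -> bool:
    """Return True if `attrs` is a dict that survives a JSON round trip unchanged.

    The round trip rejects non-string keys (json.dumps would silently turn
    them into strings), tuples (they come back as lists), and values that
    json.dumps cannot encode at all.
    """
    if not isinstance(attrs, dict):
        return False
    try:
        # allow_nan=False rejects NaN and Infinity, which are not valid JSON.
        encoded = json.dumps(attrs, allow_nan=False)
    except (TypeError, ValueError):
        return False
    return json.loads(encoded) == attrs


print(is_valid_attrs({"foo": 10, "bar": "hello"}))  # True
print(is_valid_attrs({"x": float("nan")}))  # False
print(is_valid_attrs({1: "one"}))  # False
```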
|
||
The base ZOM defines exactly two types of node: groups and arrays. This definition will use the unqualified terms "array" and "group" to refer to the two nodes defined in the ZOM. Where necessary to avoid ambiguity, the objects *represented* by ZOM arrays and ZOM groups, i.e. Zarr arrays and Zarr groups, will be referred to as "Zarr arrays" and "Zarr groups". | ||
|
||
ZOM arrays and ZOM groups represent Zarr arrays and Zarr groups in the simplest way possible that still conforms to the definition of "node" given above. Thus, a ZOM array is a node with properties identical to those defined in a particular specification of Zarr array metadata, unless one of those Zarr array properties contains user metadata, in which case a ZOM array does not include that property (since user metadata is already represented by the `attrs` property of the array). This definition is parametric with respect to a particular Zarr specification in order to accomodate future versions of Zarr that may add new properties to Zarr arrays. | ||
|
||
Similarly, a ZOM group is a node with properties identical to those defined in a specification of Zarr group metadata, unless one of those properties contains user metadata, in which case a ZOM group does not contain that property, for the same reason given above for arrays. Beyond the properties of Zarr groups defined in a particular Zarr specification, a ZOM group has an additional property: | ||
|
||
- `members`: a key-value data structure where the keys are strings and the values are arrays or groups. This property allows a ZOM group to represent the hierarchical relationship between Zarr groups and the Zarr arrays or Zarr groups contained within them. | ||
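
The two node types and the `members` relationship can be sketched with Python dataclasses. This is an illustrative assumption, not the pydantic zarr API: the class names `ArrayNode` and `GroupNode` and the helper `iter_nodes` are hypothetical, and only a subset of the Zarr v2 array fields is modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator, Union


@dataclass
class ArrayNode:
    """A ZOM array: Zarr array metadata (user metadata excluded) plus `attrs`.

    Only a subset of Zarr v2 array fields is modeled here, for brevity.
    """

    shape: tuple[int, ...]
    chunks: tuple[int, ...]
    dtype: str
    attrs: dict[str, Any] = field(default_factory=dict)


@dataclass
class GroupNode:
    """A ZOM group: group metadata plus `attrs` and `members`."""

    attrs: dict[str, Any] = field(default_factory=dict)
    # Keys are member names; values are nested arrays or groups.
    members: dict[str, Union[ArrayNode, GroupNode]] = field(default_factory=dict)


def iter_nodes(
    node: Union[ArrayNode, GroupNode], path: str = ""
) -> Iterator[tuple[str, Union[ArrayNode, GroupNode]]]:
    """Yield (path, node) pairs for every node in the tree, depth-first."""
    yield path or "/", node
    if isinstance(node, GroupNode):
        for name, child in node.members.items():
            yield from iter_nodes(child, f"{path}/{name}")


# The hierarchy from the JSON example in this ZEP, expressed as nodes.
root = GroupNode(
    attrs={"foo": 10, "bar": "hello"},
    members={
        "foo": ArrayNode(
            shape=(10, 10), chunks=(1, 1), dtype="|u1", attrs={"name": "my cool array"}
        )
    },
)

for path, node in iter_nodes(root):
    print(path, type(node).__name__)
# / GroupNode
# /foo ArrayNode
```

Because `members` is recursive, a whole hierarchy is a single value that can be built, compared, or serialized before any storage is touched.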

If a future version of Zarr uses a property called `members` in Zarr group metadata, there would be a naming collision between that property and the `members` property of a ZOM group. In this case, the ZOM group would rename the Zarr group's `members` property to `_members`, and any further collisions would be resolved by prepending additional underscore (`_`) characters. For example, in the unlikely case that `members` and `_members` are *both* present in Zarr group metadata, the ZOM group would map the Zarr group's `members` property to a property called `__members`.
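
The renaming rule can be sketched in Python as follows, assuming Zarr group metadata is available as a dict. `rename_reserved` is a hypothetical helper. It follows one reading of the rule: the Zarr `members` value moves to the first underscore-prefixed name not already present in the metadata, and an existing `_members` key is left untouched.

```python
from typing import Any


def rename_reserved(metadata: dict[str, Any], reserved: str = "members") -> dict[str, Any]:
    """Return a copy of Zarr group metadata with `reserved` moved out of the way.

    The value of the `reserved` key is stored under the first name formed by
    prepending underscores that is not already a key in `metadata`; all other
    keys are copied unchanged.
    """
    if reserved not in metadata:
        return dict(metadata)
    new_key = "_" + reserved
    while new_key in metadata:
        new_key = "_" + new_key
    renamed = {k: v for k, v in metadata.items() if k != reserved}
    renamed[new_key] = metadata[reserved]
    return renamed


print(rename_reserved({"members": "a"}))
# {'_members': 'a'}
print(rename_reserved({"members": "a", "_members": "b"}))
# {'_members': 'b', '__members': 'a'}
```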

Thus, ZOM groups and ZOM arrays can represent the structure of a Zarr hierarchy, as described in [Definition of hierarchy structure](#definition-of-hierarchy-structure).

### ZOM in JSON

The ZOM representation of a Zarr hierarchy can easily be expressed as a JSON object. Here is an example of a ZOM group representing a Zarr group that contains a single two-dimensional Zarr array, using Zarr version 2. Both the Zarr group and the Zarr array contain user metadata.

```json
{
  "zarr_format": 2,
  "attrs": {
    "foo": 10,
    "bar": "hello"
  },
  "members": {
    "foo": {
      "zarr_format": 2,
      "shape": [10, 10],
      "chunks": [1, 1],
      "dtype": "|u1",
      "compressor": null,
      "fill_value": 0,
      "order": "C",
      "filters": null,
      "attrs": {
        "name": "my cool array"
      }
    }
  }
}
```

The ZOM itself can also be expressed as a JSON schema. Here is the ZOM for Zarr V2 expressed as a JSON schema:

```json
{
  "$ref": "#/definitions/Group",
  "definitions": {
    "Array": {
      "title": "Array",
      "description": "Model of a Zarr Version 2 Array",
      "type": "object",
      "properties": {
        "attrs": {
          "title": "Attrs",
          "type": "object"
        },
        "shape": {
          "title": "Shape",
          "type": "array",
          "items": {
            "type": "integer"
          }
        },
        "chunks": {
          "title": "Chunks",
          "type": "array",
          "items": {
            "type": "integer"
          }
        },
        "dtype": {
          "title": "Dtype",
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          ]
        },
        "compressor": {
          "title": "Compressor",
          "anyOf": [
            {
              "type": "object"
            },
            {
              "type": "null"
            }
          ]
        },
        "fill_value": {
          "title": "Fill Value"
        },
        "order": {
          "title": "Order",
          "enum": [
            "C",
            "F"
          ],
          "type": "string"
        },
        "filters": {
          "title": "Filters",
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "object"
              }
            },
            {
              "type": "null"
            }
          ]
        },
        "dimension_separator": {
          "title": "Dimension Separator",
          "enum": [
            ".",
            "/"
          ],
          "type": "string"
        },
        "zarr_format": {
          "title": "Zarr Format",
          "default": 2,
          "type": "integer"
        }
      },
      "required": [
        "attrs",
        "shape",
        "chunks",
        "dtype",
        "compressor",
        "order",
        "filters"
      ],
      "additionalProperties": false
    },
    "Group": {
      "title": "Group",
      "description": "Model of a Zarr Version 2 Group",
      "type": "object",
      "properties": {
        "attrs": {
          "title": "Attrs",
          "type": "object"
        },
        "members": {
          "title": "Members",
          "type": "object",
          "additionalProperties": {
            "anyOf": [
              {
                "$ref": "#/definitions/Array"
              },
              {
                "$ref": "#/definitions/Group"
              }
            ]
          }
        },
        "zarr_format": {
          "title": "Zarr Format",
          "default": 2,
          "type": "integer"
        }
      },
      "required": [
        "attrs",
        "members"
      ],
      "additionalProperties": false
    }
  }
}
```

And Zarr V3:

```json
# insert schema for v3 here
```

## Related Work

## Implementation

- pydantic zarr
- ?

## Discussion

- todo: show that consolidated metadata can be achieved by applying a flattening transformation to a ZOM representation of a hierarchy.
  - The origins of consolidated metadata:
    * <https://github.com/pangeo-data/pangeo/issues/309>
    * <https://github.com/zarr-developers/zarr-python/pull/268>
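
The flattening transformation in the item above can be explored with a minimal Python sketch. It assumes the Zarr v2 ZOM JSON shape shown in "ZOM in JSON", distinguishes groups from arrays by the presence of `members`, and ignores the `members` renaming rule. The output wraps the flattened keys in the `zarr_consolidated_format` / `metadata` layout that zarr-python v2 writes to `.zmetadata`; treat that layout as an assumption to check against zarr-python. `flatten_zom` and `consolidate` are hypothetical helpers.

```python
from typing import Any


def flatten_zom(node: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a Zarr v2 ZOM document into store-key -> metadata pairs.

    Groups (nodes with `members`) produce a `.zgroup` document, arrays
    produce a `.zarray` document, and every node's `attrs` becomes `.zattrs`.
    """
    flat: dict[str, Any] = {}
    # Everything except `attrs` and `members` is Zarr metadata for this node.
    meta = {k: v for k, v in node.items() if k not in ("attrs", "members")}
    if "members" in node:
        flat[prefix + ".zgroup"] = meta
        for name, child in node["members"].items():
            flat.update(flatten_zom(child, f"{prefix}{name}/"))
    else:
        flat[prefix + ".zarray"] = meta
    flat[prefix + ".zattrs"] = node.get("attrs", {})
    return flat


def consolidate(zom: dict[str, Any]) -> dict[str, Any]:
    """Wrap the flattened metadata in a consolidated-metadata document."""
    return {"zarr_consolidated_format": 1, "metadata": flatten_zom(zom)}


# The ZOM group from the "ZOM in JSON" example, as a Python dict.
example = {
    "zarr_format": 2,
    "attrs": {"foo": 10, "bar": "hello"},
    "members": {
        "foo": {
            "zarr_format": 2,
            "shape": [10, 10],
            "chunks": [1, 1],
            "dtype": "|u1",
            "compressor": None,
            "fill_value": 0,
            "order": "C",
            "filters": None,
            "attrs": {"name": "my cool array"},
        }
    },
}

print(sorted(consolidate(example)["metadata"]))
# ['.zattrs', '.zgroup', 'foo/.zarray', 'foo/.zattrs']
```

Under these assumptions, the consolidated document is a path-keyed view of the same information the nested ZOM holds, which is the relationship the todo item proposes to demonstrate.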

## References and Footnotes

## License

<p xmlns:dct="http://purl.org/dc/terms/">
<a rel="license"
href="http://creativecommons.org/publicdomain/zero/1.0/">
<img src="https://licensebuttons.net/p/zero/1.0/80x15.png" style="border-style: none;" alt="CC0" />
</a>
<br />
To the extent possible under law,
<a rel="dct:publisher"
href="https://github.com/zarr-developers/zeps">
<span property="dct:title">the authors</span></a>
have waived all copyright and related or neighboring rights to
<span property="dct:title">ZEP 6</span>.
</p>