Commit cf2684c

Merge branch 'zarr-developers:main' into main
2 parents 9c07016 + e6916c0 commit cf2684c

7 files changed

+314
-123
lines changed

draft/ZEP0001.md

Lines changed: 71 additions & 118 deletions
Original file line numberDiff line numberDiff line change
@@ -69,7 +69,7 @@ v2 have surfaced. These include:
6969
stability and innovation, some framework for community exploration
7070
and development of Zarr extensions is needed, so that innovation can
7171
happen in a coordinated way with predictable consequences and
72-
behaviour of implementations which may not support all extensions.
72+
behaviour of implementations which may not support all extension points.
7373

7474
* **High-latency storage**. Zarr v2 was originally developed to support
7575
local file system storage. Because of this, the design of Zarr v2
@@ -156,6 +156,9 @@ Implementations of the Zarr version 3 core specification are not
156156
required to be able to read or write data conforming to the Zarr
157157
version 2 specification, and vice versa.
158158

159+
Zarr v2 datasets may be upgraded by only writing v3 metadata if their
160+
data-types and codecs are supported in v3.
161+
159162

160163
## Detailed description
161164

@@ -170,32 +173,27 @@ specifications section.
170173
This ZEP proposes a modular specification framework, with the following components:
171174

172175
* **[Zarr core
173-
specification](https://zarr-specs.readthedocs.io/en/latest/core/v3.0.html)**
174-
-- This specification is the starting point for any Zarr
175-
implementation. It defines in an abstract way a format for storing
176-
N-dimensional array data.
177-
178-
* **Zarr extension specifications** -- This is an open-ended set of
179-
specifications which add new features to and/or modify the Zarr
180-
format in some way. There is a further breakdown of different
181-
extension types, described in more detail below.
176+
specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)**
177+
-- This specification is the starting point for any Zarr implementation. It
178+
defines in an abstract way a format for storing N-dimensional array data.
182179

183-
* **Zarr codec specifications** -- This is an open-ended set of
184-
specifications, each of which defines a protocol for encoding and
185-
decoding Zarr chunk data.
180+
* **Zarr extension point specifications** -- This is an open-ended set of
181+
specifications which add new features to and/or modify the Zarr format in some
182+
way. There is a further breakdown of different extension types, described in
183+
more detail below, covering data types, codecs, and storage transformers.
186184

187185
* **Zarr store specifications** -- This is an open-ended set of
188186
specifications, each of which defines a mapping from the abstract
189187
store API defined in the Zarr core specification to a set of
190188
concrete operations for a specific storage technology, such as a
191-
POSIX file system or cloud object storage.
189+
file system or cloud object storage.
192190

193191
The primary goal of this modularity is to allow specifications of new
194192
extensions, codecs and stores to be added over time, without needing
195193
any change to the core specification.
196194

197195
Note that the core specification allows for decentralised publishing
198-
of extension, codec and store specifications. In other words, any
196+
of extension point and store specifications. In other words, any
199197
individual or organisation may publish such a specification. It is
200198
hoped that the majority of such specifications will be published on
201199
the zarr-specs website after a review process by the Zarr community,
@@ -211,36 +209,36 @@ mean specifications that define conventions for the use of Zarr for a
211209
particular type of data, such as restrictions and expectations
212210
regarding the groups, arrays and attributes that will be found within
213211
a Zarr dataset. How these usage convention specifications will be
214-
managed, published and used is out of scope for this ZEP.
212+
managed, published and used is out of scope for this ZEP (see
213+
[ZEP 4](https://github.com/zarr-developers/zeps/pull/28) instead).
215214

216215
A draft of the Zarr version 3 core specification is available at the
217216
following URL:
218217

219-
* https://zarr-specs.readthedocs.io/en/latest/core/v3.0.html
218+
<https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html>
219+
220+
Changes during the draft process of the spec and this ZEP are listed in the
221+
[change log](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#change-log).
220222

221223
Note that publication of the draft specification on the zarr-specs website
222224
preceded establishment of the ZEP process, and does not imply
223225
acceptance of this ZEP. The specification should be considered a
224226
proposal at this time, pending acceptance of this ZEP.
225227

226-
To illustrate the principle of the other specification types, the
227-
following specifications have also been published on the zarr-specs
228-
website:
228+
Together with the core specification, this ZEP includes
229229

230-
* [Codecs](https://zarr-specs.readthedocs.io/en/latest/codecs.html)
231-
(part of this ZEP as well)
230+
* [Codecs](https://zarr-specs.readthedocs.io/en/latest/v3/codecs.html)
231+
(blosc, gzip, endian, transpose)
232232

233233
* Stores - [File system
234-
store](https://zarr-specs.readthedocs.io/en/latest/stores/filesystem/v1.0.html)
235-
(not part of this ZEP)
234+
store](https://zarr-specs.readthedocs.io/en/latest/v3/stores/filesystem/v1.0.html)
236235

237-
* Data types - [Datetime data
238-
types](https://zarr-specs.readthedocs.io/en/latest/extensions/data-types/datetime/v1.0.html)
239-
(not part of this ZEP)
240236

241-
* Storage transformers - [Sharding storage
242-
transformer](https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html)
243-
(part of [ZEP2](https://zarr.dev/zeps/draft/ZEP0002.html))
237+
To illustrate the principle of the other specification types, the
238+
[sharding storage transformer](https://zarr-specs.readthedocs.io/en/latest/v3/array-storage-transformers/sharding/v1.0.html)
239+
specification has also been published on the zarr-specs website. It is still a
240+
work in progress and is not part of this ZEP; it is covered by
241+
[ZEP2](https://zarr.dev/zeps/draft/ZEP0002.html).
244242

245243

246244
### Extensibility
@@ -268,27 +266,8 @@ specification describes the situations under which the extension can
268266
be ignored and processing can proceed, and conversely where the
269267
extension cannot be ignored and processing must terminate.
270268

271-
A further grouping of different kinds of extension is defined:
272-
273-
* **Generic extensions** -- These are extensions which apply to the
274-
processing of any data within a Zarr hierarchy.
275-
276-
* **Array extensions** -- These are extensions which apply to
277-
processing of an array within a Zarr hierarchy.
278-
279-
* **Data type extensions** -- These are extensions which define a new
280-
data type for array items, where the data type is not included in
281-
the set of core data types.
282-
283-
* **Chunk grid extensions** -- These are extensions which define a new
284-
way of dividing array items into chunks.
285-
286-
* **Chunk memory layout extensions** -- These are extensions which
287-
define a new way of organising the data from an array chunk into a
288-
contiguous sequence of bytes.
289-
290-
* **Storage transformer extensions** -- These are extensions which
291-
define a new storage transformation.
269+
The different kinds of extension points are defined in the core spec:
270+
<https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#extension-points>
292271

293272
For each of these kinds of extensions, the core specification defines
294273
a mechanism by which the extension is declared in the metadata
@@ -316,9 +295,6 @@ been reduced relative to Zarr v2. Note in particular the following:
316295
supporting variable length data types, and defined within a data
317296
type extension specification.
318297

319-
* Support for filters has been removed from the core specification. It
320-
is proposed this be covered in an extension specification.
321-
322298

323299
### Accommodating high-latency storage
324300

@@ -357,24 +333,13 @@ objects with a given prefix, which can be used to reduce the size of
357333
the result set and usually returns much faster than listing an entire
358334
bucket. It also provides an opportunity to better leverage the support
359335
for emulating a directory listing, via providing both a "prefix" and a
360-
"delimiter" parameter.
361-
362-
The new design separates the metadata and chunk data storage objects
363-
via a different key prefix. All metadata objects have the key prefix
364-
"/meta" and all chunk data objects have the key prefix "/data". This
365-
allows a faster listing of all metadata keys within a given store. The
366-
list of metadata keys is then sufficient to provide some useful
367-
information to a user regarding the contents and structure of a Zarr
368-
hierarchy.
336+
"delimiter" parameter. The new design separates the metadata and chunk
337+
data storage objects by adding a prefix to chunk data.
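
The prefix-and-delimiter listing described above can be sketched over an in-memory set of keys. This is only an illustration of the directory-emulation idea, not any store's actual API: the function name `list_dir` and the example keys are hypothetical, and real object stores expose this behaviour through their own listing calls.

```python
def list_dir(keys, prefix, delimiter="/"):
    """Emulate a directory listing over flat object keys.

    Returns (objects, common_prefixes): keys that sit directly under
    `prefix`, and the "sub-directory" prefixes found one level below it.
    """
    objects, common_prefixes = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the requested prefix
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key lies deeper in the hierarchy: report only its first level.
            common_prefixes.add(prefix + head + delimiter)
        else:
            objects.add(key)
    return sorted(objects), sorted(common_prefixes)


# Hypothetical keys, chosen only for illustration.
keys = [
    "foo/zarr.json",
    "foo/bar/zarr.json",
    "foo/bar/c/0/0",
    "foo/baz/zarr.json",
]
objects, children = list_dir(keys, "foo/")
```

Because every key below `foo/bar/` collapses into a single common prefix, the size of the result depends on the number of direct children rather than on the number of chunks stored underneath them.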
369338

370339
The new design also changes the key suffix for metadata objects. For
371-
example, for an array with hierarchy path "/foo/bar", the key for the
372-
corresponding metadata object in Zarr v2 would be
373-
"/foo/bar/.zarray". In Zarr v3 this becomes
374-
"/meta/root/foo/bar.array.json". This change means that all the names
375-
and types (array or group) of children of the "/foo" group can be
376-
discovered and listed via a single storage operation to list the
377-
objects with prefix "/meta/root/foo" and delimiter "/".
340+
example, for an array with hierarchy path `/foo/bar`, the key for the
341+
corresponding metadata object in Zarr v2 would be `/foo/bar/.zarray`.
342+
In Zarr v3 this becomes `/foo/bar/zarr.json`.
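
The key change in the example above can be written as a small helper. This is a sketch that follows the keys quoted in the text; the function name `metadata_key` and its parameters are hypothetical. The `.zgroup` suffix for v2 groups comes from the Zarr v2 specification, not from this ZEP.

```python
def metadata_key(path: str, zarr_format: int, node_type: str = "array") -> str:
    """Return the metadata object key for the node at hierarchy path `path`."""
    base = path.rstrip("/")
    if zarr_format == 2:
        # Zarr v2 uses a different metadata document per node type.
        suffix = ".zarray" if node_type == "array" else ".zgroup"
        return f"{base}/{suffix}"
    if zarr_format == 3:
        # Per the text above, the v3 key ends in zarr.json.
        return f"{base}/zarr.json"
    raise ValueError(f"unsupported Zarr format: {zarr_format}")


v2_key = metadata_key("/foo/bar", 2)
v3_key = metadata_key("/foo/bar", 3)
```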
378343

379344

380345
### Storage transformers
@@ -395,7 +360,7 @@ original state whenever data is requested. To use multiple storage transformers,
395360
those may be stacked to combine different functionalities.
396361

397362
One example extension that would be implemented as a storage transformer is the
398-
[sharding specification](https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html).
363+
[sharding specification](https://zarr-specs.readthedocs.io/en/latest/v3/array-storage-transformers/sharding/v1.0.html).
399364
Besides this, the following (non-exhaustive) list names concepts that might possibly be
400365
implemented as storage transformers:
401366

@@ -421,82 +386,70 @@ Related work for sharding (motivation for storage transformers):
421386

422387
## Implementation
423388

424-
The following projects contains the implementation of Zarr Version 3 protocol:
425-
426-
- zarrita: https://github.com/alimanfoo/zarrita
427-
428-
> Zarrita is a minimal, exploratory implementation of the [Zarr version
429-
3.0 core protocol](https://zarr-specs.readthedocs.io/en/latest/core/v3.0.html#zarr-core-specification-v3-0).
430-
This is a technical spike only, not for production use.
431-
432-
- xtensor: https://github.com/xtensor-stack/xtensor-zarr
433-
434-
> `xtensor-zarr` offers an API to create and access a Zarr (v2 or v3)
435-
hierarchy in a store (locally or in the cloud), read and write
436-
arrays (in various formats) and groups in the hierarchy, and explore
437-
the hierarchy.
389+
Support for Zarr v3 in `zarr-python` is in progress; updating to the current
390+
spec is tracked in <https://github.com/zarr-developers/zarr-python/issues/1290>.
438391

439392

440393
## Discussion
441394

442395
1. Discussions around storage transformers:
443-
* https://github.com/zarr-developers/zarr-specs/pull/134
396+
* <https://github.com/zarr-developers/zarr-specs/pull/134>
444397

445398
2. Discussions around sharding:
446399
* For the specification:
447-
* https://github.com/zarr-developers/zarr-specs/issues/127
448-
* https://github.com/zarr-developers/zarr-specs/pull/134
400+
* <https://github.com/zarr-developers/zarr-specs/issues/127>
401+
* <https://github.com/zarr-developers/zarr-specs/pull/134>
449402
* Initial issue in `zarr-python`:
450-
* https://github.com/zarr-developers/zarr-python/issues/877
403+
* <https://github.com/zarr-developers/zarr-python/issues/877>
451404
* Different prototype implementations:
452-
* https://github.com/alimanfoo/zarrita/pull/40
453-
* https://github.com/zarr-developers/zarr-python/pull/876
454-
* https://github.com/zarr-developers/zarr-python/pull/947
405+
* <https://github.com/alimanfoo/zarrita/pull/40>
406+
* <https://github.com/zarr-developers/zarr-python/pull/876>
407+
* <https://github.com/zarr-developers/zarr-python/pull/947>
455408
* Other related discussions:
456-
* https://forum.image.sc/t/ome-zarr-chunking-questions/66794
457-
* https://forum.image.sc/t/sharding-support-in-ome-zarr/55409
458-
* https://forum.image.sc/t/deciding-on-optimal-chunk-size/63023
459-
* https://github.com/thewtex/shardedstore/issues/17
409+
* <https://forum.image.sc/t/ome-zarr-chunking-questions/66794>
410+
* <https://forum.image.sc/t/sharding-support-in-ome-zarr/55409>
411+
* <https://forum.image.sc/t/deciding-on-optimal-chunk-size/63023>
412+
* <https://github.com/thewtex/shardedstore/issues/17>
460413

461414
3. Discussion around Zarr V3 Protocol:
462415
* Issues in `zarr-specs`:
463416
* ZEP1 project board:
464-
* https://github.com/orgs/zarr-developers/projects/2
417+
* <https://github.com/orgs/zarr-developers/projects/2>
465418
* Zarr V3 Mission:
466-
* https://github.com/zarr-developers/zarr-specs/issues/140
419+
* <https://github.com/zarr-developers/zarr-specs/issues/140>
467420
* Zarr V3 Implementation (Zarrita):
468-
* https://github.com/zarr-developers/zarr-specs/issues/84
421+
* <https://github.com/zarr-developers/zarr-specs/issues/84>
469422
* Content-addressable Storage Transformer for Zarr V3:
470-
* https://github.com/zarr-developers/zarr-specs/issues/82
423+
* <https://github.com/zarr-developers/zarr-specs/issues/82>
471424
* Additional discussions related to Zarr V3:
472-
* https://github.com/zarr-developers/zarr-specs/issues/53
473-
* https://github.com/zarr-developers/zarr-specs/issues/13
425+
* <https://github.com/zarr-developers/zarr-specs/issues/53>
426+
* <https://github.com/zarr-developers/zarr-specs/issues/13>
474427
* PRs in `zarr-specs`:
475428
* Zarr V3 Protocol Development Branch:
476-
* https://github.com/zarr-developers/zarr-specs/pull/16
429+
* <https://github.com/zarr-developers/zarr-specs/pull/16>
477430
* Zarr V3 Conceptual Model, Data Types, Chunks layouts, Codecs and
478431
other important changes:
479-
* https://github.com/zarr-developers/zarr-specs/pull/17
480-
* https://github.com/zarr-developers/zarr-specs/pull/18
481-
* https://github.com/zarr-developers/zarr-specs/pull/22
482-
* https://github.com/zarr-developers/zarr-specs/pull/24
483-
* https://github.com/zarr-developers/zarr-specs/pull/25
484-
* https://github.com/zarr-developers/zarr-specs/pull/27
485-
* https://github.com/zarr-developers/zarr-specs/pull/28
486-
* https://github.com/zarr-developers/zarr-specs/pull/29
487-
* https://github.com/zarr-developers/zarr-specs/pull/30
488-
* https://github.com/zarr-developers/zarr-specs/pull/32
489-
* https://github.com/zarr-developers/zarr-specs/pull/35
432+
* <https://github.com/zarr-developers/zarr-specs/pull/17>
433+
* <https://github.com/zarr-developers/zarr-specs/pull/18>
434+
* <https://github.com/zarr-developers/zarr-specs/pull/22>
435+
* <https://github.com/zarr-developers/zarr-specs/pull/24>
436+
* <https://github.com/zarr-developers/zarr-specs/pull/25>
437+
* <https://github.com/zarr-developers/zarr-specs/pull/27>
438+
* <https://github.com/zarr-developers/zarr-specs/pull/28>
439+
* <https://github.com/zarr-developers/zarr-specs/pull/29>
440+
* <https://github.com/zarr-developers/zarr-specs/pull/30>
441+
* <https://github.com/zarr-developers/zarr-specs/pull/32>
442+
* <https://github.com/zarr-developers/zarr-specs/pull/35>
490443
* Additional changes to Zarr V3 Protocol:
491-
* https://github.com/zarr-developers/zarr-specs/pull/54
492-
* https://github.com/zarr-developers/zarr-specs/pull/143
493-
* https://github.com/zarr-developers/zarr-specs/pull/146
444+
* <https://github.com/zarr-developers/zarr-specs/pull/54>
445+
* <https://github.com/zarr-developers/zarr-specs/pull/143>
446+
* <https://github.com/zarr-developers/zarr-specs/pull/146>
494447

495448

496449
## References and Footnotes
497450

498451
* <a name="ref-ZARR2SPEC"></a> [ZARR2SPEC] -
499-
https://zarr.readthedocs.io/en/stable/spec/v2.html
452+
<https://zarr.readthedocs.io/en/stable/spec/v2.html>
500453

501454

502455
## License

draft/ZEP0002.md

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -161,11 +161,7 @@ specification.
161161

162162
Sharding can be configured per array in the array metadata by specifying the
163163
following array metadata keys:
164-
* `extension` must always be
165-
`"https://purl.org/zarr/spec/storage_transformers/sharding/1.0"` for this
166-
extension.
167-
* `type` specifies a binary shard format. In this version, the only binary
168-
format is the `indexed` format.
164+
* `name` must always be `"sharding"` for this storage transformer.
169165
* `configuration` contains only the following configuration key:
170166
* `chunks_per_shard` is an array of integers providing the number of chunks
171167
that are combined in a shard for each dimension of the Zarr array, where
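
As a minimal sketch, the `name` and `configuration` keys listed in this hunk could be assembled as follows. The helper names are hypothetical, the positive-integer check is an assumption rather than a stated requirement, and where the entry sits inside the array metadata document is not shown here.

```python
import json
from math import prod


def sharding_transformer(chunks_per_shard):
    """Build a storage transformer entry using the keys named in the text."""
    counts = list(chunks_per_shard)
    # Assumption: each dimension needs a positive integer chunk count.
    if not counts or not all(isinstance(n, int) and n > 0 for n in counts):
        raise ValueError("chunks_per_shard must be a non-empty list of positive integers")
    return {
        "name": "sharding",
        "configuration": {"chunks_per_shard": counts},
    }


def chunks_in_shard(entry):
    """Total number of chunks combined into one shard."""
    return prod(entry["configuration"]["chunks_per_shard"])


entry = sharding_transformer([2, 2])
encoded = json.dumps(entry, indent=2)
```

With `chunks_per_shard` set to `[2, 2]`, each shard of a two-dimensional array combines four chunks.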

draft/ZEP0003.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,8 @@ Type: Specification
1818

1919
Created: 2022-10-17
2020

21+
Discussion: https://github.com/orgs/zarr-developers/discussions/52
22+
2123
## Abstract
2224

2325
To allow the chunks of a Zarr array to form a rectangular grid rather than a regular grid,
