@@ -23,12 +23,14 @@ is licensed under a `Creative Commons Attribution 3.0 Unported License
2323
2424----
2525
26+
2627Abstract
2728========
2829
2930This specification defines the Zarr core protocol for storage and
3031retrieval of N-dimensional typed arrays.
3132
33+
3234Status of this document
3335=======================
3436
@@ -43,7 +45,6 @@ This document was produced by the `Zarr core development team
4345<https://github.com/orgs/zarr-developers/teams/core-devs> `_.
4446
4547
46- <<<<<<< HEAD
4748Introduction
4849============
4950
@@ -120,80 +121,10 @@ languages. Additional functionality can then be layered via
120121extensions, some of which may aim for wide adoption, some of which may
121122be more specialised and have more limited implementation.
122123
123- =======
124- Zarr spec v2 was originally designed around local filesystem, but Zarr has
125- grown and is now regularly deployed on cloud / object storage. Those kind of
126- storage have characteristics, capabilities and usage patterns that can widely
127- differ from the assumptions of spec v2. V3 is designed to consider online
128- stores, in particular we want to achieve the following:
129-
130- - No assumption that the underlying store has locking ability.
131- - Ability to do concurrent writes with the assumption that writes from clients will be consistent, but not atomic.
132-
133- Unlike Zarr spec v2, the spec v3 has mainly the following differences:
134-
135- - V3 is a flat key-value store instead of a hierarchical store. Hierarchy is implied.
136- - V3 has an explicit root, while v2 roots and groups could not be distinguished.
137- - Separation of the data and metadata key space.
138- - Explicit support for extensions.
139- - chunk separator is ``/ `` by default.
140- - `".json" ` suffix for the metadata document by default.
141-
142- This means that a store cannot be opened at an arbitrary point, but needs to be
143- opened at the root. User facing convenience functions could walk a given
144- hierarchy and return a sub-group, but this is not part of the API.
145-
146- Goal and non-goal of v3 spec with respect to v2 spec
147- ====================================================
148-
149- This section is informative and is present to help the reader familiar with
150- previous version of zarr to find and understand the differences and the reasons
151- behind them as well as guide the contributor during the draft and review
152- period.
153-
154- Better suitability for HPC file systems and network stores
155- ----------------------------------------------------------
156-
157- One goal of the spec v3 is to have a design that minimizes the number of
158- round-trip operations that must be done in order to understand the structure of
159- a Zarr store. Especially on highly parallel file systems and network stores,
160- listing keys and accessing metadata can be an expensive – high latency
161- – operation. Thus a nested hierarchy listing all available groups, datasets
162- and chunks can be a time consuming operation.
163-
164- The v3 spec tries to separate the metadata from group and dataset data
165- using a prefix, as well as recommend a flatter way of storing keys in order to
166- facilitate bulk operations. This should in particular allow to decrease the
167- reliance on "metadata consolidation" seen with Zarr v2.
168-
169- Another related change is the notion of implicit groups created when a dataset
170- or chunk can be written via its full path even when the intermediate groups do
171- not exist. This allows lock-free write operations for non-contending
172- applications without the need for extra operations and round trips to create or
173- check the existence of intermediate groups.
174-
175- Consideration of multiple programming languages
176- -----------------------------------------------
177-
178- Zarr spec v3 has an explicit goal of having better compatibility and easier
179- implementation with programming languages other then Python. Thus a number of
180- core features in previous spec have been relegated to extensions for the time
181- being. This includes in particular a reduction of the number of data types that
182- are available in core.
183-
184- Compatibility with the N5 project
185- ---------------------------------
186-
187- The `N5 project <https://github.com/saalfeldlab/n5 >`_ and Zarr have similar
188- goals. One of the goal of Zarr spec v3 is to provide compatibility for most of
189- Zarr v2 and N5 users in order to allow consolidation under the v3 spec with the
190- end goal of merging the two projects.
191- >>>>>>> Some changes
192124
193125Extensibility
194126-------------
195127
196- <<<<<<< HEAD
197128The development of systems for storage of very large array-like data
198129is a very active area of research and development, and there are many
199130possibilities that remain to be explored. A goal of this specification
@@ -205,14 +136,7 @@ where possible, in order to retain interoperability. We also aim to
205136provide a framework for community-defined extensions, which can be
206137developed and published independently without requiring centralised
207138coordination of all specifications.
208- =======
209- One of the non-goal of Zarr spec v3 is to cover all use cases in the core, and
210- to provide a path forward for extensibility and future standardisation of
211- extensions without the need to rely on the Zarr core team. A challenge is to
212- make sure implementations of the Zarr protocol for which used extensions are not
213- available can still give user access to data without triggering corruption when
214- possible.
215- >>>>>>> Some changes
139+
216140
217141Questions that still need to be resolved
218142----------------------------------------
@@ -223,28 +147,29 @@ draft.
223147 - https://github.com/zarr-developers/zarr-specs/issues/72 to potentially split
224148 large metadata documents.
225149 - extensions and ``must_understand = True`` might be too restrictive. Work on a
226- draft implementation with extensions and see how far we can go. List of
227- extensions to implement:
150+ draft implementation with extensions and
151+ see how far we can go. List of extensions to implement:
228152
229153 - Boolean
230154 - Complex
231155 - Datetime
232156 - Named dimensions
233157 - Awkward arrays
234158
235- See https://github.com/zarr-developers/zarr-specs/issues/89 for a discussion
236- on the topic.
159+ See https://github.com/zarr-developers/zarr-specs/issues/89 for discussion on
160+ the topic.
161+
162+ - Node name case sensitivity: The node name is now case sensitive. This may
163+ make store implementation more complicated, as the backend might not be (like some
164+ specific filesystems / object stores), and we may want to recommend a standard
165+ escaping mechanism in those cases. https://github.com/zarr-developers/zarr-specs/issues/57
237166
238- - Node name case sensitivity: the node name is now case sensitive, this may
239- make store implementation more complicated as backed might not be (like some
240- specific filesystem / object store), and we may want to recommend a standard
241- escaping mechanism in those cases. https://github.com/zarr-developers/zarr-specs/issues/57
167+ - Node name character set: Same concern as above, but here we
168+ solicit feedback on whether store implementations should support full Unicode.
169+ https://github.com/zarr-developers/zarr-specs/issues/56
242170
243- - Node name character set: same as above but unlike the previous point where we
244- solicit feedback on whether store implementation should support full unicode.
245- https://github.com/zarr-developers/zarr-specs/issues/56
171+ - Should named dimensions be part of the core metadata spec? https://github.com/zarr-developers/zarr-specs/issues/73
246172
247- - Should named dimensions be part of the core metadata spec ? https://github.com/zarr-developers/zarr-specs/issues/73
248173
249174Document conventions
250175====================
@@ -261,6 +186,7 @@ All of the text of this specification is normative except sections
261186explicitly marked as non-normative, examples, and notes. Examples in
262187this specification are introduced with the words "for example".
263188
189+
264190Concepts and terminology
265191========================
266192
@@ -351,7 +277,7 @@ conceptual model underpinning the Zarr protocol.
351277
352278 An array _ contains zero or more elements. Each element can be
353279 identified by a tuple of integer coordinates, one for each
354- dimension _ of the array _. If all dimensions _ of an array _ have a
280+ dimension _ of the array _. If all dimensions _ of an array _ have
355281 finite length, then the number of elements in the array _ is given
356282 by the product of the dimension _ lengths. An array _ may not have
357283 been fully initialized.
@@ -456,6 +382,7 @@ conceptual model underpinning the Zarr protocol.
456382 interface `_ which is a common set of operations that stores may
457383 provide.
458384
385+
459386Node names
460387==========
461388
@@ -485,6 +412,7 @@ identical.
485412 this would entail. If you have experience or views please comment on
486413 `issue #56 <https://github.com/zarr-developers/zarr-specs/issues/56>`_.
487414
415+
488416 Data types
489417==========
490418
@@ -509,6 +437,7 @@ defined in this protocol, the identifier is a simple ASCII
509437string. However, protocol extensions may use any JSON value to
510438identify a data type.
511439
440+
512441Core data types
513442---------------
514443
@@ -608,6 +537,7 @@ Core data types
608537 - variable, given by ``*``, is limited to a multiple of 8.
609538 - N/A
610539
540+
611541Floating point types correspond to basic binary interchange formats as
612542defined by IEEE 754-2008.
613543
@@ -619,6 +549,7 @@ should be understood as fall-back types of respectively 1, 2, and 3 byte length.
619549Zarr v3 is limited to type sizes that are a multiple of 8 bits but may support
620550other type sizes in later versions of this specification.
621551
552+
622553.. note ::
623554
624555 We are explicitly looking for more feedback and prototypes of code using the ``r*``,
@@ -634,6 +565,7 @@ other type sizes in later versions of this specification.
634565 (pointer + length) to the variable size data we do not want to commit to such
635566 a structure.
636567
568+
637569Chunk grids
638570===========
639571
@@ -705,6 +637,7 @@ that chunk, where "%" is the modulo operator. For example, if a
705637is contained within the chunk at grid index (1, 7, 2) and has coordinates
706638(2, 10, 100) within that chunk.
707639
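As an illustration of the arithmetic above, the following Python sketch computes the grid index and the within-chunk coordinates of an element on a regular chunk grid. The function name ``locate_element`` is hypothetical, and the chunk shape (5, 20, 400) and element coordinates (7, 150, 900) are assumed values, chosen so that the result matches the grid index and in-chunk coordinates quoted above.

```python
from typing import Sequence, Tuple


def locate_element(
    coords: Sequence[int], chunk_shape: Sequence[int]
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (grid index, coordinates within the chunk) for one element.

    Assumes a regular chunk grid whose origin is at coordinate zero in
    every dimension.
    """
    if len(coords) != len(chunk_shape):
        raise ValueError("coords and chunk_shape must have the same length")
    # Floor division selects the chunk along each dimension.
    grid_index = tuple(c // s for c, s in zip(coords, chunk_shape))
    # The modulo operator gives the position inside that chunk.
    within_chunk = tuple(c % s for c, s in zip(coords, chunk_shape))
    return grid_index, within_chunk


# Assumed example values (not taken verbatim from the text above).
grid_index, within_chunk = locate_element((7, 150, 900), (5, 20, 400))
print(grid_index)    # (1, 7, 2)
print(within_chunk)  # (2, 10, 100)
```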
640+
708641The identifier for the chunk with grid index (``i``, ``j``, ``k``, ...) is
709642formed by joining together ASCII string representations of each index
710643using a separator and prefixed with the character ``c``. The default value for
@@ -778,6 +711,7 @@ For example, for a two-dimensional array with chunk shape (`dx`,
778711elements in the order (0, 0), (1, 0), (2, 0), ..., (`dx` - 3, `dy` -
7797121), (`dx` - 2, `dy` - 1), (`dx` - 1, `dy` - 1).
780713
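The ordering listed above, in which the first index varies fastest (often called column-major or Fortran order), can be sketched as a pair of nested loops. The function name ``first_index_fastest`` and the small chunk shape (3, 2) are assumptions made only for this illustration.

```python
from typing import Iterator, Tuple


def first_index_fastest(dx: int, dy: int) -> Iterator[Tuple[int, int]]:
    """Yield the coordinates of a (dx, dy) chunk, first index varying fastest."""
    for j in range(dy):        # the second index changes slowest
        for i in range(dx):    # the first index changes fastest
            yield (i, j)


order = list(first_index_fastest(3, 2))
print(order)  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```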
714+
781715Chunk encoding
782716==============
783717
@@ -829,6 +763,7 @@ community process specification.
829763Further details of how a compressor is configured for an array are
830764given in the section below on `Array metadata `_.
831765
766+
832767Metadata
833768========
834769
@@ -847,13 +782,15 @@ defined in [RFC8259]_. The term "array" is also used as defined in
847782name/value pairs. This section also defines how metadata documents are
848783encoded for storage.
849784
785+
850786Only the top level metadata document ``zarr.json `` is guaranteed to be of JSON
851787type, and can be used to define other formats for array-level and group-level
852788metadata documents. In the case where non-JSON metadata documents are used in a
853789Zarr hierarchy, the following sections on group and array level metadata are
854790non-normative; but other metadata formats are expected to define some
855791equivalence relations with the JSON documents.
856792
793+
857794Entry point metadata
858795--------------------
859796
@@ -956,6 +893,7 @@ ignored if not understood::
956893 ]
957894 }
958895
896+
959897Array metadata
960898--------------
961899
@@ -1061,6 +999,7 @@ following mandatory names:
1061999
10621000 See the top level metadata extension section for the time being.
10631001
1002+
10641003``attributes ``
10651004
10661005 The value must be an object. The object may contain any name/value
@@ -1079,6 +1018,7 @@ The following names are optional:
10791018 specification. When the ``compressor `` name is absent, this means that no
10801019 compressor is used.
10811020
1021+
10821022All other names within the array metadata object are reserved for
10831023future versions of this specification.
10841024
@@ -1149,6 +1089,7 @@ chunking as above, but using an extension data type::
11491089 ``filters`` has been removed,
11501090 ``zarr_format`` has been removed,
11511091
1092+
11521093Group metadata
11531094--------------
11541095
@@ -1179,6 +1120,8 @@ For example, the JSON document below defines an explicit group::
11791120
11801121 A group does not need a metadata document to exist; see implicit groups.
11811122
1123+
1124+
11821125Metadata encoding
11831126-----------------
11841127
@@ -1288,6 +1231,7 @@ operations ``set("foo", a)`` and ``set("FOO", b)`` will result in two
12881231separate (key, value) pairs being stored. Subsequently ``get("foo")``
12891232will return *a* and ``get("FOO")`` will return *b*.
12901233
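The case-sensitive behaviour described above can be sketched with a minimal in-memory store. The class name ``MemoryStore`` is hypothetical and implements only the ``get`` and ``set`` operations mentioned in the text. It is an illustration, not a conforming store implementation.

```python
from typing import Dict


class MemoryStore:
    """A toy key/value store backed by a dict; keys are compared exactly."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        # No case folding or normalisation is applied to the key.
        self._items[key] = value

    def get(self, key: str) -> bytes:
        return self._items[key]


store = MemoryStore()
store.set("foo", b"a")
store.set("FOO", b"b")

# Two separate (key, value) pairs are stored.
print(store.get("foo"))  # b'a'
print(store.get("FOO"))  # b'b'
```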
1234+
12911235Store implementations
12921236---------------------
12931237
@@ -1334,6 +1278,7 @@ create a store implementation spec and contribute it to the `zarr-specs GitHub r
13341278For an example of a store implementation spec, see the
13351279:ref: `file-system-store-v1 ` specification.
13361280
1281+
13371282Storage protocol
13381283================
13391284
@@ -1384,6 +1329,7 @@ If the root node is a group, the metadata key is
13841329is "meta/root.array.json", and the data keys are formed by
13851330concatenating "data/root/" and the chunk identifier.
13861331
1332+
13871333.. list-table :: Metadata Storage Key example
13881334 :header-rows: 1
13891335
@@ -1412,6 +1358,7 @@ concatenating "data/root/" and the chunk identifier.
14121358 - `/foo/baz `
14131359 - `meta/root/foo/baz.array.json `
14141360
1361+
14151362.. list-table :: Data Storage Key example
14161363 :header-rows: 1
14171364
@@ -1422,6 +1369,8 @@ concatenating "data/root/" and the chunk identifier.
14221369 - `(0, 0) `
14231370 - `data/root/foo/baz/c0/0 `
14241371
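The key construction shown in the tables above can be sketched in Python as follows. The helper ``chunk_identifier`` is a hypothetical name, and ``array_meta_key`` and ``data_key`` reuse the names from the protocol operations section but are illustrative only. The sketch assumes the default ``/`` chunk separator and the default ``.json`` metadata suffix, and covers array keys only.

```python
from typing import Iterable


def chunk_identifier(index: Iterable[int], separator: str = "/") -> str:
    """Join the grid indices with the separator and prefix with 'c'."""
    return "c" + separator.join(str(i) for i in index)


def array_meta_key(path: str) -> str:
    """Metadata key for an array at an absolute path such as '/foo/baz'."""
    if path == "/":
        return "meta/root.array.json"
    return "meta/root" + path + ".array.json"


def data_key(path: str, *index: int, separator: str = "/") -> str:
    """Data key for the chunk with the given grid index."""
    prefix = "data/root/" if path == "/" else "data/root" + path + "/"
    return prefix + chunk_identifier(index, separator)


print(array_meta_key("/foo/baz"))  # meta/root/foo/baz.array.json
print(data_key("/foo/baz", 0, 0))  # data/root/foo/baz/c0/0
print(array_meta_key("/"))         # meta/root.array.json
```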
1372+
1373+
14251374Protocol operations
14261375-------------------
14271376
@@ -1474,7 +1423,7 @@ Let "+" be the string concatenation operator.
14741423**Retrieve element values in an array **
14751424
14761425 To retrieve an element in an array at path `P` and coordinate (`i`,
1477- `j `, ...), perform ``get(data_key(P, i, j, ...)) ``. The returned
1426+ `j `, ...), perform ``get(data_key(P, i, j, ...), value ) ``. The returned
14781427 value is the serialisation of the corresponding chunk, encoded
14791428 according to the array metadata stored at ``array_meta_key(P) ``.
14801429
@@ -1536,6 +1485,7 @@ Let "+" be the string concatenation operator.
15361485 deleted implicit groups, so a protocol implementation should make sure to
15371486 reify a parent group if they need to keep it. @@TODO clarify this
15381487
1488+
15391489**Determine if a node exists **
15401490
15411491 To determine if a node exists at path ``P ``, try in the following
@@ -1547,6 +1497,7 @@ Let "+" be the string concatenation operator.
15471497 .. note ::
15481498 For a listable store, ``list_dir(parent(P))`` can be an alternative.
15491499
1500+
15501501Protocol extensions
15511502===================
15521503
@@ -1562,7 +1513,6 @@ There are no group extensions as in Zarr v3.0.
15621513
15631514See https://github.com/zarr-developers/zarr-specs/issues/49 for a list of potential extensions.
15641515
1565- <<<<<<< HEAD
15661516
15671517Comparison with Zarr v2
15681518=======================
@@ -1598,8 +1548,6 @@ Below is a summary of the key differences between this specification
15981548 data types will be defined via protocol extensions.
15991549
16001550
1601- =======
1602- >>>>>>> Some changes
16031551References
16041552==========
16051553
@@ -1611,6 +1559,7 @@ References
16111559 Requirement Levels. March 1997. Best Current Practice. URL:
16121560 https://tools.ietf.org/html/rfc2119
16131561
1562+
16141563 Change log
16151564==========
16161565