
Commit f591fe5

Rebase
1 parent e1a0806 commit f591fe5

1 file changed

+45
-96
lines changed

docs/protocol/core/v3.0.rst

Lines changed: 45 additions & 96 deletions
@@ -23,12 +23,14 @@ is licensed under a `Creative Commons Attribution 3.0 Unported License
2323

2424
----
2525

26+
2627
Abstract
2728
========
2829

2930
This specification defines the Zarr core protocol for storage and
3031
retrieval of N-dimensional typed arrays.
3132

33+
3234
Status of this document
3335
=======================
3436

@@ -43,7 +45,6 @@ This document was produced by the `Zarr core development team
4345
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.
4446

4547

46-
<<<<<<< HEAD
4748
Introduction
4849
============
4950

@@ -120,80 +121,10 @@ languages. Additional functionality can then be layered via
120121
extensions, some of which may aim for wide adoption, some of which may
121122
be more specialised and have more limited implementation.
122123

123-
=======
124-
Zarr spec v2 was originally designed around local filesystem, but Zarr has
125-
grown and is now regularly deployed on cloud / object storage. Those kind of
126-
storage have characteristics, capabilities and usage patterns that can widely
127-
differ from the assumptions of spec v2. V3 is designed to consider online
128-
stores, in particular we want to achieve the following:
129-
130-
- No assumption that the underlying store has locking ability.
131-
- Ability to do concurrent writes with the assumption that writes from clients will be consistent, but not atomic.
132-
133-
Unlike Zarr spec v2, the spec v3 has mainly the following differences:
134-
135-
- V3 is a flat key-value store instead of a hierarchical store. Hierarchy is implied.
136-
- V3 has an explicit root, while v2 roots and groups could not be distinguished.
137-
- Separation of the data and metadata key space.
138-
- Explicit support for extensions.
139-
- chunk separator is ``/`` by default.
140-
- `".json"` suffix for the metadata document by default.
141-
142-
This means that a store cannot be opened at an arbitrary point, but needs to be
143-
opened at the root. User facing convenience functions could walk a given
144-
hierarchy and return a sub-group, but this is not part of the API.
145-
146-
Goal and non-goal of v3 spec with respect to v2 spec
147-
====================================================
148-
149-
This section is informative and is present to help the reader familiar with
150-
previous version of zarr to find and understand the differences and the reasons
151-
behind them as well as guide the contributor during the draft and review
152-
period.
153-
154-
Better suitability for HPC file systems and network stores
155-
----------------------------------------------------------
156-
157-
One goal of the spec v3 is to have a design that minimizes the number of
158-
round-trip operations that must be done in order to understand the structure of
159-
a Zarr store. Especially on highly parallel file systems and network stores,
160-
listing keys and accessing metadata can be an expensive – high latency
161-
– operation. Thus a nested hierarchy listing all available groups, datasets
162-
and chunks can be a time consuming operation.
163-
164-
The v3 spec tries to separate the metadata from group and dataset data
165-
using a prefix, as well as recommend a flatter way of storing keys in order to
166-
facilitate bulk operations. This should in particular allow to decrease the
167-
reliance on "metadata consolidation" seen with Zarr v2.
168-
169-
Another related change is the notion of implicit groups created when a dataset
170-
or chunk can be written via its full path even when the intermediate groups do
171-
not exist. This allows lock-free write operations for non-contending
172-
applications without the need for extra operations and round trips to create or
173-
check the existence of intermediate groups.
174-
175-
Consideration of multiple programming languages
176-
-----------------------------------------------
177-
178-
Zarr spec v3 has an explicit goal of having better compatibility and easier
179-
implementation with programming languages other then Python. Thus a number of
180-
core features in previous spec have been relegated to extensions for the time
181-
being. This includes in particular a reduction of the number of data types that
182-
are available in core.
183-
184-
Compatibility with the N5 project
185-
---------------------------------
186-
187-
The `N5 project <https://github.com/saalfeldlab/n5>`_ and Zarr have similar
188-
goals. One of the goal of Zarr spec v3 is to provide compatibility for most of
189-
Zarr v2 and N5 users in order to allow consolidation under the v3 spec with the
190-
end goal of merging the two projects.
191-
>>>>>>> Some changes
192124

193125
Extensibility
194126
-------------
195127

196-
<<<<<<< HEAD
197128
The development of systems for storage of very large array-like data
198129
is a very active area of research and development, and there are many
199130
possibilities that remain to be explored. A goal of this specification
@@ -205,14 +136,7 @@ where possible, in order to retain interoperability. We also aim to
205136
provide a framework for community-defined extensions, which can be
206137
developed and published independently without requiring centralised
207138
coordination of all specifications.
208-
=======
209-
One of the non-goal of Zarr spec v3 is to cover all use cases in the core, and
210-
to provide a path forward for extensibility and future standardisation of
211-
extensions without the need to rely on the Zarr core team. A challenge is to
212-
make sure implementations of the Zarr protocol for which used extensions are not
213-
available can still give user access to data without triggering corruption when
214-
possible.
215-
>>>>>>> Some changes
139+
216140

217141
Questions that still need to be resolved
218142
----------------------------------------
@@ -223,28 +147,29 @@ draft.
223147
- https://github.com/zarr-developers/zarr-specs/issues/72 to potentially split
224148
large metadata documents.
225149
- extensions and ``must_understand = True`` might be too restrictive. Work a
226-
draft implementation with extensions and see how far we can go. List of
227-
extensions to implement:
150+
draft implementation with extensions and
151+
see how far we can go. List of extensions to implement:
228152

229153
- Boolean
230154
- Complex
231155
- Datetime
232156
- Named dimensions
233157
- Awkward arrays
234158

235-
See https://github.com/zarr-developers/zarr-specs/issues/89 for a discussion
236-
on the topic.
159+
See https://github.com/zarr-developers/zarr-specs/issues/89 for discussion on
160+
the topic.
161+
162+
- Node name case sensitivity: The node name is now case sensitive, this may
163+
make store implementation more complicated as the backend might not be (like some
164+
specific filesystem / object store), and we may want to recommend a standard
165+
escaping mechanism in those cases. https://github.com/zarr-developers/zarr-specs/issues/57
237166

238-
- Node name case sensitivity: the node name is now case sensitive, this may
239-
make store implementation more complicated as backed might not be (like some
240-
specific filesystem / object store), and we may want to recommend a standard
241-
escaping mechanism in those cases. https://github.com/zarr-developers/zarr-specs/issues/57
167+
- Node name character set: Same as above but unlike the previous point where we
168+
solicit feedback on whether store implementations should support full Unicode.
169+
https://github.com/zarr-developers/zarr-specs/issues/56
242170

243-
- Node name character set: same as above but unlike the previous point where we
244-
solicit feedback on whether store implementation should support full unicode.
245-
https://github.com/zarr-developers/zarr-specs/issues/56
171+
- Should named dimensions be part of the core metadata spec ? https://github.com/zarr-developers/zarr-specs/issues/73
246172

247-
- Should named dimensions be part of the core metadata spec ? https://github.com/zarr-developers/zarr-specs/issues/73
248173

249174
Document conventions
250175
====================
@@ -261,6 +186,7 @@ All of the text of this specification is normative except sections
261186
explicitly marked as non-normative, examples, and notes. Examples in
262187
this specification are introduced with the words "for example".
263188

189+
264190
Concepts and terminology
265191
========================
266192

@@ -351,7 +277,7 @@ conceptual model underpinning the Zarr protocol.
351277

352278
An array_ contains zero or more elements. Each element can be
353279
identified by a tuple of integer coordinates, one for each
354-
dimension_ of the array_. If all dimensions_ of an array_ have a
280+
dimension_ of the array_. If all dimensions_ of an array_ have
355281
finite length, then the number of elements in the array_ is given
356282
by the product of the dimension_ lengths. An array_ may not have
357283
been fully initialized.
@@ -456,6 +382,7 @@ conceptual model underpinning the Zarr protocol.
456382
interface`_ which is a common set of operations that stores may
457383
provide.
458384

385+
459386
Node names
460387
==========
461388

@@ -485,6 +412,7 @@ identical.
485412
this would entail. If you have experience or views please comment on
486413
`issue #56 <https://github.com/zarr-developers/zarr-specs/issues/56>`_.
487414
415+
488416
Data types
489417
==========
490418

@@ -509,6 +437,7 @@ defined in this protocol, the identifier is a simple ASCII
509437
string. However, protocol extensions may use any JSON value to
510438
identify a data type.
511439

440+
512441
Core data types
513442
---------------
514443

@@ -608,6 +537,7 @@ Core data types
608537
- variable, given by ``*``, is limited to be a multiple of 8.
609538
- N/A
610539

540+
611541
Floating point types correspond to basic binary interchange formats as
612542
defined by IEEE 754-2008.
613543

@@ -619,6 +549,7 @@ should be understood as fall-back types of respectively 1, 2, and 3 byte length.
619549
Zarr v3 is limited to type sizes that are a multiple of 8 bits but may support
620550
other type sizes in later versions of this specification.
621551

552+
622553
.. note::
623554

624555
We are explicitly looking for more feedback and prototypes of code using the ``r*``,
@@ -634,6 +565,7 @@ other type sizes in later versions of this specification.
634565
(pointer + length) to the variable size data we do not want to commit to such
635566
a structure.
636567

568+
637569
Chunk grids
638570
===========
639571

@@ -705,6 +637,7 @@ that chunk, where "%" is the modulo operator. For example, if a
705637
is contained within the chunk at grid index (1, 7, 2) and has coordinates
706638
(2, 10, 100) within that chunk.
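
The index arithmetic described in this hunk can be sketched in Python. The hunk context cuts off the chunk shape and element coordinates used in the specification's own example, so the values below (chunk shape (10, 20, 200), element (12, 150, 500)) are assumptions chosen only because they reproduce the grid index and in-chunk coordinates quoted above. ``locate`` is a hypothetical helper name, not part of the specification.

```python
def locate(coords, chunk_shape):
    """Map array coordinates to (grid index, coordinates within the chunk).

    Assumes a regular chunk grid: every chunk has the same shape.
    """
    # Integer division gives the index of the chunk along each dimension.
    grid_index = tuple(c // s for c, s in zip(coords, chunk_shape))
    # The modulo operator gives the position inside that chunk.
    within = tuple(c % s for c, s in zip(coords, chunk_shape))
    return grid_index, within


# Assumed inputs; see the note above about the truncated example.
grid_index, within = locate((12, 150, 500), (10, 20, 200))
print(grid_index, within)
```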
707639

640+
708641
The identifier for chunk with grid index (``i``, ``j``, ``k``, ...) is
709642
formed by joining together ASCII string representations of each index
710643
using a separator and prefixed with the character ``c``. The default value for
@@ -778,6 +711,7 @@ For example, for a two-dimensional array with chunk shape (`dx`,
778711
elements in the order (0, 0), (1, 0), (2, 0), ..., (`dx` - 3, `dy` -
779712
1), (`dx` - 2, `dy` - 1), (`dx` - 1, `dy` - 1).
780713

714+
781715
Chunk encoding
782716
==============
783717

@@ -829,6 +763,7 @@ community process specification.
829763
Further details of how a compressor is configured for an array are
830764
given in the section below on `Array metadata`_.
831765

766+
832767
Metadata
833768
========
834769

@@ -847,13 +782,15 @@ defined in [RFC8259]_. The term "array" is also used as defined in
847782
name/value pairs. This section also defines how metadata documents are
848783
encoded for storage.
849784

785+
850786
Only the top level metadata document ``zarr.json`` is guaranteed to be of JSON
851787
type, and can be used to define other formats for array-level and group-level
852788
metadata documents. In the case where non-JSON metadata documents are used in a
853789
Zarr hierarchy, the following sections on group and array level metadata are
854790
non-normative; but other metadata formats are expected to define some
855791
equivalence relations with the JSON documents.
856792

793+
857794
Entry point metadata
858795
--------------------
859796

@@ -956,6 +893,7 @@ ignored if not understood::
956893
]
957894
}
958895

896+
959897
Array metadata
960898
--------------
961899

@@ -1061,6 +999,7 @@ following mandatory names:
1061999

10621000
See the top level metadata extension section for the time being.
10631001

1002+
10641003
``attributes``
10651004

10661005
The value must be an object. The object may contain any name/value
@@ -1079,6 +1018,7 @@ The following names are optional:
10791018
specification. When the ``compressor`` name is absent, this means that no
10801019
compressor is used.
10811020

1021+
10821022
All other names within the array metadata object are reserved for
10831023
future versions of this specification.
10841024

@@ -1149,6 +1089,7 @@ chunking as above, but using an extension data type::
11491089
``filters`` has been removed,
11501090
``zarr_format`` has been removed,
11511091

1092+
11521093
Group metadata
11531094
--------------
11541095

@@ -1179,6 +1120,8 @@ For example, the JSON document below defines an explicit group::
11791120

11801121
A group does not need a metadata document to exist; see implicit groups.
11811122

1123+
1124+
11821125
Metadata encoding
11831126
-----------------
11841127

@@ -1288,6 +1231,7 @@ operations ``set("foo", a)`` and ``set("FOO", b)`` will result in two
12881231
separate (key, value) pairs being stored. Subsequently ``get("foo")``
12891232
will return *a* and ``get("FOO")`` will return *b*.
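
As a rough illustration of the case-sensitivity requirement in this context, the sketch below uses a plain dictionary as the backing store. ``DictStore`` is a hypothetical class written for this example and only covers ``get`` and ``set``; a real store would also need the remaining operations of the abstract store interface.

```python
class DictStore:
    """Hypothetical in-memory store whose keys are compared exactly."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Keys differing only in case are distinct dictionary keys.
        self._data[key] = value

    def get(self, key):
        return self._data[key]


store = DictStore()
store.set("foo", b"a")
store.set("FOO", b"b")
print(store.get("foo"), store.get("FOO"))
```

A store backed by a case-insensitive file system would not behave this way without some escaping scheme, which is the concern raised in issue #57.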
12901233

1234+
12911235
Store implementations
12921236
---------------------
12931237

@@ -1334,6 +1278,7 @@ create a store implementation spec and contribute it to the `zarr-specs GitHub r
13341278
For an example of a store implementation spec, see the
13351279
:ref:`file-system-store-v1` specification.
13361280

1281+
13371282
Storage protocol
13381283
================
13391284

@@ -1384,6 +1329,7 @@ If the root node is a group, the metadata key is
13841329
is "meta/root.array.json", and the data keys are formed by
13851330
concatenating "data/root/" and the chunk identifier.
13861331

1332+
13871333
.. list-table:: Metadata Storage Key example
13881334
:header-rows: 1
13891335

@@ -1412,6 +1358,7 @@ concatenating "data/root/" and the chunk identifier.
14121358
- `/foo/baz`
14131359
- `meta/root/foo/baz.array.json`
14141360

1361+
14151362
.. list-table:: Data Storage Key example
14161363
:header-rows: 1
14171364

@@ -1422,6 +1369,8 @@ concatenating "data/root/" and the chunk identifier.
14221369
- `(0, 0)`
14231370
- `data/root/foo/baz/c0/0`
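
The key layout shown in these two tables can be sketched as follows. This is a minimal sketch under assumptions: ``chunk_id`` is a hypothetical helper, the functions take the grid index as a tuple rather than as separate arguments, and the separator is assumed to default to ``/`` as the ``c0/0`` example suggests.

```python
def chunk_id(index, sep="/"):
    """Chunk identifier: 'c' followed by the indices joined by the separator."""
    return "c" + sep.join(str(i) for i in index)


def array_meta_key(path):
    """Metadata key for an array node; the root path "/" maps to meta/root.array.json."""
    return "meta/root" + path.rstrip("/") + ".array.json"


def data_key(path, index, sep="/"):
    """Data key for the chunk at the given grid index of the array at `path`."""
    return "data/root" + path.rstrip("/") + "/" + chunk_id(index, sep)


print(array_meta_key("/foo/baz"))
print(data_key("/foo/baz", (0, 0)))
```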
14241371

1372+
1373+
14251374
Protocol operations
14261375
-------------------
14271376

@@ -1474,7 +1423,7 @@ Let "+" be the string concatenation operator.
14741423
**Retrieve element values in an array**
14751424

14761425
To retrieve element in an array at path `P` and coordinate (`i`,
1477-
`j`, ...), perform ``get(data_key(P, i, j, ...))``. The returned
1426+
`j`, ...), perform ``get(data_key(P, i, j, ...), value)``. The returned
14781427
value is the serialisation of the corresponding chunk, encoded
14791428
according to the array metadata stored at ``array_meta_key(P)``.
14801429

@@ -1536,6 +1485,7 @@ Let "+" be the string concatenation operator.
15361485
deleted implicit groups, so a protocol implementation should make sure to
15371486
reify a parent group if they need to keep it. @@TODO clarify this
15381487

1488+
15391489
**Determine if a node exists**
15401490

15411491
To determine if a node exists at path ``P``, try in the following
@@ -1547,6 +1497,7 @@ Let "+" be the string concatenation operator.
15471497
.. note::
15481498
For a listable store, ``list_dir(parent(P))`` can be an alternative.
15491499

1500+
15501501
Protocol extensions
15511502
===================
15521503

@@ -1562,7 +1513,6 @@ There are no group extensions as in Zarr v3.0.
15621513

15631514
See https://github.com/zarr-developers/zarr-specs/issues/49 for a list of potential extensions
15641515

1565-
<<<<<<< HEAD
15661516

15671517
Comparison with Zarr v2
15681518
=======================
@@ -1598,8 +1548,6 @@ Below is a summary of the key differences between this specification
15981548
data types will be defined via protocol extensions.
15991549

16001550

1601-
=======
1602-
>>>>>>> Some changes
16031551
References
16041552
==========
16051553

@@ -1611,6 +1559,7 @@ References
16111559
Requirement Levels. March 1997. Best Current Practice. URL:
16121560
https://tools.ietf.org/html/rfc2119
16131561
1562+
16141563
Change log
16151564
==========
16161565
