@@ -52,8 +52,8 @@ or by making a pull request against the
5252This document was produced by the `Zarr core development team
5353<https://github.com/orgs/zarr-developers/teams/core-devs>`_.
5454
55- Goal of v3 spec and main difference with v2
56- ===========================================
55+ Main differences with v2
56+ ========================
5757
5858Zarr spec v2 was originally designed around local filesystems, but Zarr has
5959grown and is now regularly deployed on cloud / object storage. Those kinds of
@@ -64,29 +64,104 @@ stores, in particular we want to achieve the following:
6464 - No assumption that the underlying store has locking ability.
6565 - Ability to do concurrent writes with the assumption that writes from clients will be consistent, but not atomic.
6666
67-
6867Compared with Zarr spec v2, spec v3 has the following main differences:
6968 - V3 is a flat key-value store instead of a hierarchical store. Hierarchy is implied.
7069 - V3 has an explicit root, while v2 roots and groups could not be distinguished.
7170 - Separation of the data and metadata key space.
7271 - Explicit support for extensions.
7372 - chunk separator is ``/`` by default.
73+ - `".json"` suffix for the metadata document by default.
7474
7575This means that a store cannot be opened at an arbitrary point, but needs to be
7676opened at the root. User-facing convenience functions could walk a given
7777hierarchy and return a sub-group, but this is not part of the API.
7878
79+ Goals and Non-Goals of v3 spec with respect to v2 spec
80+ ======================================================
81+
82+ This section is informative. It is intended to help readers familiar with
83+ previous versions of Zarr find and understand the differences and the reasons
84+ behind them, and to guide contributors during the draft and review
85+ period.
86+
87+ Better suitability for HPC file systems and network stores
88+ ----------------------------------------------------------
89+
90+ One goal of spec v3 is a design that minimizes the number of
91+ round-trip operations that must be done in order to understand the structure of a
92+ Zarr store. Especially on highly parallel file systems and network stores,
93+ listing keys and accessing metadata can be expensive, high-latency
94+ operations. Thus listing all available groups, datasets and chunks of a nested
95+ hierarchy can be a time-consuming operation.
96+
97+ The v3 spec tries to separate metadata from group and dataset data
98+ using a prefix, and recommends a flatter way of storing keys in order to
99+ facilitate bulk operations. In particular, this should decrease the
100+ reliance on "metadata consolidation" seen with Zarr v2.
101+
102+ Another related change is the notion of implicit groups: a dataset
103+ or chunk can be written via its full path even when the intermediate groups do
104+ not exist. This allows lock-free write operations for non-contending
105+ applications without the need for extra operations and round trips to create or
106+ check the existence of intermediate groups.
107+
108+ Consideration of multiple programming languages
109+ -----------------------------------------------
110+
111+ Zarr spec v3 has an explicit goal of having better compatibility and easier
112+ implementation with programming languages other than Python. Thus a number of
113+ core features in the previous spec have been relegated to extensions for the time
114+ being. This includes in particular a reduction of the number of datatypes that
115+ are available in core.
116+
117+ Compatibility with the N5 project
118+ ---------------------------------
119+
120+ The `N5 project <https://github.com/saalfeldlab/n5>`_ and Zarr have similar
121+ goals. One of the goals of Zarr spec v3 is to provide compatibility for most
122+ Zarr v2 and N5 users in order to allow consolidation under the v3 spec with the
123+ end goal of merging the two projects.
124+
125+ Extensibility
126+ -------------
127+
128+ One of the non-goals of Zarr spec v3 is to cover all use cases in the core.
129+ Instead, it aims to provide a path forward for extensibility and future
130+ standardisation of extensions without the need to rely on the Zarr core team.
131+ A challenge is to make sure that implementations of the Zarr protocol for which
132+ the extensions in use are not available can still give users access to data
133+ without triggering corruption when possible.
134+
135+
79136Questions that still need to be resolved
80137----------------------------------------
81138
139+ We solicit feedback on the following areas during the RFC period of this first
140+ draft.
141+
82142 - https://github.com/zarr-developers/zarr-specs/issues/72 to potentially split large metadata documents.
83143 - extensions and ``must_understand = True`` might be too restrictive. Work on a draft implementation with extensions and
84144 see how far we can go. List of extensions to implement:
85145
86- - Boolean
87- - Complex
88- - Datetime
89- - Named dimensions
146+ - Boolean
147+ - Complex
148+ - Datetime
149+ - Named dimensions
150+ - Awkward arrays
151+
152+ See https://github.com/zarr-developers/zarr-specs/issues/89 for discussion on
153+ the topic.
154+
155+ - Node name case sensitivity: The node name is now case sensitive. This may
156+ make store implementations more complicated, as the backend might not be (like
157+ some specific filesystems / object stores), and we may want to recommend a
158+ standard escaping mechanism in those cases. https://github.com/zarr-developers/zarr-specs/issues/57
159+
160+ - Node name character set: Same as above, but here we solicit feedback on
161+ whether store implementations should support full Unicode.
162+ https://github.com/zarr-developers/zarr-specs/issues/56
163+
164+ - Should named dimensions be part of the core metadata spec? https://github.com/zarr-developers/zarr-specs/issues/73
90165
91166
92167Document conventions