Commit 57bb277

Carreau and alimanfoo authored
Start goal and Non goal section of spec v3. (#67)
* Start goal and Non goal section of spec v3.

  Thanks to the discussion with alimanfoo yesterday. I believe this will help future readers understand where to look for differences and provide constructive feedback.
* solicit feedback
* fix typos

Co-authored-by: Alistair Miles <[email protected]>
1 parent b73b940 commit 57bb277

2 files changed

+91
-7
lines changed

docs/protocol/core/v3.0.rst

Lines changed: 82 additions & 7 deletions
@@ -52,8 +52,8 @@ or by making a pull request against the
5252
This document was produced by the `Zarr core development team
5353
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.
5454

55-
Goal of v3 spec and main difference with v2
56-
===========================================
55+
Main difference with v2
56+
=======================
5757

5858
Zarr spec v2 was originally designed around local filesystem, but Zarr has
5959
grown and is now regularly deployed on cloud / object storage. Those kind of
@@ -64,29 +64,104 @@ stores, in particular we want to achieve the following:
6464
- No assumption that the underlying store has locking ability.
6565
- Ability to do concurrent writes with the assumption that writes from clients will be consistent, but not atomic.
6666

67-
6867
Unlike Zarr spec v2, the spec v3 has mainly the following differences:
6968
- V3 is a flat key-value store instead of a hierarchical store. Hierarchy is implied.
7069
- V3 has an explicit root, while v2 roots and groups could not be distinguished.
7170
- Separation of the data and metadata key space.
7271
- Explicit support for extensions.
7372
- chunk separator is ``/`` by default.
73+
- ``".json"`` suffix for the metadata document by default.
7474

7575
This means that a store cannot be opened at an arbitrary point, but needs to be
7676
opened at the root. User facing convenience functions could walk a given
7777
hierarchy and return a sub-group, but this is not part of the API.
7878

79+
Goal and Non-Goal of v3 spec with respect to v2 spec
80+
====================================================
81+
82+
This section is informative. It is present to help readers familiar with
83+
previous versions of Zarr find and understand the differences and the reasons
84+
behind them, and to guide contributors during the draft and review
85+
period.
86+
87+
Better suitability for HPC file systems and network stores
88+
----------------------------------------------------------
89+
90+
One goal of the v3 spec is a design that minimizes the number of
91+
round-trip operations that must be done in order to understand the structure of a
92+
Zarr store. Especially on highly parallel file systems and network stores,
93+
listing keys and accessing metadata can be expensive (high-latency)
94+
operations. Thus, in a nested hierarchy, listing all available groups, datasets
95+
and chunks can be time-consuming.
96+
97+
The v3 spec tries to separate the metadata from group and dataset data
98+
using a prefix, and recommends a flatter way of storing keys in order to
99+
facilitate bulk operations. In particular, this should decrease the
100+
reliance on "metadata consolidation" seen with Zarr v2.
101+
102+
Another related change is the notion of implicit groups: a dataset
103+
or chunk can be written via its full path even when the intermediate groups do
104+
not exist. This allows lock-free write operations for non-contending
105+
applications without the need for extra operations and round trips to create or
106+
check the existence of intermediate groups.
107+
108+
Consideration of multiple programming languages
109+
-----------------------------------------------
110+
111+
Zarr spec v3 has an explicit goal of having better compatibility and easier
112+
implementation with programming languages other than Python. Thus a number of
113+
core features in the previous spec have been relegated to extensions for the time
114+
being. This includes, in particular, a reduction of the number of datatypes that
115+
are available in core.
116+
117+
Compatibility with the N5 project
118+
---------------------------------
119+
120+
The `N5 project <https://github.com/saalfeldlab/n5>`_ and Zarr have similar
121+
goals. One of the goals of Zarr spec v3 is to provide compatibility for most
122+
Zarr v2 and N5 users in order to allow consolidation under the v3 spec, with the
123+
end goal of merging the two projects.
124+
125+
Extensibility
126+
-------------
127+
128+
Covering all use cases in the core is a non-goal of Zarr spec v3. Instead, the
129+
spec aims to provide a path forward for extensibility and future standardisation of
130+
extensions without the need to rely on the Zarr core team. A challenge is to
131+
make sure that implementations of the Zarr protocol in which a used extension is not
132+
available can still give users access to the data, without triggering corruption, when
133+
possible.
134+
135+
79136
Questions that still need to be resolved
80137
----------------------------------------
81138

139+
We solicit feedback on the following areas during the RFC period of this first
140+
draft.
141+
82142
- https://github.com/zarr-developers/zarr-specs/issues/72 to potentially split large metadata documents.
83143
- extensions and ``must_understand = True`` might be too restrictive. Work a draft implementation with extensions and
84144
see how far we can go. List of extensions to implement:
85145

86-
- Boolean
87-
- Complex
88-
- Datetime
89-
- Named dimensions
146+
- Boolean
147+
- Complex
148+
- Datetime
149+
- Named dimensions
150+
- Awkward arrays
151+
152+
See https://github.com/zarr-developers/zarr-specs/issues/89 for discussion on
153+
the topic.
154+
155+
- Node name case sensitivity: The node name is now case sensitive. This may
156+
make store implementations more complicated, as the backend might not be (like some
157+
specific filesystems / object stores), and we may want to recommend a standard
158+
escaping mechanism in those cases. https://github.com/zarr-developers/zarr-specs/issues/57
159+
160+
- Node name character set: Same as above, but here we
161+
solicit feedback on whether store implementations should support full Unicode.
162+
https://github.com/zarr-developers/zarr-specs/issues/56
163+
164+
- Should named dimensions be part of the core metadata spec? https://github.com/zarr-developers/zarr-specs/issues/73
90165

91166

92167
Document conventions
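
Illustrative note on the ``v3.0.rst`` changes above: the flat key space, the prefix-based separation of metadata from data, and implicit groups can be sketched with a plain dictionary standing in for a store. This is a minimal sketch, not the spec's normative layout. The prefixes ``meta/root`` and ``data/root``, the ``.array.json`` suffix, the metadata fields and all function names are assumptions chosen for illustration.

```python
import json

# Hypothetical prefixes: the text above only says that metadata and data
# live under separate prefixes; these exact strings are assumptions.
META_PREFIX = "meta/root"
DATA_PREFIX = "data/root"
ARRAY_SUFFIX = ".array.json"  # assumed suffix for array metadata documents


def array_meta_key(path):
    """Key of the metadata document for the array at `path`."""
    return f"{META_PREFIX}/{path}{ARRAY_SUFFIX}"


def chunk_key(path, index):
    """Key of one chunk, using '/' as the chunk separator."""
    return f"{DATA_PREFIX}/{path}/" + "/".join(str(i) for i in index)


def create_array(store, path, shape, chunk_shape):
    """Write array metadata directly at its full path.

    No intermediate group is created or checked, so this is a single
    write with no extra round trips.
    """
    doc = {"shape": shape, "chunk_shape": chunk_shape}  # illustrative fields
    store[array_meta_key(path)] = json.dumps(doc).encode()


def list_nodes(store):
    """Recover arrays and implicit groups by scanning metadata keys only."""
    arrays, groups = set(), set()
    for key in store:
        if key.startswith(META_PREFIX + "/") and key.endswith(ARRAY_SUFFIX):
            path = key[len(META_PREFIX) + 1 : -len(ARRAY_SUFFIX)]
            arrays.add(path)
            parts = path.split("/")
            # Every proper ancestor of an array is an implicit group.
            for depth in range(1, len(parts)):
                groups.add("/".join(parts[:depth]))
    return sorted(arrays), sorted(groups)


store = {}  # flat key-value store: str -> bytes
create_array(store, "experiment/run1/temperature", [4, 4], [2, 2])
store[chunk_key("experiment/run1/temperature", (0, 1))] = b"\x00" * 16
arrays, implicit_groups = list_nodes(store)
```

Because the hierarchy is implied by the keys, ``list_nodes`` never touches the chunk keys. On a real object store the same effect would come from a single prefix listing rather than a recursive walk.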

docs/stores/filesystem/v1.0.rst

Lines changed: 9 additions & 0 deletions
@@ -46,6 +46,15 @@ This document was produced by the `Zarr core development team
4646
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.
4747

4848

49+
Notes about design decisions for the native File System Store
50+
=============================================================
51+
52+
The original file system store is designed for simplicity and easy manipulation
53+
and transfer by external tools not aware of the store structure. In particular,
54+
tools like ``gsutil`` can be used to transfer a local directory store to
55+
cloud-based storage, so the choice of keys is preserved.
56+
57+
4958
Document conventions
5059
====================
5160

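Illustrative note on the file system store change above: if each store key maps one-to-one to a relative file path, then copying the directory with a tool that knows nothing about the store keeps the keys intact. The sketch below assumes that mapping. The helper names and example keys are hypothetical and not taken from the spec.

```python
import tempfile
from pathlib import Path, PurePosixPath


def key_to_path(root, key):
    """Map a '/'-separated store key to a file path under `root`."""
    return root.joinpath(*PurePosixPath(key).parts)


def set_item(root, key, value):
    """Store `value` (bytes) under `key`, creating directories as needed."""
    path = key_to_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(value)


def list_keys(root):
    """Recover the store keys from the relative paths of all files."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    keys = ["meta/root/foo.array.json", "data/root/foo/0/0"]  # example keys
    for key in keys:
        set_item(root, key, b"")
    recovered = list_keys(root)
```

Since the relative path is the key, an external copy of the directory tree yields the same key set at the destination without any translation step.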