@@ -27,8 +27,8 @@ License <https://creativecommons.org/licenses/by/3.0/>`_.
2727Abstract
2828========
2929
30- This specification defines an implementation of the Zarr abstract
31- storage transformer API introducing sharding.
30+ This specification defines an implementation of the Zarr
31+ storage transformer protocol for sharding.
3232
3333
3434Motivation
@@ -69,7 +69,7 @@ this specification are introduced with the words "for example".
6969Configuration
7070=============
7171
72- :ref:`array-metadata`.
72+ Sharding can be configured per array in the :ref:`array-metadata`:
7373
7474.. code-block::
7575
@@ -87,11 +87,49 @@ Configuration
8787 ]
8888 }
8989
90+ ``format``
9091
91- Sharding Mechanism
92- =========================
92+ Specifies a `Binary shard format`_. In this version, the only binary format is the
93+ ``indexed`` format.
9394
94- @@TODO
95+ ``chunks_per_shard``
96+
97+ An array of integers giving the number of chunks that are combined in a shard
98+ for each dimension of the Zarr array. Each shard may only start at a chunk-grid position
99+ that is divisible by ``chunks_per_shard`` in every dimension, counting from the zero origin.
100+ The length of the array must match the length of the array metadata ``shape`` entry.
101+ For example, a value of ``[32, 2]`` indicates that 64 chunks are combined in one shard:
102+ 32 along the first dimension and, for each of those, 2 along the second dimension.
103+ Valid starting positions for a shard in the chunk grid therefore include ``[0, 0]``,
104+ ``[32, 2]``, ``[32, 4]``, ``[64, 2]`` and ``[96, 18]``.
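
The divisibility rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the specification; the function names ``chunks_in_shard`` and ``is_valid_shard_start`` are hypothetical and chosen for this example.

```python
from math import prod


def chunks_in_shard(chunks_per_shard):
    """Total number of chunks combined in one shard (hypothetical helper)."""
    return prod(chunks_per_shard)


def is_valid_shard_start(position, chunks_per_shard):
    """Return True if a chunk-grid position is a valid shard starting position.

    A position is valid when it is divisible by ``chunks_per_shard``
    in every dimension.
    """
    if len(position) != len(chunks_per_shard):
        raise ValueError("position and chunks_per_shard must have the same length")
    return all(p % n == 0 for p, n in zip(position, chunks_per_shard))


if __name__ == "__main__":
    print(chunks_in_shard([32, 2]))                  # 32 * 2 chunks per shard
    print(is_valid_shard_start([96, 18], [32, 2]))   # both coordinates divisible
    print(is_valid_shard_start([33, 2], [32, 2]))    # 33 is not a multiple of 32
```

With ``chunks_per_shard=[32, 2]``, every position listed in the example above passes this check, while a position such as ``[33, 2]`` does not.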
105+
106+
107+ Key & value transformation
108+ ==========================
109+
110+ The storage transformer protocol defines the abstract interface to be the same
111+ as the `Abstract store interface`_.
112+
113+ The Zarr store interface is defined in terms of `keys` and `values`,
114+ where a `key` is a sequence of characters and a `value` is a sequence
115+ of bytes. A key-value pair is called an entry in the remainder of this section.
116+
117+ This sharding transformer only adapts entries whose key starts
118+ with `data/root`, as these are the data keys for array chunks. All other
119+ entries are passed on unchanged.
120+
121+ Entries starting with `data/root` are grouped by their common shard, assuming
122+ `Storage keys` from a regular chunk grid, which may use a custom
123+ ``chunk separator``.
124+ For all entries that are part of the same shard, the key is changed to the
125+ shard key and the values are combined in the `Binary shard format`_ described
126+ below. The shard key is derived from the chunk grid position, divided by ``chunks_per_shard`` and
127+ floored per dimension. For example, with ``chunks_per_shard=[32, 2]``, the chunk grid
128+ position ``[96, 18]`` (e.g. key "data/root/foo/baz/c96/18") is transformed to
129+ the shard grid position ``[3, 9]`` and reassigned to the respective new key,
130+ honoring the original chunk separator (e.g. "data/root/foo/baz/c3/9").
131+ Chunk grid positions ``[96, 19]``, ``[97, 18]``, …, up to ``[127, 19]`` also
132+ map to the shard grid position ``[3, 9]``.
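
The key transformation described above can be illustrated with a short Python sketch. It is a non-normative example under stated assumptions: the function name ``chunk_key_to_shard_key`` is hypothetical, and the sketch assumes that the chunk part of a data key is the final path segment introduced by ``/c``, as in the example keys of this section.

```python
def chunk_key_to_shard_key(key, chunks_per_shard, separator="/"):
    """Map a chunk key to the key of the shard that contains it.

    Assumption for this sketch: data keys look like
    "data/root/<array path>/c<i><separator><j>...", as in the examples above.
    """
    # Only data keys for array chunks are adapted; all others pass through.
    if not key.startswith("data/root"):
        return key

    # Split off the chunk-grid part that follows the last "/c".
    prefix, marker, grid = key.rpartition("/c")
    if not marker:
        raise ValueError(f"no chunk coordinates found in key: {key!r}")

    chunk_pos = [int(part) for part in grid.split(separator)]
    if len(chunk_pos) != len(chunks_per_shard):
        raise ValueError("chunk key and chunks_per_shard differ in dimensionality")

    # Divide by chunks_per_shard and floor, per dimension.
    shard_pos = [p // n for p, n in zip(chunk_pos, chunks_per_shard)]

    # Rebuild the key, honoring the original chunk separator.
    return prefix + "/c" + separator.join(str(p) for p in shard_pos)


if __name__ == "__main__":
    for k in ("data/root/foo/baz/c96/18", "data/root/foo/baz/c127/19"):
        print(k, "->", chunk_key_to_shard_key(k, [32, 2]))
```

Both keys in the usage example map to the same shard key, so their values would be combined into a single shard. Grouping a set of entries by the returned key gives the per-shard groups that are then encoded in the binary shard format.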
95133
96134
97135Binary shard format
@@ -133,55 +171,6 @@ Any configuration parameters for the write strategy must not be part of the meta
133171in a shard I'd propose to use Morton order, but this can easily be changed and customized, since any order can be read.
134172
135173
136- Key translation
137- ===============
138-
139- The Zarr store interface is defined in terms of `keys ` and `values `,
140- where a `key ` is a sequence of characters and a `value ` is a sequence
141- of bytes.
142-
143- @@TODO
144-
145-
146- Store API implementation
147- ========================
148-
149- @@TODO
150-
151- The section below defines an implementation of the Zarr abstract store
152- interface (@@TODO link) in terms of the native operations of this
153- storage system. Below ``fspath_to_key() `` is a function that
154- translates file system paths to store keys, and ``key_to_fspath() `` is
155- a function that translates store keys to file system paths, as defined
156- in the section above.
157-
158- * ``get(key) -> value `` : Read and return the contents of the file at
159- file system path ``key_to_fspath(key) ``.
160-
161- * ``set(key, value) `` : Write ``value `` as the contents of the file at
162- file system path ``key_to_fspath(key) ``.
163-
164- * ``delete(key) `` : Delete the file or directory at file system path
165- ``key_to_fspath(key) ``.
166-
167- * ``list() `` : Recursively walk the file system from the base
168- directory, returning an iterator over keys obtained by calling
169- ``fspath_to_key(fp) `` for each descendant file path ``fp ``.
170-
171- * ``list_prefix(prefix) `` : Obtain a file system path by calling
172- ``key_to_fspath(prefix) ``. If the result is a directory path,
173- recursively walk the file system from this directory, returning an
174- iterator over keys obtained by calling ``fspath_to_key(fp) `` for
175- each descendant file path ``fp ``.
176-
177- * ``list_dir(prefix) `` : Obtain a file system path by calling
178- ``key_to_fspath(prefix) ``. If the result is a director path, list
179- the directory children. Return a set of keys obtained by calling
180- ``fspath_to_key(fp) `` for each child file path ``fp ``, and a set of
181- prefixes obtained by calling ``fspath_to_key(dp) `` for each child
182- directory path ``dp ``.
183-
184-
185174References
186175==========
187176