@@ -104,8 +104,11 @@ Sharding can be configured per array in the :ref:`array-metadata`:
104104 ``[32, 2]``, ``[32, 4]``, ``[64, 2]`` or ``[96, 18]``.
105105
106106
107+ Storage transformer implementation
108+ ==================================
109+
107110Key & value transformation
108- ==========================
111+ --------------------------
109112
110113The storage transformer protocol defines the abstract interface to be the same
111114as the :ref:`abstract-store-interface`.
@@ -124,7 +127,7 @@ storage keys from a regular chunk grid which may use a customly configured
124127For all entries that are part of the same shard the key is changed to the
125128shard-key and the values are combined in the `Binary shard format`_ described
126129below. The new shard-key is the chunk key divided by ``chunks_per_shard`` and
127- floored per dimension. E.g. for ``chunks_per_shard=[32, 2]``, the chunk grid
130+ floored per dimension. For example, for ``chunks_per_shard=[32, 2]``, the chunk grid
128131position ``[96, 18]`` (e.g. key "data/root/foo/baz/c96/18") is transformed to
129132the shard grid position ``[3, 9]`` and reassigned to the respective new key,
130133honoring the original chunk separator (e.g. "data/root/foo/baz/c3/9").
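
The key transformation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification: the function names ``shard_position`` and ``shard_key`` are hypothetical, and the sketch assumes that the chunk coordinates follow the last ``/c`` in the key, as in the example keys above.

```python
def shard_position(chunk_position, chunks_per_shard):
    """Floor-divide each chunk coordinate by the shard size in that dimension."""
    return [c // n for c, n in zip(chunk_position, chunks_per_shard)]


def shard_key(chunk_key, chunks_per_shard, separator="/"):
    """Map a chunk key to the key of the shard that contains it.

    Assumes the coordinates follow the last "/c" in the key (hypothetical
    helper, written only for keys shaped like the examples in the text).
    """
    array_prefix, marker, coords = chunk_key.rpartition("/c")
    chunk_position = [int(part) for part in coords.split(separator)]
    position = shard_position(chunk_position, chunks_per_shard)
    # Re-join with the same separator the chunk key used.
    return array_prefix + marker + separator.join(str(p) for p in position)


print(shard_key("data/root/foo/baz/c96/18", [32, 2]))
```

With ``chunks_per_shard=[32, 2]`` this maps the chunk key from the example to "data/root/foo/baz/c3/9".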
@@ -133,16 +136,18 @@ also have the same shard grid position ``[3, 9]``.
133136
134137
135138Binary shard format
136- ===================
139+ -------------------
137140
138141The only binary format is the ``indexed`` format, as specified by the ``format``
139142configuration key. Other binary formats might be added in future versions.
140143
141144In the indexed binary format chunks are written successively in a shard, where
142145unused space between them is allowed, followed by an index referencing them.
146+ The index is placed at the end of the file and has a length of 16 bytes per chunk
147+ in a shard, for example ``16 bytes * 64 = 1024 bytes`` for ``chunks_per_shard=[32, 2]``.
143148The index holds an `offset, length` pair of little-endian uint64 per chunk,
144- the chunks-order in the index is row-major (C) order, e.g. for (2, 2) chunks
145- per shard an index would look like:
149+ the chunks-order in the index is row-major (C) order, for example, for
150+ ``chunks_per_shard=[2, 2]`` an index would look like:
146151
147152.. code-block::
148153
@@ -151,7 +156,7 @@ per shard an index would look like:
151156 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 |
152157
153158
154- Empty chunks are denoted by setting both offset and length to `2^64 - 1``.
159+ Empty chunks are denoted by setting both offset and length to ``2^64 - 1``.
155160The index always has the full shape of all possible chunks per shard,
156161even if they are outside of the array size.
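
To make the index layout concrete, the following sketch packs and unpacks an index with Python's standard ``struct`` module. It is an illustration under stated assumptions, not a reference implementation: the names ``index_size``, ``encode_index`` and ``decode_index`` are hypothetical, and an empty chunk is represented as ``None`` on the Python side.

```python
import struct
from math import prod

EMPTY = 2**64 - 1  # marker for both offset and length of an empty chunk


def index_size(chunks_per_shard):
    """Index length in bytes: 16 bytes (two uint64) per chunk in a shard."""
    return 16 * prod(chunks_per_shard)


def encode_index(entries):
    """Pack (offset, length) pairs, given in row-major order, as little-endian uint64.

    A None entry stands for an empty chunk.
    """
    flat = []
    for entry in entries:
        flat.extend((EMPTY, EMPTY) if entry is None else entry)
    return struct.pack(f"<{len(flat)}Q", *flat)


def decode_index(shard, chunks_per_shard):
    """Read the index from the end of a shard and return one entry per chunk."""
    n = prod(chunks_per_shard)
    values = struct.unpack(f"<{2 * n}Q", shard[-16 * n:])
    pairs = zip(values[0::2], values[1::2])
    return [None if pair == (EMPTY, EMPTY) else pair for pair in pairs]


# A [2, 2] shard holding two 4-byte chunks; the other two chunks are empty.
shard = b"AAAABBBB" + encode_index([(0, 4), None, None, (4, 4)])
print(index_size([32, 2]), decode_index(shard, [2, 2]))
```

Because the index always covers every possible chunk in the shard, its size depends only on ``chunks_per_shard``, so a reader can locate it from the end of the file without any other information.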
157162
@@ -165,10 +170,11 @@ specific order of the existing chunks may be expected. Some writing strategies m
165170 leaving unused space up to an upper limit which might possibly be specified.
166171 Please note that for regular-sized uncompressed data all chunks have the same size and
167172 can therefore be replaced in-place.
168- * **Append-only**: Any chunk to write is appended to the existing shard, followed by an updated index.
173+ * **Append-only**: Any chunk to write is appended to the existing shard,
174+ followed by an updated index.
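
As an illustration of the append-only strategy, the sketch below appends one chunk to an in-memory shard and writes a fresh index after it. The function ``append_chunk`` and its parameters are hypothetical, and the sketch assumes that the superseded index and any overwritten chunk are simply left in place as unused space, which the indexed format permits.

```python
import struct

EMPTY = 2**64 - 1  # marker for both offset and length of an empty chunk


def append_chunk(shard, n_chunks, chunk_number, data):
    """Return a new shard with `data` appended and an updated index at the end.

    `n_chunks` is the number of chunks per shard and `chunk_number` is the
    row-major position of the chunk within the shard (hypothetical parameters).
    """
    if shard:
        # Reuse the entries of the existing index at the end of the shard.
        values = list(struct.unpack(f"<{2 * n_chunks}Q", shard[-16 * n_chunks:]))
    else:
        # A new shard starts with every chunk marked as empty.
        values = [EMPTY] * (2 * n_chunks)
    values[2 * chunk_number] = len(shard)  # the new chunk starts at the old end
    values[2 * chunk_number + 1] = len(data)
    return shard + data + struct.pack(f"<{2 * n_chunks}Q", *values)


shard = append_chunk(b"", 4, 0, b"abc")
shard = append_chunk(shard, 4, 2, b"de")
print(len(shard))
```

This keeps writes simple at the cost of shard size, since old indices and replaced chunks accumulate until the shard is rewritten.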
169175
170176Any configuration parameters for the write strategy must not be part of the metadata document,
171- in a shard I'd propose to use Morton order, but this can easily be changed and customized, since any order can be read .
177+ they need to be configured at runtime, as this is implementation-specific.
172178
173179
174180References