
Commit 7f0b238

apply feedback
1 parent c40725d commit 7f0b238


1 file changed, +14 -8 lines changed
  • docs/storage_transformers/sharding

docs/storage_transformers/sharding/v1.0.rst

Lines changed: 14 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -104,8 +104,11 @@ Sharding can be configured per array in the :ref:`array-metadata`:
104104
``[32, 2]``, ``[32, 4]``, ``[64, 2]`` or ``[96, 18]``.
105105

106106

107+
Storage transformer implementation
108+
==================================
109+
107110
Key & value transformation
108-
==========================
111+
--------------------------
109112

110113
The storage transformer protocol defines the abstract interface to be the same
111114
as the :ref:`abstract-store-interface`.
@@ -124,7 +127,7 @@ storage keys from a regular chunk grid which may use a customly configured
124127
For all entries that are part of the same shard the key is changed to the
125128
shard-key and the values are combined in the `Binary shard format`_ described
126129
below. The new shard-key is the chunk key divided by ``chunks_per_shard`` and
127-
floored per dimension. E.g. for ``chunks_per_shard=[32, 2]``, the chunk grid
130+
floored per dimension. For example, for ``chunks_per_shard=[32, 2]``, the chunk grid
128131
position ``[96, 18]`` (e.g. key "data/root/foo/baz/c96/18") is transformed to
129132
the shard grid position ``[3, 9]`` and reassigned to the respective new key,
130133
honoring the original chunk separator (e.g. "data/root/foo/baz/c3/9").
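
As a non-normative illustration, the following Python sketch computes the shard
key for a chunk key. It assumes that chunk keys have the form
``<prefix>c<i><separator><j>...`` as in the examples above and that the caller
knows the key prefix and the chunk separator; ``parse_chunk_key`` and
``shard_key`` are hypothetical helper names, not part of any zarr API.

.. code-block:: python

    from typing import List, Sequence


    def parse_chunk_key(key: str, prefix: str, separator: str = "/") -> List[int]:
        """Split a key such as "data/root/foo/baz/c96/18" into chunk grid coordinates.

        Hypothetical helper; assumes key == prefix + "c" + separator.join(coordinates).
        """
        head = prefix + "c"
        if not key.startswith(head):
            raise ValueError(f"{key!r} is not a chunk key below {prefix!r}")
        return [int(part) for part in key[len(head):].split(separator)]


    def shard_key(key: str, prefix: str, chunks_per_shard: Sequence[int],
                  separator: str = "/") -> str:
        """Return the key of the shard holding the chunk stored under key (hypothetical)."""
        coords = parse_chunk_key(key, prefix, separator)
        if len(coords) != len(chunks_per_shard):
            raise ValueError("key and chunks_per_shard have different dimensionality")
        # Chunk grid position divided by chunks_per_shard, floored per dimension.
        shard_coords = [c // n for c, n in zip(coords, chunks_per_shard)]
        # Keep the same prefix and the original chunk separator.
        return prefix + "c" + separator.join(str(s) for s in shard_coords)


    print(shard_key("data/root/foo/baz/c96/18", "data/root/foo/baz/", [32, 2]))
    # data/root/foo/baz/c3/9

Because of the floor division, a neighbouring chunk such as ``[97, 19]`` maps to
the same shard grid position ``[3, 9]``.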
@@ -133,16 +136,18 @@ also have the same shard grid position ``[3, 9]``.
133136

134137

135138
Binary shard format
136-
===================
139+
-------------------
137140

138141
The only binary format is the ``indexed`` format, as specified by the ``format``
139142
configuration key. Other binary formats might be added in future versions.
140143

141144
In the indexed binary format chunks are written successively in a shard, where
142145
unused space between them is allowed, followed by an index referencing them.
146+
The index is placed at the end of the file and has a length of 16 bytes per chunk
147+
in a shard, for example ``16 bytes * 64 = 1024 bytes`` for ``chunks_per_shard=[32, 2]``.
143148
The index holds an `offset, length` pair of little-endian uint64 per chunk,
144-
the chunks-order in the index is row-major (C) order, e.g. for (2, 2) chunks
145-
per shard an index would look like:
149+
and the chunks in the index are stored in row-major (C) order. For example, for
150+
``chunks_per_shard=[2, 2]`` an index would look like:
146151

147152
.. code-block::
148153
@@ -151,7 +156,7 @@ per shard an index would look like:
151156
| uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 |
152157
153158
154-
Empty chunks are denoted by setting both offset and length to `2^64 - 1``.
159+
Empty chunks are denoted by setting both offset and length to ``2^64 - 1``.
155160
The index always has the full shape of all possible chunks per shard,
156161
even if they are outside of the array size.
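
The following non-normative Python sketch shows one way to encode and decode such
an index and to look up a single chunk. It assumes the whole shard is available
in memory as ``bytes``; an implementation could instead read only the trailing
index and then the requested byte range. All function names are hypothetical.

.. code-block:: python

    import math
    import struct
    from typing import List, Optional, Sequence, Tuple

    EMPTY = 2**64 - 1             # stored as both offset and length of an empty chunk
    ENTRY = struct.Struct("<QQ")  # little-endian uint64 offset, uint64 length


    def index_nbytes(chunks_per_shard: Sequence[int]) -> int:
        """16 bytes per chunk, e.g. 16 * 64 = 1024 for chunks_per_shard=[32, 2]."""
        return ENTRY.size * math.prod(chunks_per_shard)


    def position_in_index(chunk_coords: Sequence[int], chunks_per_shard: Sequence[int]) -> int:
        """Row-major (C) position of a chunk within its shard's index."""
        pos = 0
        for c, n in zip(chunk_coords, chunks_per_shard):
            pos = pos * n + c % n  # c % n is the chunk's position inside its shard
        return pos


    def encode_index(entries: Sequence[Optional[Tuple[int, int]]]) -> bytes:
        """Serialize (offset, length) pairs in row-major order; None marks an empty chunk."""
        return b"".join(ENTRY.pack(*(e if e is not None else (EMPTY, EMPTY))) for e in entries)


    def decode_index(shard: bytes, chunks_per_shard: Sequence[int]) -> List[Optional[Tuple[int, int]]]:
        """Parse the index stored at the end of a shard."""
        nbytes = index_nbytes(chunks_per_shard)
        if len(shard) < nbytes:
            raise ValueError("shard is shorter than its index")
        entries: List[Optional[Tuple[int, int]]] = []
        for offset, length in ENTRY.iter_unpack(shard[len(shard) - nbytes:]):
            entries.append(None if (offset, length) == (EMPTY, EMPTY) else (offset, length))
        return entries


    def read_chunk(shard: bytes, chunk_coords: Sequence[int],
                   chunks_per_shard: Sequence[int]) -> Optional[bytes]:
        """Return the bytes of one chunk, or None if the chunk is empty."""
        entry = decode_index(shard, chunks_per_shard)[position_in_index(chunk_coords, chunks_per_shard)]
        if entry is None:
            return None
        offset, length = entry
        return shard[offset:offset + length]


    # chunks_per_shard=[2, 2]: chunk (0, 0) holds b"aa", chunk (1, 1) holds b"bbb".
    shard = b"aa" + b"bbb" + encode_index([(0, 2), None, None, (2, 3)])
    print(index_nbytes([32, 2]))             # 1024
    print(read_chunk(shard, [1, 1], [2, 2]))  # b'bbb'
    print(read_chunk(shard, [0, 1], [2, 2]))  # None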
157162

@@ -165,10 +170,11 @@ specific order of the existing chunks may be expected. Some writing strategies m
165170
leaving unused space up to an upper limit which might possibly be specified.
166171
Please note that for regular-sized uncompressed data all chunks have the same size and
167172
can therefore be replaced in-place.
168-
* **Append-only**: Any chunk to write is appended to the existing shard, followed by an updated index.
173+
* **Append-only**: Any chunk to write is appended to the existing shard,
174+
followed by an updated index.
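
A non-normative Python sketch of the append-only strategy, assuming the shard is
read and rewritten as a single ``bytes`` value. Dropping the previous index before
appending is a choice of this sketch; keeping it as unused space would also be
valid in the indexed format. The helper name ``append_chunk`` is hypothetical.

.. code-block:: python

    import math
    import struct
    from typing import Sequence

    EMPTY = 2**64 - 1             # offset/length sentinel for an empty chunk
    ENTRY = struct.Struct("<QQ")  # little-endian uint64 offset, uint64 length


    def append_chunk(shard: bytes, chunk_coords: Sequence[int],
                     chunks_per_shard: Sequence[int], chunk: bytes) -> bytes:
        """Return a new shard value with chunk appended and the index updated.

        An empty shard (b"") is treated as a new shard in which every chunk is empty.
        """
        n_chunks = math.prod(chunks_per_shard)
        index_nbytes = ENTRY.size * n_chunks
        if shard:
            if len(shard) < index_nbytes:
                raise ValueError("shard is shorter than its index")
            data = shard[:-index_nbytes]  # existing chunks plus any unused space
            entries = list(ENTRY.iter_unpack(shard[-index_nbytes:]))
        else:
            data = b""
            entries = [(EMPTY, EMPTY)] * n_chunks
        # Row-major (C) position of the chunk within the shard.
        pos = 0
        for c, n in zip(chunk_coords, chunks_per_shard):
            pos = pos * n + c % n
        # The chunk starts where the existing data ends; an older version of the
        # same chunk, if any, is left behind as unused space.
        entries[pos] = (len(data), len(chunk))
        return data + chunk + b"".join(ENTRY.pack(offset, length) for offset, length in entries)

Overwriting a chunk therefore never moves other chunks; reclaiming the unused
space would require rewriting the shard.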
169175

170176
Any configuration parameters for the write strategy must not be part of the metadata document,
171-
in a shard I'd propose to use Morton order, but this can easily be changed and customized, since any order can be read.
177+
and instead need to be configured at runtime, since they are implementation-specific.
172178

173179

174180
References
