@@ -105,15 +105,14 @@ Sharding can be configured per array in the :ref:`array-metadata` as follows::

 ``chunk_shape``

-    An array of integers providing the shape of inner chunks in a shard for each
-    dimension of the Zarr array. The length of the array must match the length
-    of the array metadata ``shape`` entry. The each integer must by divisible by
-    the ``chunk_shape`` of the array as defined in the ``chunk_grid``
-    :ref:`array-metadata`.
-    For example, an inner chunk shape of ``[32, 2]`` with an outer chunk shape
-    ``[64, 64]`` indicates that 64 chunks are combined in one shard, 2 along the
-    first dimension, and for each of those 32 along the second dimension.
-    Currently, only the ``regular`` chunk grid is supported.
+    An array of integers specifying the size of the inner chunks in a shard
+    along each dimension of the outer array. The length of the ``chunk_shape``
+    array must match the number of dimensions of the outer chunk to which this
+    sharding codec is applied, and the chunk size along each dimension must
+    evenly divide the size of the outer chunk. For example, an inner chunk
+    shape of ``[32, 2]`` with an outer chunk shape of ``[64, 64]`` indicates
+    that 64 inner chunks are combined in one shard: 2 along the first dimension
+    and 32 along the second dimension (``64 / 32 = 2`` and ``64 / 2 = 32``).

 ``codecs``

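As an illustration of the ``chunk_shape`` rule in the new wording, the following Python sketch checks that each inner chunk size evenly divides the outer chunk size and derives the number of inner chunks per shard and the resulting index size. It is a minimal sketch, not part of the specification or of any Zarr implementation; the function names ``chunks_per_shard`` and ``index_nbytes`` are hypothetical, and shapes are assumed to be plain lists of positive integers.

```python
import math


def chunks_per_shard(shard_shape, chunk_shape):
    """Hypothetical helper: number of inner chunks along each dimension."""
    if len(shard_shape) != len(chunk_shape):
        raise ValueError("chunk_shape must have one entry per dimension")
    counts = []
    for outer, inner in zip(shard_shape, chunk_shape):
        # Each inner chunk size must evenly divide the outer chunk size.
        if inner <= 0 or outer % inner != 0:
            raise ValueError(f"{inner} does not evenly divide {outer}")
        counts.append(outer // inner)
    return counts


def index_nbytes(shard_shape, chunk_shape):
    """Hypothetical helper: 16 bytes per inner chunk plus a 4-byte checksum."""
    n = math.prod(chunks_per_shard(shard_shape, chunk_shape))
    return 16 * n + 4


print(chunks_per_shard([64, 64], [32, 2]))  # [2, 32]
print(index_nbytes([64, 64], [32, 32]))     # 68
```

With the ``[32, 2]`` example from the text, the grid is ``[2, 32]``, i.e. 64 inner chunks and an index of ``16 * 64 + 4 = 1028`` bytes.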
@@ -130,16 +129,38 @@ This is an ``array -> bytes`` codec.

 In the ``sharding_indexed`` binary format, chunks are written successively in a
 shard, where unused space between them is allowed, followed by an index
-referencing them. The index is placed at the end of the file and has a size of
-16 bytes multiplied by the number of chunks in a shard, for example
-``16 bytes * 4 = 1024 bytes`` for shard shape of ``[64, 64]`` and inner chunk
-shape of ``[32, 32]``. The index holds an `offset, nbytes` pair of little-endian
-uint64 per chunk, the chunks-order in the index is row-major (C) order. Given
-the example of 2x2 inner chunks in a shard, the index would look like::
-
-    | chunk (0, 0)    | chunk (0, 1)    | chunk (1, 0)    | chunk (1, 1)    |
-    | offset | nbytes | offset | nbytes | offset | nbytes | offset | nbytes |
-    | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 |
+referencing them.
+
+The index is placed at the end of the file and has a size of ``16 * n + 4``
+bytes, where ``n`` is the number of chunks in the shard: the product, over all
+dimensions, of the outer chunk size divided by the inner ``chunk_shape`` size.
+For example, ``16 * 4 + 4 = 68 bytes`` for a shard shape of ``[64, 64]`` and an inner chunk shape of ``[32, 32]``.
+
+The index format is:
+
+- ``offset[0] : uint64le``
+- ``nbytes[0] : uint64le``
+- ``offset[1] : uint64le``
+- ``nbytes[1] : uint64le``
+- ...
+- ``offset[n-1] : uint64le``
+- ``nbytes[n-1] : uint64le``
+- ``checksum : uint32le``
+
+The final 4 bytes of the index are the CRC-32C checksum of the first ``16 * n``
+bytes of the index (everything except the final checksum).
+
+The chunks are listed in the index in row-major (C) order.
+
+``offset[i]`` is the byte offset within the shard at which the encoded
+representation of chunk ``i`` begins, and ``nbytes[i]`` is the length of that
+encoded representation in bytes.
+
+Given the example of 2x2 inner chunks in a shard, the index would look like::
+
+    | chunk (0, 0)    | chunk (0, 1)    | chunk (1, 0)    | chunk (1, 1)    |          |
+    | offset | nbytes | offset | nbytes | offset | nbytes | offset | nbytes | checksum |
+    | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint32   |

 Empty chunks are denoted by setting both offset and nbytes to ``2^64 - 1``.
 Empty chunks are interpreted as being filled with the fill value. The index
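The index layout introduced in this change (little-endian ``offset``/``nbytes`` pairs in row-major order, a trailing CRC-32C checksum, and ``2^64 - 1`` as the empty-chunk sentinel) can be exercised end to end with the short Python sketch below. It is a hedged illustration, not a reference implementation: the helper names ``crc32c``, ``encode_index``, ``decode_index``, ``linear_index`` and ``read_chunk`` are hypothetical, chunk payloads are treated as opaque encoded bytes, and CRC-32C is computed bitwise because Python's ``zlib.crc32`` implements the different CRC-32 (IEEE) polynomial.

```python
import struct

EMPTY = 2**64 - 1  # offset/nbytes sentinel marking an empty chunk


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def encode_index(entries):
    """Pack (offset, nbytes) pairs, given in row-major order, plus checksum."""
    body = b"".join(struct.pack("<QQ", off, nb) for off, nb in entries)
    return body + struct.pack("<I", crc32c(body))


def decode_index(shard: bytes, n: int):
    """Read and verify the index stored in the last 16 * n + 4 bytes."""
    size = 16 * n + 4
    if len(shard) < size:
        raise ValueError("shard is too small to contain an index")
    index = shard[-size:]
    body = index[:16 * n]
    (checksum,) = struct.unpack("<I", index[16 * n:])
    if crc32c(body) != checksum:
        raise ValueError("index checksum mismatch")
    return [struct.unpack_from("<QQ", body, 16 * i) for i in range(n)]


def linear_index(coords, grid):
    """Row-major (C order) position of an inner chunk in the shard grid."""
    idx = 0
    for c, g in zip(coords, grid):
        idx = idx * g + c
    return idx


def read_chunk(shard: bytes, coords, grid):
    """Return the encoded bytes of one inner chunk, or None if it is empty."""
    n = 1
    for g in grid:
        n *= g
    offset, nbytes = decode_index(shard, n)[linear_index(coords, grid)]
    if offset == EMPTY and nbytes == EMPTY:
        return None  # the caller would substitute the fill value
    return shard[offset:offset + nbytes]


# Build a 2x2 shard (row-major): chunk (1, 0) is empty.
chunks = [b"aa", b"bbb", None, b"d"]
data, entries = b"", []
for c in chunks:
    if c is None:
        entries.append((EMPTY, EMPTY))
    else:
        entries.append((len(data), len(c)))
        data += c
shard = data + encode_index(entries)

print(read_chunk(shard, (0, 1), (2, 2)))  # b'bbb'
print(read_chunk(shard, (1, 0), (2, 2)))  # None
```

The bitwise CRC keeps the sketch dependency-free but is slow for large indexes; a table-driven or hardware-accelerated CRC-32C would be the practical choice, and any variant should return ``0xE3069283`` for the standard check input ``b"123456789"``.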