Skip to content

Commit 36b4832

Browse files
authored
Merge pull request #253 from scalableminds/sharding-index-codecs
Adds index_codecs to the sharding codec
2 parents 46ed6cd + 5924fc1 commit 36b4832

File tree

3 files changed

+203
-56
lines changed

3 files changed

+203
-56
lines changed

docs/v3/codecs/crc32c/v1.0.rst

Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
.. _crc32c-codec-v1:
2+
3+
============================
4+
CRC32C checksum codec (version 1.0)
5+
============================
6+
7+
**Editor's draft 17 July 2023**
8+
9+
Specification URI:
10+
https://zarr-specs.readthedocs.io/en/latest/v3/codecs/crc32c/v1.0.html
11+
Corresponding ZEP:
12+
`ZEP 2 — Sharding codec <https://zarr.dev/zeps/draft/ZEP0002.html>`_
13+
Issue tracking:
14+
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codec>`_
15+
Suggest an edit for this spec:
16+
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/main/docs/v3/codecs/crc32c/v1.0.rst>`_
17+
18+
Copyright 2022-Present `Zarr core development team
19+
<https://github.com/orgs/zarr-developers/teams/core-devs>`_. This work
20+
is licensed under a `Creative Commons Attribution 3.0 Unported License
21+
<https://creativecommons.org/licenses/by/3.0/>`_.
22+
23+
----
24+
25+
26+
Abstract
27+
========
28+
29+
Defines an ``bytes -> bytes`` codec that appends a CRC32C checksum of the input bytestream.
30+
31+
32+
Status of this document
33+
=======================
34+
35+
.. warning::
36+
This document is a draft for review and subject to changes.
37+
It will become final when the `Zarr Enhancement Proposal (ZEP) 2 <https://zarr.dev/zeps/draft/ZEP0002.html>`_
38+
is approved via the `ZEP process <https://zarr.dev/zeps/active/ZEP0000.html>`_.
39+
40+
41+
Document conventions
42+
====================
43+
44+
Conformance requirements are expressed with a combination of
45+
descriptive assertions and [RFC2119]_ terminology. The key words
46+
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
47+
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
48+
parts of this document are to be interpreted as described in
49+
[RFC2119]_. However, for readability, these words do not appear in all
50+
uppercase letters in this specification.
51+
52+
All of the text of this specification is normative except sections
53+
explicitly marked as non-normative, examples, and notes. Examples in
54+
this specification are introduced with the words "for example".
55+
56+
57+
Codec name
58+
==========
59+
60+
The value of the ``name`` member in the codec object MUST be ``crc32c``.
61+
62+
63+
Configuration parameters
64+
========================
65+
66+
None.
67+
68+
69+
Format and algorithm
70+
====================
71+
72+
This is a ``bytes -> bytes`` codec.
73+
74+
The codec computes the CRC32C checksum as defined in [RFC3720]_ of the input
75+
bytestream. The output bytestream is composed of the unchanged input byte
76+
stream with the appended checksum. The checksum is represented as a 32-bit
77+
unsigned integer represented in little endian.
78+
79+
80+
References
81+
==========
82+
83+
.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
84+
Requirement Levels. March 1997. Best Current Practice. URL:
85+
https://tools.ietf.org/html/rfc2119
86+
87+
.. [RFC3720] J. Satran et al. Internet Small Computer Systems
88+
Interface (iSCSI). April 2004. Proposed Standard. URL:
89+
https://tools.ietf.org/html/rfc3720
90+
91+
92+
Change log
93+
==========
94+
95+
No changes yet.

docs/v3/codecs/sharding-indexed/v1.0.rst

Lines changed: 94 additions & 56 deletions
Original file line numberDiff line numberDiff line change
@@ -3,16 +3,15 @@
33
==========================================
44
Sharding codec (version 1.0)
55
==========================================
6-
-----------------------------
7-
Editor's draft 23 03 2023
8-
-----------------------------
6+
7+
**Editor's draft 17 July 2023**
98

109
Specification URI:
1110
https://zarr-specs.readthedocs.io/en/latest/v3/codecs/sharding-indexed/v1.0.html
12-
11+
Corresponding ZEP:
12+
`ZEP 2 — Sharding codec <https://zarr.dev/zeps/draft/ZEP0002.html>`_
1313
Issue tracking:
1414
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/sharding-indexed-codec-v1.0>`-
15-
1615
Suggest an edit for this spec:
1716
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/main/docs/codecs/sharding-indexed/v1.0.rst>`_
1817

@@ -91,12 +90,27 @@ Sharding can be configured per array in the :ref:`array-metadata` as follows::
9190
"configuration": {
9291
"chunk_shape": [32, 32],
9392
"codecs": [
93+
{
94+
"name": "endian",
95+
"configuration": {
96+
"endian": "little",
97+
}
98+
},
9499
{
95100
"name": "gzip",
96101
"configuration": {
97102
"level": 1
98103
}
99104
}
105+
],
106+
"index_codecs": [
107+
{
108+
"name": "endian",
109+
"configuration": {
110+
"endian": "little",
111+
}
112+
},
113+
{ "name": "crc32c"}
100114
]
101115
}
102116
}
@@ -105,85 +119,104 @@ Sharding can be configured per array in the :ref:`array-metadata` as follows::
105119

106120
``chunk_shape``
107121

108-
An array of integers specifying the size of the inner chunks in a shard
109-
along each dimension of the outer array. The length of the ``chunk_shape``
110-
array must match the number of dimensions of the outer chunk to which this
111-
sharding codec is applied, and the chunk size along each dimension must
112-
evenly divide the size of the outer chunk. For example, an inner chunk
113-
shape of ``[32, 2]`` with an outer chunk shape ``[64, 64]`` indicates that
114-
64 chunks are combined in one shard, 2 along the first dimension, and for
122+
An array of integers specifying the shape of the inner chunks in a shard
123+
along each dimension of the outer array. The length of the ``chunk_shape``
124+
array must match the number of dimensions of the shard shape to which this
125+
sharding codec is applied, and the inner chunk shape along each dimension must
126+
evenly divide the size of the shard shape. For example, an inner chunk
127+
shape of ``[32, 2]`` with an shard shape ``[64, 64]`` indicates that
128+
64 inner chunks are combined in one shard, 2 along the first dimension, and for
115129
each of those 32 along the second dimension.
116130

117131
``codecs``
118132

119133
Specifies a list of codecs to be used for encoding and decoding inner chunks.
120134
The value must be an array of objects, as specified in the
121-
:ref:`array-metadata`. An absent ``codecs`` member is equivalent to
122-
specifying an empty list of codecs.
135+
:ref:`array-metadata`. The ``codecs`` member is required and needs to contain
136+
exactly one ``array -> bytes`` codec.
137+
138+
``index_codecs``
139+
140+
Specifies a list of codecs to be used for encoding and decoding shard index.
141+
The value must be an array of objects, as specified in the
142+
:ref:`array-metadata`. The ``index_codecs`` member is required and needs to
143+
contain exactly one ``array -> bytes`` codec. Codecs that produce
144+
variable-sized encoded representation, such as compression codecs, MUST NOT
145+
be used for index codecs. It is RECOMMENDED to use a little-endian codec
146+
followed by a crc32c checksum as index codecs.
147+
148+
Definitions
149+
===========
150+
151+
* **Shard** is a chunk of the outer array that corresponds to one storage object.
152+
As described in this document, shards MAY have multiple inner chunks.
153+
* **Inner chunk** is a chunk within the shard.
154+
* **Shard shape** is the chunk shape of the outer array.
155+
* **Inner chunk shape** is defined by the ``chunk_shape`` configuration of the codec.
156+
The inner chunk shape needs to have the same dimensions as the shard shape and the
157+
inner chunk shape along each dimension must evenly divide the size of the shard shape.
158+
* **Chunks per shard** is the element-wise division of the shard shape by the
159+
inner chunk shape.
123160

124161

125162
Binary shard format
126163
===================
127164

128165
This is an ``array -> bytes`` codec.
129166

130-
In the ``sharding_indexed`` binary format, chunks are written successively in a
167+
In the ``sharding_indexed`` binary format, inner chunks are written successively in a
131168
shard, where unused space between them is allowed, followed by an index
132169
referencing them.
133170

134-
The index is placed at the end of the file and has a size of ``16 * n + 4``
135-
bytes, where ``n`` is the number of chunks in the shard, i.e. the product of the
136-
sizes specified in ``chunk_shape``. For example, ``16 * 4 + 4 = 68 bytes`` for a
137-
shard shape of ``[64, 64]`` and inner chunk shape of ``[32, 32]``.
138-
139-
The index format is:
140-
141-
- ``offset[0] : uint64le``
142-
- ``nbytes[0] : uint64le``
143-
- ``offset[1] : uint64le``
144-
- ``nbytes[1] : uint64le``
145-
- ...
146-
- ``offset[n-1] : uint64le``
147-
- ``nbytes[n-1] : uint64le``
148-
- ``checksum : uint32le``
149-
150-
The final 4 bytes of the index is the CRC-32C checksum of the first ``16 * n``
151-
bytes of the index (everything except the final checksum).
152-
153-
The chunks are listed in the index in row-major (C) order.
171+
The index is an array with 64-bit unsigned integers with a shape that matches the
172+
chunks per shard tuple with an appended dimension of size 2.
173+
For example, given a shard shape of ``[128, 128]`` and chunk shape of ``[32, 32]``,
174+
there are ``[4, 4]`` inner chunks in a shard. The corresponding shard index has a
175+
shape of ``[4, 4, 2]``.
154176

177+
The index contains the ``offset`` and ``nbytes`` values for each inner chunk.
155178
The ``offset[i]`` specifies the byte offset within the shard at which the
156179
encoded representation of chunk ``i`` begins, and ``nbytes[i]`` specifies the
157180
encoded length in bytes.
158181

159-
Given the example of 2x2 inner chunks in a shard, the index would look like::
182+
Empty inner chunks are denoted by setting both offset and nbytes to ``2^64 - 1``.
183+
Empty inner chunks are interpreted as being filled with the fill value. The index
184+
always has the full shape of all possible inner chunks per shard, even if they extend
185+
beyond the array shape.
186+
187+
The index is placed at the end of the file and encoded into binary representations
188+
using the specified index codecs. The byte size of the index is determined by the
189+
number of inner chunks in the shard ``n``, i.e. the product of chunks per shard, and
190+
the choice of index codecs.
191+
192+
For an example, consider a shard shape of ``[64, 64]``, an inner chunk shape of
193+
``[32, 32]`` and an index codec combination of a little-endian codec followed by
194+
a crc32c checksum codec. The size of the corresponding index is
195+
``16 (2x uint64) * 4 (chunks per shard) + 4 (crc32c checksum) = 68 bytes``.
196+
The index would look like::
160197

161198
| chunk (0, 0) | chunk (0, 1) | chunk (1, 0) | chunk (1, 1) | |
162199
| offset | nbytes | offset | nbytes | offset | nbytes | offset | nbytes | checksum |
163200
| uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint32 |
164201

165-
Empty chunks are denoted by setting both offset and nbytes to ``2^64 - 1``.
166-
Empty chunks are interpreted as being filled with the fill value. The index
167-
always has the full shape of all possible chunks per shard, even if they extend
168-
beyond the array shape.
169202

170203
The actual order of the chunk content is not fixed and may be chosen by the
171204
implementation. All possible write orders are valid according to this
172205
specification and therefore can be read by any other implementation. When
173-
writing partial chunks into an existing shard, no specific order of the existing
174-
chunks may be expected. Some writing strategies might be
206+
writing partial inner chunks into an existing shard, no specific order of the existing
207+
inner chunks may be expected. Some writing strategies might be
175208

176209
* **Fixed order**: Specify a fixed order (e.g. row-, column-major, or Morton
177-
order). When replacing existing chunks larger or equal-sized chunks may be
210+
order). When replacing existing inner chunks larger or equal-sized inner chunks may be
178211
replaced in-place, leaving unused space up to an upper limit that might
179212
possibly be specified. Please note that, for regular-sized uncompressed data,
180-
all chunks have the same size and can therefore be replaced in-place.
213+
all inner chunks have the same size and can therefore be replaced in-place.
181214
* **Append-only**: Any chunk to write is appended to the existing shard,
182-
followed by an updated index. If previous chunks are updated, their storage
215+
followed by an updated index. If previous inner chunks are updated, their storage
183216
space becomes unused, as well as the previous index. This might be useful for
184217
storage that only allows append-only updates.
185218
* **Other formats**: Other formats that accept additional bytes at the end of
186-
the file (such as HDF) could be used for storing shards, by writing the chunks
219+
the file (such as HDF) could be used for storing shards, by writing the inner chunks
187220
in the order the format prescribes and appending a binary index derived from
188221
the byte offsets and lengths at the end of the file.
189222

@@ -198,7 +231,7 @@ Implementation notes
198231
The section suggests a non-normative implementation of the codec including
199232
common optimizations.
200233

201-
* **Decoding**: A simple implementation to decode chunks in a shard would (a)
234+
* **Decoding**: A simple implementation to decode inner chunks in a shard would (a)
202235
read the entire value from the store into a byte buffer, (b) parse the shard
203236
index as specified above from the end of the buffer and (c) cut out the
204237
relevant bytes that belong to the requested chunk. The relevant bytes are
@@ -207,27 +240,32 @@ common optimizations.
207240
configuration applying the :ref:`decoding_procedure`. This is similar to how
208241
an implementation would access a sub-slice of a chunk.
209242

210-
When reading all chunks of a shard at once, a useful optimization would be to
243+
The size of the index can be determined by applying ``c.compute_encoded_size``
244+
for each index codec recursively. The initial size is the byte size of the index
245+
array, i.e. ``16 * chunks per shard``.
246+
247+
When reading all inner chunks of a shard at once, a useful optimization would be to
211248
read the entire shard once into a byte buffer and then cut out and decode all
212-
chunks from that buffer in one pass.
249+
inner chunks from that buffer in one pass.
213250

214251
If the underlying store supports partial reads, the decoding of single inner
215252
chunks can be optimized. In that case, the shard index can be read from the
216-
store by requesting the ``n`` last bytes, where ``n`` is 16 bytes multiplied
217-
by the number of chunks in a shard. After parsing the shard index, single
218-
chunks can be requested from the store by specifying the byte range. The
219-
bytestream, then, needs to be decoded as above.
253+
store by requesting the ``n`` last bytes, where ``n`` is the size of the index
254+
as determined by the number of inner chunks in the shard and choice of index
255+
codecs. After parsing the shard index, single inner chunks can be requested from
256+
the store by specifying the byte range. The bytestream, then, needs to be
257+
decoded as above.
220258

221259
* **Encoding**: A simple implementation to encode a chunk in a shard would (a)
222260
encode the new chunk per :ref:`encoding_procedure` in a byte buffer using the
223261
shard's inner codecs, (b) read an existing shard from the store, (c) create a
224-
new bytestream with all encoded chunks of that shard including the overwritten
262+
new bytestream with all encoded inner chunks of that shard including the overwritten
225263
chunk, (d) generate a new shard index that is appended to the chunk bytestream
226264
and (e) writes the shard to the store. If there was no existing shard, an
227-
empty shard is assumed. When writing entire chunks, reading the existing shard
265+
empty shard is assumed. When writing entire inner chunks, reading the existing shard
228266
first may be skipped.
229267

230-
When working with chunks that have a fixed byte size (e.g., uncompressed) and
268+
When working with inner chunks that have a fixed byte size (e.g., uncompressed) and
231269
a store that supports partial writes, a optimization would be to replace the
232270
new chunk by writing to the store at the specified byte range.
233271

docs/v3/core/v3.0.rst

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1186,6 +1186,20 @@ of an additional operation:
11861186
encoded representation is a byte string, then ``decoded_regions``
11871187
specifies a list of byte ranges.
11881188

1189+
- ``c.compute_encoded_size(input_size)``, a procedure that determines the
1190+
size of the encoded representation given a size of the decoded representation.
1191+
This procedure cannot be implemented for codecs that produce variable-sized
1192+
encoded representations, such as compression algorithms. Depending on the
1193+
type of the codec, the signature could differ:
1194+
1195+
- ``c.compute_encoded_size(array_size, data_type) -> (array_size, data_type)``
1196+
for ``array -> array`` codecs, where ``array_size`` is the number of items
1197+
in the array, i.e., the product of the components of the array's shape;
1198+
- ``c.compute_encoded_size(array_size, data_type) -> byte_size``
1199+
for ``array -> bytes`` codecs;
1200+
- ``c.compute_encoded_size(byte_size) -> byte_size``
1201+
for ``bytes -> bytes`` codecs.
1202+
11891203
.. note::
11901204

11911205
If ``partial_decode`` is not supported by a particular codec, it can

0 commit comments

Comments
 (0)