Commit 66ae36f

Revise how the Blosc codec is specified
The `shuffle` mode is now specified more clearly as `null` (0 in Zarr v2), `"bit"` (2 in Zarr v2), or `"byte"` (1 in Zarr v2). Using these constants rather than numbers makes it much easier to tell which shuffle mode is in use when inspecting the metadata by hand. When shuffling is enabled, the `typesize` must now be specified explicitly in the metadata, rather than determined implicitly from the input data. This allows Blosc to function as a pure "bytes -> bytes" codec rather than an "array -> bytes" codec. The special `shuffle` value of `-1`, which selected bit-wise or byte-wise shuffling automatically depending on the typesize, is not supported in the metadata.
1 parent 23a74f7 commit 66ae36f

1 file changed

+42
-14
lines changed
docs/v3/codecs/blosc/v1.0.rst

Lines changed: 42 additions & 14 deletions
@@ -70,33 +70,36 @@ clevel:
     level is 0.
 
 shuffle:
-    An integer value in the set {0, 1, 2, -1} indicating the way
-    bytes or bits are rearranged, which can lead to faster
-    and/or greater compression. A value of 1
-    indicates that byte-wise shuffling is performed prior to
-    compression. A value of 2 indicates the bit-wise shuffling is
-    performed prior to compression. If a value of -1 is given,
-    then default shuffling is used: bit-wise shuffling for buffers
-    with item size of 1 byte, byte-wise shuffling otherwise.
-    Shuffling is turned off completely when the value is 0.
+    Specifies the type of shuffling to perform, if any, prior to compression.
+    Must be one of:
+
+    - ``"noshuffle"``, to indicate no shuffling;
+    - ``"shuffle"``, to indicate byte-wise shuffling;
+    - ``"bitshuffle"``, to indicate bit-wise shuffling.
+
+typesize:
+    Positive integer specifying the stride in bytes over which shuffling is
+    performed. Required unless ``shuffle`` is ``"noshuffle"``, in which case the value
+    is ignored.
 
 blocksize:
     An integer giving the size in bytes of blocks into which a
     buffer is divided before compression. A value of 0
     indicates that an automatic size will be used.
 
-For example, the array metadata document below specifies that the
-compressor is the Blosc codec configured with a compression level of
-1, byte-wise shuffling, the ``lz4`` compression algorithm and the
-default block size::
+For example, the array metadata document below specifies that the compressor is
+the Blosc codec configured with a compression level of 1, byte-wise shuffling
+with a stride of 4, the ``lz4`` compression algorithm and the default block
+size::
 
     {
         "codecs": [{
             "name": "blosc",
             "configuration": {
                 "cname": "lz4",
                 "clevel": 1,
-                "shuffle": 1,
+                "shuffle": "byte",
+                "typesize": 4,
                 "blocksize": 0
             }
         }],
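To make the meaning of ``typesize`` as a shuffle stride concrete, the following is a minimal Python sketch of byte-wise shuffling over a plain byte buffer. The names `byte_shuffle` and `byte_unshuffle` are hypothetical helpers introduced here for illustration; they are not part of c-blosc or of any Zarr implementation. The sketch treats the whole buffer as one block and assumes that trailing bytes which do not fill a complete element are left unchanged, whereas the real library works block by block with optimized routines.

```python
import struct


def byte_shuffle(buf: bytes, typesize: int) -> bytes:
    """Group the i-th byte of every `typesize`-byte element together.

    Illustrative only: treats the whole buffer as a single block.
    """
    if typesize < 1:
        raise ValueError("typesize must be a positive integer")
    n = len(buf) // typesize              # number of complete elements
    body, tail = buf[: n * typesize], buf[n * typesize:]
    # body[i::typesize] picks byte i of each element, in element order.
    shuffled = b"".join(body[i::typesize] for i in range(typesize))
    return shuffled + tail                # assumption: leftover bytes are copied as-is


def byte_unshuffle(buf: bytes, typesize: int) -> bytes:
    """Inverse of byte_shuffle for the same typesize."""
    if typesize < 1:
        raise ValueError("typesize must be a positive integer")
    n = len(buf) // typesize
    body, tail = buf[: n * typesize], buf[n * typesize:]
    out = bytearray(len(body))
    for i in range(typesize):
        # Scatter the i-th byte plane back to every element's i-th position.
        out[i::typesize] = body[i * n:(i + 1) * n]
    return bytes(out) + tail


# Four little-endian 32-bit integers, shuffled with a stride of 4 bytes.
data = struct.pack("<4I", 1, 2, 3, 4)
shuffled = byte_shuffle(data, 4)
assert shuffled == bytes([1, 2, 3, 4]) + bytes(12)
assert byte_unshuffle(shuffled, 4) == data
```

Because the function needs only a byte buffer and an explicit stride, it never has to inspect an array data type, which is the property that lets the codec act on bytes alone.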
@@ -114,6 +117,31 @@ reference implementation is provided by the `c-blosc library
 <https://github.com/Blosc/c-blosc>`_.
 
 
+Comparison to Zarr v2
+=====================
+
+While the binary format is identical, the JSON metadata differs from that used
+by the Zarr v2 ``blosc`` codec in the following ways:
+
+- The `shuffle` mode is now specified more clearly as `null` (0 in Zarr v2),
+  `"bit"` (2 in Zarr v2), or `"byte"` (1 in Zarr v2). Using these constants
+  rather than numbers makes it much easier to know what shuffle mode will be
+  using from manual inspection of the metadata.
+
+- When shuffling is enabled, the `typesize` must now be specified explicitly in
+  the metadata, rather than determined implicitly from the input data. This
+  allows Blosc to function as a pure "bytes -> bytes" codec rather than an
+  "array -> bytes" codec. Zarr implementations MAY allow users to leave this
+  unspecified and have the implementation choose a value automatically based on
+  the array data type and previous codecs in the chain, but MUST record in the
+  metadata the value that is chosen.
+
+- There is no option to choose between bit-wise and byte-wise shuffling
+  automatically, as supported in Zarr v2 via a `shuffle` value of `-1`. Zarr
+  implementations MAY provide users an option to choose a shuffle mode
+  automatically based on the typesize or other information, but MUST record in
+  the metadata the mode that is chosen.
+
 References
 ==========
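The differences from Zarr v2 listed in this diff can be sketched as a small conversion routine. The function name `blosc_v2_to_v3` and its `itemsize` parameter are hypothetical and belong to no existing library. The diff spells the shuffle modes in two ways (``"noshuffle"``/``"shuffle"``/``"bitshuffle"`` in the parameter list, `null`/`"byte"`/`"bit"` in the commit message, comparison section and example); this sketch assumes the latter spelling. It resolves a v2 `shuffle` of `-1` using the v2 rule quoted in the removed text (bit-wise for an item size of 1 byte, byte-wise otherwise) and records the chosen mode and `typesize` explicitly.

```python
import json

# Assumed spelling of the v3 shuffle modes (see the note above).
V2_TO_V3_SHUFFLE = {0: None, 1: "byte", 2: "bit"}


def blosc_v2_to_v3(v2_config: dict, itemsize: int) -> dict:
    """Build a v3-style codec entry from a Zarr v2 blosc compressor config.

    `itemsize` is the size in bytes of one element of the data reaching
    the codec; in v2 this was taken implicitly from the input buffer.
    """
    if itemsize < 1:
        raise ValueError("itemsize must be a positive integer")

    shuffle = v2_config["shuffle"]
    if shuffle == -1:
        # v2 automatic mode: bit-wise for 1-byte items, byte-wise otherwise.
        shuffle = 2 if itemsize == 1 else 1
    if shuffle not in V2_TO_V3_SHUFFLE:
        raise ValueError(f"unknown v2 shuffle value: {shuffle!r}")

    mode = V2_TO_V3_SHUFFLE[shuffle]
    configuration = {
        "cname": v2_config["cname"],
        "clevel": v2_config["clevel"],
        "shuffle": mode,
    }
    if mode is not None:
        # The chosen typesize is written out rather than left implicit.
        configuration["typesize"] = itemsize
    configuration["blocksize"] = v2_config.get("blocksize", 0)
    return {"name": "blosc", "configuration": configuration}


v2 = {"id": "blosc", "cname": "lz4", "clevel": 1, "shuffle": -1, "blocksize": 0}
print(json.dumps(blosc_v2_to_v3(v2, itemsize=4), indent=4))
```

With a 4-byte item size, the automatic v2 mode resolves to byte-wise shuffling, so the resulting configuration matches the example metadata shown in the first hunk.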