Commit 9476653

Merge pull request #225 from jbms/revise-blosc-shuffle
Revise how the Blosc codec is specified
2 parents 9346bcc + d31dab3 commit 9476653

1 file changed: +45 -14 lines changed

docs/v3/codecs/blosc/v1.0.rst

Lines changed: 45 additions & 14 deletions
@@ -69,33 +69,45 @@ clevel:
     level is 0.

 shuffle:
-    An integer value in the set {0, 1, 2, -1} indicating the way
-    bytes or bits are rearranged, which can lead to faster
-    and/or greater compression. A value of 1
-    indicates that byte-wise shuffling is performed prior to
-    compression. A value of 2 indicates the bit-wise shuffling is
-    performed prior to compression. If a value of -1 is given,
-    then default shuffling is used: bit-wise shuffling for buffers
-    with item size of 1 byte, byte-wise shuffling otherwise.
-    Shuffling is turned off completely when the value is 0.
+    Specifies the type of shuffling to perform, if any, prior to compression.
+    Must be one of:
+
+    - ``"noshuffle"``, to indicate no shuffling;
+    - ``"shuffle"``, to indicate byte-wise shuffling;
+    - ``"bitshuffle"``, to indicate bit-wise shuffling.
+
+    Zarr implementations MAY provide users an option to choose a shuffle mode
+    automatically based on the typesize or other information, but MUST record in
+    the metadata the mode that is chosen.
+
+typesize:
+    Positive integer specifying the stride in bytes over which shuffling is
+    performed. Required unless ``shuffle`` is ``"noshuffle"``, in which case the value
+    is ignored.
+
+    Zarr implementations MAY allow users to leave this unspecified and have the
+    implementation choose a value automatically based on the array data type and
+    previous codecs in the chain, but MUST record in the metadata the value that
+    is chosen.

 blocksize:
     An integer giving the size in bytes of blocks into which a
     buffer is divided before compression. A value of 0
     indicates that an automatic size will be used.
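
The rules added above for ``shuffle`` and ``typesize`` can be sketched as a small validation routine. This is only an illustration and not part of the specification or of any Zarr implementation: the function name ``validate_blosc_configuration`` and the constant ``VALID_SHUFFLE_MODES`` are hypothetical, and the checks cover only the two fields revised in this change.

```python
# Hypothetical helper illustrating the revised "shuffle"/"typesize" rules.
# Names are illustrative assumptions, not part of any Zarr library.

VALID_SHUFFLE_MODES = ("noshuffle", "shuffle", "bitshuffle")


def validate_blosc_configuration(config):
    """Check the shuffle-related fields of a Blosc codec configuration.

    Returns the configuration unchanged if it is acceptable and raises
    ValueError otherwise.
    """
    shuffle = config.get("shuffle")
    if shuffle not in VALID_SHUFFLE_MODES:
        raise ValueError(
            f"shuffle must be one of {VALID_SHUFFLE_MODES}, got {shuffle!r}"
        )

    # typesize is required unless shuffling is disabled, in which case any
    # value that happens to be present is ignored.
    if shuffle != "noshuffle":
        typesize = config.get("typesize")
        is_int = isinstance(typesize, int) and not isinstance(typesize, bool)
        if not is_int or typesize <= 0:
            raise ValueError(
                f"typesize must be a positive integer, got {typesize!r}"
            )

    return config


# The configuration from the example metadata document passes the check.
validate_blosc_configuration(
    {"cname": "lz4", "clevel": 1, "shuffle": "shuffle",
     "typesize": 4, "blocksize": 0}
)
```

An implementation that picks a shuffle mode or typesize automatically would run such a check on the values it finally writes to the metadata, since those are the values the text requires to be recorded.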

-For example, the array metadata document below specifies that the
-compressor is the Blosc codec configured with a compression level of
-1, byte-wise shuffling, the ``lz4`` compression algorithm and the
-default block size::
+For example, the array metadata document below specifies that the compressor is
+the Blosc codec configured with a compression level of 1, byte-wise shuffling
+with a stride of 4, the ``lz4`` compression algorithm and the default block
+size::

     {
         "codecs": [{
             "name": "blosc",
             "configuration": {
                 "cname": "lz4",
                 "clevel": 1,
-                "shuffle": 1,
+                "shuffle": "shuffle",
+                "typesize": 4,
                 "blocksize": 0
             }
         }],
@@ -115,6 +127,25 @@ reference implementation is provided by the `c-blosc library
 <https://github.com/Blosc/c-blosc>`_.
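
To make the role of ``typesize`` concrete, the following pure-Python sketch shows the idea behind byte-wise shuffling with a given stride: the first byte of every item is grouped together, then the second byte of every item, and so on. It is a simplified illustration and not the c-blosc implementation. The function names are hypothetical, the per-block splitting that Blosc applies is ignored, and copying any trailing bytes that do not fill a whole item unchanged is an assumption made here for simplicity.

```python
# Simplified illustration of byte-wise shuffling with a stride of `typesize`.
# Not the c-blosc implementation; names and leftover handling are assumptions.

def byte_shuffle(data: bytes, typesize: int) -> bytes:
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    if typesize <= 0:
        raise ValueError("typesize must be a positive integer")
    n_items = len(data) // typesize
    body = n_items * typesize  # number of bytes covered by whole items
    out = bytearray()
    for j in range(typesize):
        out += data[j:body:typesize]  # byte j of each item
    out += data[body:]  # trailing bytes copied unchanged (assumption)
    return bytes(out)


def byte_unshuffle(data: bytes, typesize: int) -> bytes:
    """Invert byte_shuffle for the same typesize."""
    if typesize <= 0:
        raise ValueError("typesize must be a positive integer")
    n_items = len(data) // typesize
    body = n_items * typesize
    out = bytearray(len(data))
    for j in range(typesize):
        out[j:body:typesize] = data[j * n_items:(j + 1) * n_items]
    out[body:] = data[body:]
    return bytes(out)


# Three little-endian 32-bit integers (1, 2, 3) shuffled with a stride of 4.
raw = bytes([1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0])
shuffled = byte_shuffle(raw, 4)
restored = byte_unshuffle(shuffled, 4)
```

Because the transform depends only on the byte stream and the stride, it needs no knowledge of the array data type, which is why recording ``typesize`` in the metadata is enough for decoding.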


+Comparison to Zarr v2
+=====================
+
+While the binary format is identical, the JSON metadata differs from that used
+by the Zarr v2 ``blosc`` codec in the following ways:
+
+- The `shuffle` mode is now specified more clearly as `noshuffle` (0 in Zarr v2),
+  `"bitshuffle"` (2 in Zarr v2), or `"shuffle"` (1 in Zarr v2). Using these constants
+  rather than numbers makes it much easier to know what shuffle mode will be
+  used from manual inspection of the metadata.
+
+- When shuffling is enabled, the `typesize` must now be specified explicitly in
+  the metadata, rather than determined implicitly from the input data. This
+  allows Blosc to function as a pure "bytes -> bytes" codec rather than an
+  "array -> bytes" codec.
+
+- There is no option to choose between bit-wise and byte-wise shuffling
+  automatically, as supported in Zarr v2 via a `shuffle` value of `-1`.
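
The three differences listed above suggest how a Zarr v2 ``blosc`` configuration could be translated to the revised form. The sketch below is illustrative only: ``convert_v2_blosc_config`` and ``V2_TO_V3_SHUFFLE`` are hypothetical names, the v2 configuration is assumed to carry ``cname``, ``clevel``, ``shuffle`` and ``blocksize`` keys, and the caller is assumed to supply the item size that v2 derived implicitly from the input data. The handling of ``-1`` follows the rule in the removed text (bit-wise shuffling for an item size of 1 byte, byte-wise otherwise).

```python
# Hypothetical conversion of a Zarr v2 blosc configuration to the revised
# form. Names and the assumed v2 key layout are illustrative assumptions.

V2_TO_V3_SHUFFLE = {0: "noshuffle", 1: "shuffle", 2: "bitshuffle"}


def convert_v2_blosc_config(v2_config, typesize):
    """Return a configuration dict using the string shuffle modes.

    `typesize` is the item size in bytes, which must now be made explicit.
    """
    shuffle = v2_config["shuffle"]

    # The automatic mode (-1) no longer exists, so resolve it here and
    # record the concrete choice.
    if shuffle == -1:
        shuffle = 2 if typesize == 1 else 1

    if shuffle not in V2_TO_V3_SHUFFLE:
        raise ValueError(f"unknown v2 shuffle value: {shuffle!r}")

    converted = {
        "cname": v2_config["cname"],
        "clevel": v2_config["clevel"],
        "shuffle": V2_TO_V3_SHUFFLE[shuffle],
    }
    # typesize is only required when shuffling is enabled.
    if converted["shuffle"] != "noshuffle":
        converted["typesize"] = typesize
    converted["blocksize"] = v2_config.get("blocksize", 0)
    return converted


converted_example = convert_v2_blosc_config(
    {"id": "blosc", "cname": "lz4", "clevel": 1, "shuffle": 1, "blocksize": 0},
    typesize=4,
)
```

Since the binary format is stated to be identical, a conversion of this kind would only need to rewrite metadata and leave the stored chunks untouched.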
+
 References
 ==========
