@@ -69,33 +69,45 @@ clevel:
     level is 0.
 
 shuffle:
-    An integer value in the set {0, 1, 2, -1} indicating the way
-    bytes or bits are rearranged, which can lead to faster
-    and/or greater compression. A value of 1
-    indicates that byte-wise shuffling is performed prior to
-    compression. A value of 2 indicates the bit-wise shuffling is
-    performed prior to compression. If a value of -1 is given,
-    then default shuffling is used: bit-wise shuffling for buffers
-    with item size of 1 byte, byte-wise shuffling otherwise.
-    Shuffling is turned off completely when the value is 0.
+    Specifies the type of shuffling to perform, if any, prior to compression.
+    Must be one of:
+
+    - ``"noshuffle"``, to indicate no shuffling;
+    - ``"shuffle"``, to indicate byte-wise shuffling;
+    - ``"bitshuffle"``, to indicate bit-wise shuffling.
+
+    Zarr implementations MAY provide users an option to choose a shuffle mode
+    automatically based on the ``typesize`` or other information, but MUST record
+    in the metadata the mode that is chosen.
+
+typesize:
+    Positive integer specifying the stride in bytes over which shuffling is
+    performed. Required unless ``shuffle`` is ``"noshuffle"``, in which case the
+    value is ignored.
+
+    Zarr implementations MAY allow users to leave this unspecified and have the
+    implementation choose a value automatically based on the array data type and
+    previous codecs in the chain, but MUST record in the metadata the value that
+    is chosen.
 
 blocksize:
     An integer giving the size in bytes of blocks into which a
     buffer is divided before compression. A value of 0
     indicates that an automatic size will be used.
 
-For example, the array metadata document below specifies that the
-compressor is the Blosc codec configured with a compression level of
-1, byte-wise shuffling, the ``lz4`` compression algorithm and the
-default block size::
+For example, the array metadata document below specifies that the compressor is
+the Blosc codec configured with a compression level of 1, byte-wise shuffling
+with a stride of 4, the ``lz4`` compression algorithm and the default block
+size::
 
     {
         "codecs": [{
             "name": "blosc",
             "configuration": {
                 "cname": "lz4",
                 "clevel": 1,
-                "shuffle": 1,
+                "shuffle": "shuffle",
+                "typesize": 4,
                 "blocksize": 0
             }
         }],
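To make the ``typesize`` stride concrete, here is a minimal Python sketch of a byte-wise shuffle. It is an illustration under stated assumptions, not code from c-blosc or any Zarr implementation: the functions ``byte_shuffle`` and ``byte_unshuffle`` are hypothetical, and they treat the whole buffer as a single block, ignoring the division into ``blocksize`` blocks, so the output is not guaranteed to match c-blosc byte for byte.

```python
import struct


def byte_shuffle(data: bytes, typesize: int) -> bytes:
    """Conceptual byte-wise shuffle: gather byte j of every element together.

    Hypothetical helper. The buffer is treated as a single block; trailing
    bytes that do not form a whole element are copied through unchanged.
    """
    if typesize < 1:
        raise ValueError("typesize must be a positive integer")
    nelem = len(data) // typesize
    body = nelem * typesize  # bytes covered by whole elements
    out = bytearray(len(data))
    for j in range(typesize):
        # Byte j of element i moves to offset j * nelem + i.
        out[j * nelem:(j + 1) * nelem] = data[j:body:typesize]
    out[body:] = data[body:]
    return bytes(out)


def byte_unshuffle(data: bytes, typesize: int) -> bytes:
    """Inverse of byte_shuffle for the same typesize (hypothetical helper)."""
    if typesize < 1:
        raise ValueError("typesize must be a positive integer")
    nelem = len(data) // typesize
    body = nelem * typesize
    out = bytearray(len(data))
    for j in range(typesize):
        # Scatter the j-th run back to byte position j of each element.
        out[j:body:typesize] = data[j * nelem:(j + 1) * nelem]
    out[body:] = data[body:]
    return bytes(out)


# Four little-endian 32-bit integers, i.e. a stride of 4 as in the JSON example.
raw = struct.pack("<4I", 1, 2, 3, 4)
shuffled = byte_shuffle(raw, typesize=4)
# The low-order bytes come first, followed by one run of twelve zero bytes,
# which a subsequent compressor can encode compactly.
assert shuffled == bytes([1, 2, 3, 4]) + bytes(12)
assert byte_unshuffle(shuffled, typesize=4) == raw
# With a stride of 1 the byte-wise shuffle leaves the data unchanged.
assert byte_shuffle(raw, typesize=1) == raw
```

Bit-wise shuffling applies the same kind of transposition to individual bits and is not shown here. The last assertion also illustrates why the removed Zarr v2 text chose bit-wise rather than byte-wise shuffling for 1-byte items: a byte-wise shuffle with a stride of 1 is a no-op.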
@@ -115,6 +127,25 @@ reference implementation is provided by the `c-blosc library
 <https://github.com/Blosc/c-blosc>`_.
 
 
+Comparison to Zarr v2
+=====================
+
+While the binary format is identical, the JSON metadata differs from that used
+by the Zarr v2 ``blosc`` codec in the following ways:
+
+- The ``shuffle`` mode is now specified by name: ``"noshuffle"`` (0 in Zarr v2),
+  ``"shuffle"`` (1 in Zarr v2), or ``"bitshuffle"`` (2 in Zarr v2). Using these
+  names rather than numbers makes the shuffle mode easy to determine by manual
+  inspection of the metadata.
+
+- When shuffling is enabled, the ``typesize`` must now be specified explicitly in
+  the metadata, rather than determined implicitly from the input data. This
+  allows Blosc to function as a pure "bytes -> bytes" codec rather than an
+  "array -> bytes" codec.
+
+- The metadata cannot request automatic selection between bit-wise and byte-wise
+  shuffling, as Zarr v2 allowed via a ``shuffle`` value of ``-1``; implementations
+  that select the mode automatically record the chosen mode instead.
 References
 ==========
 
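The mapping described in the comparison can be sketched as a conversion from a Zarr v2 ``blosc`` compressor config to a Zarr v3 codec entry. This is illustrative Python only; ``blosc_v2_to_v3`` and ``check_v3_blosc_config`` are hypothetical helpers, not part of Zarr or numcodecs. It assumes the v2 config is a dict with ``cname``, ``clevel``, ``shuffle`` and an optional ``blocksize`` key, and that the caller supplies the array's item size in bytes, since Zarr v2 took that from the input data.

```python
from typing import Any, Dict

# Zarr v2 integer shuffle codes and their Zarr v3 names.
V2_SHUFFLE_NAMES = {0: "noshuffle", 1: "shuffle", 2: "bitshuffle"}


def check_v3_blosc_config(configuration: Dict[str, Any]) -> None:
    """Hypothetical check: raise ValueError if shuffle/typesize break the rules."""
    shuffle = configuration.get("shuffle")
    if shuffle not in V2_SHUFFLE_NAMES.values():
        raise ValueError(f"invalid shuffle mode: {shuffle!r}")
    if shuffle != "noshuffle":
        typesize = configuration.get("typesize")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(typesize, bool) or not isinstance(typesize, int) or typesize < 1:
            raise ValueError("typesize must be a positive integer when shuffling")


def blosc_v2_to_v3(v2_config: Dict[str, Any], itemsize: int) -> Dict[str, Any]:
    """Hypothetical translation of a Zarr v2 blosc config into a v3 codec entry.

    `itemsize` is the element size in bytes that Zarr v2 would have inferred
    from the input data; Zarr v3 records it explicitly as `typesize`.
    """
    code = v2_config["shuffle"]
    if code == -1:
        # Zarr v2 automatic mode: bit-wise for 1-byte items, byte-wise otherwise.
        code = 2 if itemsize == 1 else 1
    if code not in V2_SHUFFLE_NAMES:
        raise ValueError(f"unknown Zarr v2 shuffle value: {code!r}")
    configuration = {
        "cname": v2_config["cname"],
        "clevel": v2_config["clevel"],
        "shuffle": V2_SHUFFLE_NAMES[code],
        # Ignored when shuffle is "noshuffle"; recording it anyway is a choice
        # made by this sketch.
        "typesize": itemsize,
        "blocksize": v2_config.get("blocksize", 0),
    }
    check_v3_blosc_config(configuration)
    return {"name": "blosc", "configuration": configuration}


v2 = {"id": "blosc", "cname": "lz4", "clevel": 1, "shuffle": -1, "blocksize": 0}
codec = blosc_v2_to_v3(v2, itemsize=4)
assert codec == {
    "name": "blosc",
    "configuration": {"cname": "lz4", "clevel": 1, "shuffle": "shuffle",
                      "typesize": 4, "blocksize": 0},
}
```

With ``"shuffle": -1`` and a 4-byte item size, the result matches the configuration in the JSON example from the first hunk.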