@@ -4,7 +4,7 @@ Zarr storage specification version 2
====================================

This document provides a technical specification of the protocol and format
- used for storing a Zarr array. The key words "MUST", "MUST NOT", "REQUIRED",
+ used for storing Zarr arrays. The key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in `RFC 2119
<https://www.ietf.org/rfc/rfc2119.txt>`_.
@@ -56,42 +56,47 @@ chunks
dtype
    A string or list defining a valid data type for the array. See also
    the subsection below on data type encoding.
- compression
-     A string identifying the primary compression library used to compress
-     each chunk of the array.
- compression_opts
-     An integer, string or dictionary providing options to the primary
-     compression library.
+ compressor
+     A JSON object identifying the primary compression codec and providing
+     configuration parameters, or ``null`` if no compressor is to be used.
+     The object MUST contain an ``"id"`` key identifying the codec to be used.
fill_value
    A scalar value providing the default value to use for uninitialized
-     portions of the array.
+     portions of the array, or ``null`` if no fill_value is to be used.
order
    Either "C" or "F", defining the layout of bytes within each chunk of the
    array. "C" means row-major order, i.e., the last dimension varies fastest;
    "F" means column-major order, i.e., the first dimension varies fastest.
+ filters
+     A list of JSON objects providing codec configurations, or ``null`` if no
+     filters are to be applied. Each codec configuration object MUST contain an
+     ``"id"`` key identifying the codec to be used.

Other keys MUST NOT be present within the metadata object.

For example, the JSON object below defines a 2-dimensional array of 64-bit
little-endian floating point numbers with 10000 rows and 10000 columns, divided
into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total
arranged in a 10 by 10 grid). Within each chunk the data are laid out in C
- contiguous order, and each chunk is compressed using the Blosc compression
- library::
+ contiguous order. Each chunk is encoded using a delta filter and compressed
+ using the Blosc compression library prior to storage::

    {
        "chunks": [
            1000,
            1000
        ],
-         "compression": "blosc",
-         "compression_opts": {
-             "clevel": 5,
+         "compressor": {
+             "id": "blosc",
            "cname": "lz4",
+             "clevel": 5,
            "shuffle": 1
        },
        "dtype": "<f8",
-         "fill_value": null,
+         "fill_value": "NaN",
+         "filters": [
+             {"id": "delta", "dtype": "<f8", "astype": "<f4"}
+         ],
        "order": "C",
        "shape": [
            10000,
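The rules above for the ``compressor``, ``filters`` and ``order`` fields can be checked mechanically. The following is a minimal Python sketch, not part of the specification: the helper names ``metadata_problems``, ``codec_problems`` and ``chunk_grid`` are hypothetical, only the fields described in this hunk are checked, and any other keys are ignored rather than validated. It also reproduces the chunk arithmetic from the example (10000 / 1000 = 10 chunks per dimension, so 100 in total).

```python
import json

def codec_problems(config, field):
    """Return a list of problems with one codec configuration object."""
    # Each codec configuration MUST be a JSON object containing an "id" key.
    if not isinstance(config, dict) or not isinstance(config.get("id"), str):
        return [f'{field}: expected a JSON object with an "id" string']
    return []

def metadata_problems(meta):
    """Partial check of the compressor, filters and order fields only."""
    problems = [f"missing key: {k}" for k in ("compressor", "filters", "order")
                if k not in meta]
    # compressor: a codec configuration, or null (None in Python) for no compression
    if meta.get("compressor") is not None:
        problems += codec_problems(meta["compressor"], "compressor")
    # filters: a list of codec configurations, or null for no filters
    filters = meta.get("filters")
    if filters is not None:
        if not isinstance(filters, list):
            problems.append("filters: expected a list or null")
        else:
            for i, config in enumerate(filters):
                problems += codec_problems(config, f"filters[{i}]")
    # order: "C" (row-major) or "F" (column-major)
    if meta.get("order") not in ("C", "F"):
        problems.append('order: expected "C" or "F"')
    return problems

def chunk_grid(shape, chunks):
    """Chunks per dimension, rounding up because edge chunks may overhang."""
    return [(s + c - 1) // c for s, c in zip(shape, chunks)]

example = json.loads("""
{
    "chunks": [1000, 1000],
    "compressor": {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1},
    "dtype": "<f8",
    "fill_value": "NaN",
    "filters": [{"id": "delta", "dtype": "<f8", "astype": "<f4"}],
    "order": "C",
    "shape": [10000, 10000]
}
""")
print(metadata_problems(example))                       # []
print(chunk_grid(example["shape"], example["chunks"]))  # [10, 10]
```

Returning a list of problems rather than raising on the first one lets a reader report every violation in a metadata document at once.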
@@ -142,7 +147,6 @@ Positive Infinity ``"Infinity"``
Negative Infinity ``"-Infinity"``
================= ===============

-
Chunks
~~~~~~
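A reader has to turn the JSON ``fill_value`` back into a number before using it, and a writer has to do the reverse. The sketch below is a minimal illustration under stated assumptions: it handles only simple type strings such as ``"<f8"`` or ``"<i4"`` (taking the second character as the kind), it covers only the special floating point strings used in this document (``"NaN"``, ``"Infinity"``, ``"-Infinity"``), and ``decode_fill_value`` / ``encode_fill_value`` are hypothetical helper names.

```python
import math

# JSON strings used for special floating point fill values.
SPECIAL_FLOATS = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}

def is_float_dtype(dtype):
    # Assumption: a simple type string such as "<f8" (byte order, kind, item size).
    return dtype[1:2] == "f"

def decode_fill_value(raw, dtype):
    """Convert a fill_value read from JSON metadata into a Python value."""
    if raw is None:
        return None  # no fill value is to be used
    if is_float_dtype(dtype):
        return SPECIAL_FLOATS[raw] if isinstance(raw, str) else float(raw)
    return raw  # other kinds are treated as plain JSON scalars in this sketch

def encode_fill_value(value, dtype):
    """Convert a Python fill value into something json.dumps can write."""
    if value is None:
        return None
    if is_float_dtype(dtype):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return float(value)
    return value

print(decode_fill_value("NaN", "<f8"))       # nan
print(encode_fill_value(-math.inf, "<f8"))  # -Infinity
```

Encoding these values as strings keeps the metadata valid JSON; by default Python's ``json.dumps`` would otherwise emit the non-standard token ``NaN``.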
@@ -176,6 +180,16 @@ array dimension is not exactly divisible by the length of the corresponding
chunk dimension then some chunks will overhang the edge of the array. The
contents of any chunk region falling outside the array are undefined.

+ Filters
+ ~~~~~~~
+
+ Optionally, a sequence of one or more filters can be used to transform chunk
+ data prior to compression. When storing data, filters are applied in the order
+ specified in array metadata to encode data, then the encoded data are passed to
+ the primary compressor. When retrieving data, stored chunk data are
+ decompressed by the primary compressor and then decoded using the filters in
+ reverse order.
+
Hierarchies
-----------
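The ordering rule in the Filters subsection can be sketched in a few lines of Python. This is an illustration only: ``DeltaUInt32`` and ``XorBytes`` are hypothetical toy filters (not the ``delta`` codec named in the example metadata), the standard library's ``zlib`` stands in for the primary compressor, and every filter or compressor is assumed to expose ``encode(bytes)`` and ``decode(bytes)`` methods.

```python
import struct
import zlib

MASK = 0xFFFFFFFF  # delta arithmetic wraps modulo 2**32

class DeltaUInt32:
    """Toy filter: replace little-endian uint32 values by their differences."""
    def encode(self, buf):
        values = struct.unpack(f"<{len(buf) // 4}I", buf)
        prev, out = 0, []
        for v in values:
            out.append((v - prev) & MASK)
            prev = v
        return struct.pack(f"<{len(out)}I", *out)

    def decode(self, buf):
        deltas = struct.unpack(f"<{len(buf) // 4}I", buf)
        total, out = 0, []
        for d in deltas:
            total = (total + d) & MASK
            out.append(total)
        return struct.pack(f"<{len(out)}I", *out)

class XorBytes:
    """Toy filter: XOR every byte with a fixed key (its own inverse)."""
    def __init__(self, key):
        self.key = key

    def encode(self, buf):
        return bytes(b ^ self.key for b in buf)

    decode = encode

class ZlibCompressor:
    """Stand-in primary compressor built on the standard library."""
    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        return zlib.compress(buf, self.level)

    def decode(self, buf):
        return zlib.decompress(buf)

def encode_chunk(data, filters, compressor):
    # Storing: apply filters in metadata order, then the primary compressor.
    for f in filters or []:
        data = f.encode(data)
    return compressor.encode(data) if compressor else data

def decode_chunk(stored, filters, compressor):
    # Retrieving: decompress first, then undo the filters in reverse order.
    data = compressor.decode(stored) if compressor else stored
    for f in reversed(filters or []):
        data = f.decode(data)
    return data

raw = struct.pack("<5I", 10, 11, 13, 16, 20)
filters = [DeltaUInt32(), XorBytes(0x5A)]
stored = encode_chunk(raw, filters, ZlibCompressor(level=1))
assert decode_chunk(stored, filters, ZlibCompressor(level=1)) == raw
```

Undoing the filters in their original (non-reversed) order would apply the delta decoding to XOR-masked bytes and would generally not reproduce ``raw``.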
@@ -279,7 +293,7 @@ Create an array::
    >>> import zarr
    >>> store = zarr.DirectoryStore('example')
    >>> a = zarr.create(shape=(20, 20), chunks=(10, 10), dtype='i4',
-     ...                 fill_value=42, compression='zlib', compression_opts=1,
+     ...                 fill_value=42, compressor=zarr.Zlib(level=1),
    ...                 store=store, overwrite=True)

No chunks are initialized yet, so only the ".zarray" and ".zattrs" keys
@@ -297,10 +311,13 @@ Inspect the array metadata::
            10,
            10
        ],
-         "compression": "zlib",
-         "compression_opts": 1,
+         "compressor": {
+             "id": "zlib",
+             "level": 1
+         },
        "dtype": "<i4",
        "fill_value": 42,
+         "filters": null,
        "order": "C",
        "shape": [
            20,
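To show how a reader might act on the metadata printed above, here is a minimal sketch that decodes one chunk of this 20 by 20 ``"<i4"`` array using only the standard library. It assumes a 2-dimensional array in ``"C"`` order with no filters and handles only the ``"zlib"`` compressor; a missing chunk (``None``) is treated as uninitialized and filled with ``fill_value``. ``read_chunk`` and the ``DECODERS`` registry are hypothetical names, not part of the ``zarr`` package.

```python
import json
import struct
import zlib

# The array metadata printed above.
meta = json.loads("""
{
    "chunks": [10, 10],
    "compressor": {"id": "zlib", "level": 1},
    "dtype": "<i4",
    "fill_value": 42,
    "filters": null,
    "order": "C",
    "shape": [20, 20]
}
""")

# Hypothetical registry from codec "id" to a decode function.
DECODERS = {"zlib": lambda buf, config: zlib.decompress(buf)}

def read_chunk(stored, meta):
    """Decode one stored chunk into a list of rows."""
    if meta["dtype"] != "<i4" or meta["order"] != "C" or meta["filters"] is not None:
        raise NotImplementedError("this sketch only handles <i4, C order, no filters")
    rows, cols = meta["chunks"]
    if stored is None:
        # Uninitialized chunk: every element takes the fill value.
        return [[meta["fill_value"]] * cols for _ in range(rows)]
    data = stored
    config = meta["compressor"]
    if config is not None:
        data = DECODERS[config["id"]](data, config)
    values = struct.unpack(f"<{rows * cols}i", data)
    # C order: the last dimension varies fastest, so each run of `cols` values is a row.
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]

# Build a stored chunk the way this metadata describes, then read it back.
stored = zlib.compress(struct.pack("<100i", *range(100)), meta["compressor"]["level"])
chunk = read_chunk(stored, meta)
print(chunk[1][:3])                  # [10, 11, 12]
print(read_chunk(None, meta)[0][0])  # 42
```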
@@ -452,6 +469,10 @@ Changes in version 2
* Added support for storing multiple arrays in the same store and organising
  arrays into hierarchies using groups.
* Array metadata is now stored under the ".zarray" key instead of the "meta"
-   key
+   key.
* Custom attributes are now stored under the ".zattrs" key instead of the
-   "attrs" key
+   "attrs" key.
+ * Added support for filters.
+ * Changed encoding of "fill_value" field within array metadata.
+ * Changed encoding of compressor information within array metadata to be
+   consistent with representation of filter information.
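Implementations that still encounter version 1 metadata could translate the old compressor fields when reading. A minimal sketch follows, with the mapping inferred from the two examples in this document (Blosc options merged into the object, and the zlib integer option becoming ``"level"``). The helper name ``upgrade_compressor_fields`` is hypothetical; treating any integer option as a level, and a missing ``"compression"`` key as no compressor, are assumptions. The changed ``fill_value`` encoding is not handled.

```python
def upgrade_compressor_fields(meta_v1):
    """Rewrite version 1 "compression"/"compression_opts" as a version 2 "compressor"."""
    meta = dict(meta_v1)  # leave the input untouched
    name = meta.pop("compression", None)
    opts = meta.pop("compression_opts", None)
    if name is None:
        compressor = None  # assumption: nothing recorded means no compressor
    elif isinstance(opts, dict):
        compressor = {"id": name, **opts}  # e.g. Blosc: cname, clevel, shuffle
    elif isinstance(opts, int):
        compressor = {"id": name, "level": opts}  # assumption: integer is a level, as for zlib
    else:
        compressor = {"id": name}
    meta["compressor"] = compressor
    meta.setdefault("filters", None)  # version 1 had no filters
    return meta

old = {"compression": "zlib", "compression_opts": 1, "dtype": "<i4"}
print(upgrade_compressor_fields(old))
# {'dtype': '<i4', 'compressor': {'id': 'zlib', 'level': 1}, 'filters': None}
```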