Commit 976c291

Merge pull request #42 from alimanfoo/filters
Implementation of filters.
2 parents 4529842 + e4c2213 commit 976c291

36 files changed, with 5824 additions and 2454 deletions

.coveragerc

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[run]
+omit = zarr/meta_v1.py
+

c-blosc

Submodule c-blosc updated 60 files

docs/api.rst

Lines changed: 1 addition & 1 deletion
@@ -8,5 +8,5 @@ API reference
     api/core
     api/hierarchy
     api/storage
-    api/compressors
+    api/codecs
     api/sync

docs/api/codecs.rst

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
Compressors and filters (``zarr.codecs``)
2+
=========================================
3+
.. module:: zarr.codecs
4+
5+
This module contains compressor and filter classes for use with Zarr.
6+
7+
Other codecs can be registered dynamically with Zarr. All that is required
8+
is to implement a class that provides the same interface as the classes listed
9+
below, and then to add the class to the ``codec_registry``. See the source
10+
code of this module for details.
11+
12+
.. autoclass:: Codec
13+
14+
.. automethod:: encode
15+
.. automethod:: decode
16+
.. automethod:: get_config
17+
.. automethod:: from_config
18+
19+
.. autoclass:: Blosc
20+
.. autoclass:: Zlib
21+
.. autoclass:: BZ2
22+
.. autoclass:: LZMA
23+
.. autoclass:: Delta
24+
.. autoclass:: FixedScaleOffset
25+
.. autoclass:: Quantize
26+
.. autoclass:: PackBits
27+
.. autoclass:: Categorize
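
The registration mechanism described in the new ``docs/api/codecs.rst`` text can be pictured with a small standalone sketch. It does not import Zarr. The four method names (``encode``, ``decode``, ``get_config``, ``from_config``) come from the ``Codec`` entry above, but the plain-dict ``codec_registry``, the ``codec_id`` attribute, the ``ZlibLike`` class and the ``get_codec`` helper are assumptions made for illustration, and the real module may differ.

```python
import zlib

# Hypothetical stand-in for zarr's ``codec_registry``. The real object and
# the way classes are added to it may differ; see the zarr.codecs source.
codec_registry = {}


class ZlibLike:
    """Toy codec exposing the four methods documented for ``Codec``."""

    codec_id = "zlib-like"  # hypothetical identifier, not a real Zarr codec

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        # Compress the raw chunk bytes.
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf):
        # Invert encode().
        return zlib.decompress(bytes(buf))

    def get_config(self):
        # JSON-serializable description, including the "id" key.
        return {"id": self.codec_id, "level": self.level}

    @classmethod
    def from_config(cls, config):
        # Rebuild a codec instance from a stored configuration.
        return cls(level=config["level"])


# Register the class under its identifier.
codec_registry[ZlibLike.codec_id] = ZlibLike


def get_codec(config):
    """Look up the codec class by "id" and rebuild it from its config."""
    cls = codec_registry[config["id"]]
    return cls.from_config(config)
```

With this shape, a configuration object read back from metadata is enough to reconstruct the codec, which is why every configuration carries an ``"id"`` key.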

docs/api/compressors.rst

Lines changed: 0 additions & 23 deletions
This file was deleted.

docs/api/core.rst

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ The Array class (``zarr.core``)
     .. automethod:: __setitem__
     .. automethod:: resize
     .. automethod:: append
+    .. automethod:: view

docs/api/storage.rst

Lines changed: 2 additions & 0 deletions
@@ -12,3 +12,5 @@ can be used as a Zarr array store.
 .. autoclass:: DictStore
 .. autoclass:: DirectoryStore
 .. autoclass:: ZipStore
+
+.. autofunction:: migrate_1to2

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ Highlights
 * Read an array concurrently from multiple threads or processes.
 * Write to an array concurrently from multiple threads or processes.
 * Organize arrays into hierarchies via groups.
+* Use filters to preprocess data and improve compression.
 
 Status
 ------

docs/release.rst

Lines changed: 18 additions & 3 deletions
@@ -13,13 +13,28 @@ Support has been added for organizing arrays into hierarchies via groups. See
 the tutorial section on :ref:`tutorial_groups` and the :mod:`zarr.hierarchy`
 API docs for more information.
 
-To accommodate support for hierarchies the Zarr format has been modified. See
-the :ref:`spec_v2` for more information.
+Filters
+~~~~~~~
+
+Support has been added for configuring filters to preprocess chunk data prior
+to compression. See the tutorial section on :ref:`tutorial_filters` and the
+:mod:`zarr.filters` API docs for more information.
 
 Other changes
 ~~~~~~~~~~~~~
 
-* The bundled Blosc library has been upgraded to version 1.10.2.
+To accommodate support for hierarchies and filters, the Zarr metadata format
+has been modified. See the :ref:`spec_v2` for more information. To migrate an
+array stored using Zarr version 1.x, use the :func:`zarr.storage.migrate_1to2`
+function.
+
+The bundled Blosc library has been upgraded to version 1.10.2.
+
+Acknowledgments
+~~~~~~~~~~~~~~~
+
+Thanks to Matthew Rocklin (mrocklin_), Stephan Hoyer (shoyer_) and
+Francesc Alted (FrancescAlted_) for contributions and comments.
 
 .. _release_1.1.0:
 
docs/spec/v2.rst

Lines changed: 41 additions & 20 deletions
@@ -4,7 +4,7 @@ Zarr storage specification version 2
 ====================================
 
 This document provides a technical specification of the protocol and format
-used for storing a Zarr array. The key words "MUST", "MUST NOT", "REQUIRED",
+used for storing Zarr arrays. The key words "MUST", "MUST NOT", "REQUIRED",
 "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
 "OPTIONAL" in this document are to be interpreted as described in `RFC 2119
 <https://www.ietf.org/rfc/rfc2119.txt>`_.
@@ -56,42 +56,47 @@ chunks
 dtype
     A string or list defining a valid data type for the array. See also
     the subsection below on data type encoding.
-compression
-    A string identifying the primary compression library used to compress
-    each chunk of the array.
-compression_opts
-    An integer, string or dictionary providing options to the primary
-    compression library.
+compressor
+    A JSON object identifying the primary compression codec and providing
+    configuration parameters, or ``null`` if no compressor is to be used.
+    The object MUST contain an ``"id"`` key identifying the codec to be used.
 fill_value
     A scalar value providing the default value to use for uninitialized
-    portions of the array.
+    portions of the array, or ``null`` if no fill_value is to be used.
 order
     Either "C" or "F", defining the layout of bytes within each chunk of the
     array. "C" means row-major order, i.e., the last dimension varies fastest;
     "F" means column-major order, i.e., the first dimension varies fastest.
+filters
+    A list of JSON objects providing codec configurations, or ``null`` if no
+    filters are to be applied. Each codec configuration object MUST contain a
+    ``"id"`` key identifying the codec to be used.
 
 Other keys MUST NOT be present within the metadata object.
 
 For example, the JSON object below defines a 2-dimensional array of 64-bit
 little-endian floating point numbers with 10000 rows and 10000 columns, divided
 into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total
 arranged in a 10 by 10 grid). Within each chunk the data are laid out in C
-contiguous order, and each chunk is compressed using the Blosc compression
-library::
+contiguous order. Each chunk is encoded using a delta filter and compressed
+using the Blosc compression library prior to storage::
 
     {
         "chunks": [
             1000,
             1000
         ],
-        "compression": "blosc",
-        "compression_opts": {
-            "clevel": 5,
+        "compressor": {
+            "id": "blosc",
             "cname": "lz4",
+            "clevel": 5,
             "shuffle": 1
         },
         "dtype": "<f8",
-        "fill_value": null,
+        "fill_value": "NaN",
+        "filters": [
+            {"id": "delta", "dtype": "<f8", "astype": "<f4"}
+        ],
         "order": "C",
         "shape": [
             10000,
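
The ``"id"`` requirement that this hunk places on the ``compressor`` object and on each entry of ``filters`` could be checked along the following lines. This is a minimal sketch, not code from the commit. The function name ``check_codec_configs`` is hypothetical, and the example metadata is a fragment holding only the two keys the check reads, with values copied from the JSON example above.

```python
import json


def check_codec_configs(meta):
    """Raise ValueError if "compressor" or "filters" break the "id" rule.

    Both keys may be null (None after JSON decoding); otherwise every
    codec configuration object must carry an "id" key.
    """
    compressor = meta["compressor"]
    if compressor is not None and "id" not in compressor:
        raise ValueError('compressor config is missing "id"')
    filters = meta["filters"]
    if filters is not None:
        for i, config in enumerate(filters):
            if "id" not in config:
                raise ValueError('filter %d config is missing "id"' % i)
    return True


# Fragment of the example metadata from the hunk above (other keys omitted).
example = json.loads("""
{
    "compressor": {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1},
    "filters": [{"id": "delta", "dtype": "<f8", "astype": "<f4"}]
}
""")

check_codec_configs(example)
```

A full validator would also reject unknown keys, since the specification states that other keys MUST NOT be present. That part is left out here because the hunk does not show the complete key list.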
@@ -142,7 +147,6 @@ Positive Infinity ``"Infinity"``
 Negative Infinity ``"-Infinity"``
 ================= ===============
 
-
 Chunks
 ~~~~~~
 
@@ -176,6 +180,16 @@ array dimension is not exactly divisible by the length of the corresponding
 chunk dimension then some chunks will overhang the edge of the array. The
 contents of any chunk region falling outside the array are undefined.
 
+Filters
+~~~~~~~
+
+Optionally a sequence of one or more filters can be used to transform chunk
+data prior to compression. When storing data, filters are applied in the order
+specified in array metadata to encode data, then the encoded data are passed to
+the primary compressor. When retrieving data, stored chunk data are
+decompressed by the primary compressor then decoded using filters in the
+reverse order.
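
The ordering rule in the new Filters section (encode with the filters in metadata order, then compress; decompress, then decode with the filters in reverse order) can be sketched with the standard library alone. The classes ``DeltaInt32``, ``ByteXor`` and ``ZlibCompressor`` and the helpers ``encode_chunk`` and ``decode_chunk`` are toy stand-ins invented for this illustration. They are not the Zarr implementations and only borrow the ``encode``/``decode`` method names.

```python
import struct
import zlib


class DeltaInt32:
    """Toy delta filter over little-endian int32 values."""

    def encode(self, buf):
        values = struct.unpack("<%di" % (len(buf) // 4), buf)
        out, prev = [], 0
        for v in values:
            out.append(v - prev)  # store the difference from the previous value
            prev = v
        return struct.pack("<%di" % len(out), *out)

    def decode(self, buf):
        deltas = struct.unpack("<%di" % (len(buf) // 4), buf)
        out, total = [], 0
        for d in deltas:
            total += d  # the running sum restores the original values
            out.append(total)
        return struct.pack("<%di" % len(out), *out)


class ByteXor:
    """Toy second filter: flips every bit; it is its own inverse."""

    def encode(self, buf):
        return bytes(b ^ 0xFF for b in buf)

    decode = encode


class ZlibCompressor:
    """Toy primary compressor."""

    def encode(self, buf):
        return zlib.compress(buf, 1)

    def decode(self, buf):
        return zlib.decompress(buf)


def encode_chunk(raw, filters, compressor):
    # Filters run in the order listed, then the primary compressor.
    for f in filters:
        raw = f.encode(raw)
    if compressor is not None:
        raw = compressor.encode(raw)
    return raw


def decode_chunk(stored, filters, compressor):
    # Decompress first, then undo the filters in reverse order.
    if compressor is not None:
        stored = compressor.decode(stored)
    for f in reversed(filters):
        stored = f.decode(stored)
    return stored


chunk = struct.pack("<5i", 10, 11, 13, 16, 20)
pipeline = [DeltaInt32(), ByteXor()]
stored = encode_chunk(chunk, pipeline, ZlibCompressor())
restored = decode_chunk(stored, pipeline, ZlibCompressor())
```

Decoding with the filters in forward order would apply the delta decode to bit-flipped data and return the wrong values, which is why the specification fixes the reverse order for retrieval.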
+
 Hierarchies
 -----------
 
@@ -279,7 +293,7 @@ Create an array::
     >>> import zarr
     >>> store = zarr.DirectoryStore('example')
     >>> a = zarr.create(shape=(20, 20), chunks=(10, 10), dtype='i4',
-    ...                 fill_value=42, compression='zlib', compression_opts=1,
+    ...                 fill_value=42, compressor=zarr.Zlib(level=1),
     ...                 store=store, overwrite=True)
 
 No chunks are initialized yet, so only the ".zarray" and ".zattrs" keys
@@ -297,10 +311,13 @@ Inspect the array metadata::
             10,
             10
         ],
-        "compression": "zlib",
-        "compression_opts": 1,
+        "compressor": {
+            "id": "zlib",
+            "level": 1
+        },
         "dtype": "<i4",
         "fill_value": 42,
+        "filters": null,
         "order": "C",
         "shape": [
             20,
@@ -452,6 +469,10 @@ Changes in version 2
 * Added support for storing multiple arrays in the same store and organising
   arrays into hierarchies using groups.
 * Array metadata is now stored under the ".zarray" key instead of the "meta"
-  key
+  key.
 * Custom attributes are now stored under the ".zattrs" key instead of the
-  "attrs" key
+  "attrs" key.
+* Added support for filters.
+* Changed encoding of "fill_value" field within array metadata.
+* Changed encoding of compressor information within array metadata to be
+  consistent with representation of filter information.
