Commit 89747d2

rename codecs; increase codec test coverage
1 parent 6958709 commit 89747d2

16 files changed: +701 -559 lines changed

docs/api/codecs.rst

Lines changed: 16 additions & 9 deletions
@@ -9,12 +9,19 @@ is to implement a class that provides the same interface as the classes listed
 below, and then to add the class to the ``codec_registry``. See the source
 code of this module for details.
 
-.. autoclass:: BloscCompressor
-.. autoclass:: ZlibCompressor
-.. autoclass:: BZ2Compressor
-.. autoclass:: LZMACompressor
-.. autoclass:: DeltaFilter
-.. autoclass:: FixedScaleOffsetFilter
-.. autoclass:: QuantizeFilter
-.. autoclass:: PackBitsFilter
-.. autoclass:: CategorizeFilter
+.. autoclass:: Codec
+
+.. automethod:: encode
+.. automethod:: decode
+.. automethod:: get_config
+.. automethod:: from_config
+
+.. autoclass:: Blosc
+.. autoclass:: Zlib
+.. autoclass:: BZ2
+.. autoclass:: LZMA
+.. autoclass:: Delta
+.. autoclass:: FixedScaleOffset
+.. autoclass:: Quantize
+.. autoclass:: PackBits
+.. autoclass:: Categorize
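
The four methods documented for ``Codec`` in this hunk (``encode``, ``decode``, ``get_config``, ``from_config``) can be illustrated with a stand-alone sketch. It deliberately does not import zarr. The class name ``ZlibSketch``, the ``id`` config key, the helper ``get_codec`` and the plain-dict ``codec_registry`` are assumptions made for illustration only, not the library's actual implementation:

```python
import zlib

# Hypothetical stand-in for the registry mentioned in the docs:
# maps a codec identifier to the class that implements it.
codec_registry = {}


class ZlibSketch:
    """Illustrative codec exposing encode/decode/get_config/from_config."""

    codec_id = 'zlib-sketch'  # assumed identifier, for illustration only

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        # Compress a bytes-like object and return the compressed bytes.
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf):
        # Reverse of encode: return the original, uncompressed bytes.
        return zlib.decompress(bytes(buf))

    def get_config(self):
        # A plain dict is enough to rebuild an equivalent codec later.
        return {'id': self.codec_id, 'level': self.level}

    @classmethod
    def from_config(cls, config):
        return cls(level=config['level'])


# Register the class so it can be looked up from stored configuration.
codec_registry[ZlibSketch.codec_id] = ZlibSketch


def get_codec(config):
    """Look up the codec class by its id and instantiate it from config."""
    return codec_registry[config['id']].from_config(config)


codec = get_codec({'id': 'zlib-sketch', 'level': 1})
data = b'zarr' * 1000
assert codec.decode(codec.encode(data)) == data
```

Keeping the configuration as a plain dict means it can be serialized next to the array metadata and used to reconstruct the same codec when the array is reopened, which is presumably why the interface pairs ``get_config`` with ``from_config``.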

docs/spec/v2.rst

Lines changed: 1 addition & 1 deletion
@@ -295,7 +295,7 @@ Create an array::
 >>> import zarr
 >>> store = zarr.DirectoryStore('example')
 >>> a = zarr.create(shape=(20, 20), chunks=(10, 10), dtype='i4',
-... fill_value=42, compressor=zarr.ZlibCompressor(level=1),
+... fill_value=42, compressor=zarr.Zlib(level=1),
 ... store=store, overwrite=True)
 
 No chunks are initialized yet, so only the ".zarray" and ".zattrs" keys

docs/tutorial.rst

Lines changed: 49 additions & 38 deletions
@@ -22,7 +22,7 @@ example::
 >>> z
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict
 
 The code above creates a 2-dimensional array of 32-bit integers with
@@ -45,7 +45,7 @@ scalar value::
 >>> z
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 2.2M; ratio: 170.4; initialized: 100/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict
 
 Notice that the values of ``nbytes_stored``, ``ratio`` and
@@ -93,7 +93,7 @@ enabling persistence of data between sessions. For example::
 >>> z1
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: DirectoryStore
 
 The array above will store its configuration metadata and all
@@ -117,7 +117,7 @@ Check that the data have been written and can be read again::
 >>> z2
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 2.3M; ratio: 163.9; initialized: 100/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: DirectoryStore
 >>> np.all(z1[:] == z2[:])
 True
@@ -136,7 +136,7 @@ can be increased or decreased in length. For example::
 >>> z
 Array((20000, 10000), float64, chunks=(1000, 1000), order=C)
 nbytes: 1.5G; nbytes_stored: 5.7M; ratio: 268.5; initialized: 100/200
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict
 
 Note that when an array is resized, the underlying data are not
@@ -152,19 +152,19 @@ which can be used to append data to any axis. E.g.::
 >>> z
 Array((10000, 1000), int32, chunks=(1000, 100), order=C)
 nbytes: 38.1M; nbytes_stored: 1.9M; ratio: 20.0; initialized: 100/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict
 >>> z.append(a)
 >>> z
 Array((20000, 1000), int32, chunks=(1000, 100), order=C)
 nbytes: 76.3M; nbytes_stored: 3.8M; ratio: 20.0; initialized: 200/200
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict
 >>> z.append(np.vstack([a, a]), axis=1)
 >>> z
 Array((20000, 2000), int32, chunks=(1000, 100), order=C)
 nbytes: 152.6M; nbytes_stored: 7.6M; ratio: 20.0; initialized: 400/400
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict
 
 .. _tutorial_compress:
@@ -187,11 +187,11 @@ accepted by all array creation functions. For example::
 
 >>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
 ... chunks=(1000, 1000),
-... compressor=zarr.BloscCompressor(cname='zstd', clevel=3, shuffle=2))
+... compressor=zarr.Blosc(cname='zstd', clevel=3, shuffle=2))
 >>> z
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 3.1M; ratio: 121.1; initialized: 100/100
-compressor: BloscCompressor(cname='zstd', clevel=3, shuffle=2)
+compressor: Blosc(cname='zstd', clevel=3, shuffle=2)
 store: dict
 
 The array above will use Blosc as the primary compressor, using the
@@ -212,11 +212,11 @@ compression, level 1::
 
 >>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
 ... chunks=(1000, 1000),
-... compressor=zarr.ZlibCompressor(level=1))
+... compressor=zarr.Zlib(level=1))
 >>> z
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 132.2M; ratio: 2.9; initialized: 100/100
-compressor: ZlibCompressor(level=1)
+compressor: Zlib(level=1)
 store: dict
 
 Here is an example using LZMA with a custom filter pipeline including
@@ -225,13 +225,13 @@ the delta filter::
 >>> import lzma
 >>> filters = [dict(id=lzma.FILTER_DELTA, dist=4),
 ... dict(id=lzma.FILTER_LZMA2, preset=1)]
->>> compressor = zarr.LZMACompressor(filters=filters)
+>>> compressor = zarr.LZMA(filters=filters)
 >>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
 ... chunks=(1000, 1000), compressor=compressor)
 >>> z
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 248.9K; ratio: 1569.6; initialized: 100/100
-compressor: LZMACompressor(format=1, check=-1, preset=None, filters=[{'dist': 4, 'id': 3}, {'preset': 1, 'id': 33}])
+compressor: LZMA(format=1, check=-1, preset=None, filters=[{'dist': 4, 'id': 3}, {'preset': 1, 'id': 33}])
 store: dict
 
 To disable compression, set ``compressor=None`` when creating an array.
@@ -255,15 +255,15 @@ the primary compressor.
 
 Here is an example using the Zarr delta filter with the Blosc compressor:
 
->>> filters = [zarr.DeltaFilter(dtype='i4')]
->>> compressor = zarr.BloscCompressor(cname='zstd', clevel=1, shuffle=1)
+>>> filters = [zarr.Delta(dtype='i4')]
+>>> compressor = zarr.Blosc(cname='zstd', clevel=1, shuffle=1)
 >>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
 ... chunks=(1000, 1000), filters=filters, compressor=compressor)
 >>> z
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 381.9K; ratio: 1022.9; initialized: 100/100
-compressor: BloscCompressor(cname='zstd', clevel=1, shuffle=1)
-filters: DeltaFilter
+filters: Delta(dtype=int32)
+compressor: Blosc(cname='zstd', clevel=1, shuffle=1)
 store: dict
 
 Zarr comes with implementations of delta, scale-offset, quantize, packbits and
@@ -311,7 +311,7 @@ array with thread synchronization::
 >>> z
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict; synchronizer: ThreadSynchronizer
 
 This array is safe to read or write within a multi-threaded program.
@@ -326,7 +326,7 @@ provided that all processes have access to a shared file system. E.g.::
 >>> z
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: DirectoryStore; synchronizer: ProcessSynchronizer
 
 This array is safe to read or write from multiple processes.
@@ -376,17 +376,28 @@ For example, groups can contain other groups::
 >>> foo_group = root_group.create_group('foo')
 >>> bar_group = foo_group.create_group('bar')
 
-Groups can also contain arrays, also known as "datasets" in HDF5 terminology.
-For compatibility with h5py, Zarr groups implement the
-:func:`zarr.hierarchy.Group.create_dataset` method, e.g.::
+Groups can also contain arrays, e.g.::
 
->>> z = bar_group.create_dataset('baz', shape=(10000, 10000),
+>>> z1 = bar_group.zeros('baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4',
+... compressor=zarr.Blosc(cname='zstd', clevel=1, shuffle=1))
+>>> z1
+Array(/foo/bar/baz, (10000, 10000), int32, chunks=(1000, 1000), order=C)
+nbytes: 381.5M; nbytes_stored: 324; ratio: 1234567.9; initialized: 0/100
+compressor: Blosc(cname='zstd', clevel=1, shuffle=1)
+store: DictStore
+
+Arrays are known as "datasets" in HDF5 terminology. For compatibility with
+h5py, Zarr groups also implement the :func:`zarr.hierarchy.Group.create_dataset`
+method, e.g.::
+
+>>> z = bar_group.create_dataset('quux', shape=(10000, 10000),
 ... chunks=(1000, 1000), dtype='i4',
-... fill_value=0)
+... fill_value=0, compression='gzip',
+... compression_opts=1)
 >>> z
-Array(/foo/bar/baz, (10000, 10000), int32, chunks=(1000, 1000), order=C)
-nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+Array(/foo/bar/quux, (10000, 10000), int32, chunks=(1000, 1000), order=C)
+nbytes: 381.5M; nbytes_stored: 275; ratio: 1454545.5; initialized: 0/100
+compressor: Zlib(level=1)
 store: DictStore
 
 Members of a group can be accessed via the suffix notation, e.g.::
@@ -400,13 +411,13 @@ The '/' character can be used to access multiple levels of the hierarchy,
 e.g.::
 
 >>> root_group['foo/bar']
-Group(/foo/bar, 1)
-arrays: 1; baz
+Group(/foo/bar, 2)
+arrays: 2; baz, quux
 store: DictStore
 >>> root_group['foo/bar/baz']
 Array(/foo/bar/baz, (10000, 10000), int32, chunks=(1000, 1000), order=C)
-nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+nbytes: 381.5M; nbytes_stored: 324; ratio: 1234567.9; initialized: 0/100
+compressor: Blosc(cname='zstd', clevel=1, shuffle=1)
 store: DictStore
 
 The :func:`zarr.hierarchy.open_group` provides a convenient way to create or
@@ -423,7 +434,7 @@ stored in sub-directories, e.g.::
 >>> z
 Array(/foo/bar/baz, (10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: DirectoryStore
 
 For more information on groups see the :mod:`zarr.hierarchy` API docs.
@@ -465,12 +476,12 @@ data. E.g.::
 >>> zarr.array(a, chunks=(1000, 1000))
 Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
 nbytes: 381.5M; nbytes_stored: 26.3M; ratio: 14.5; initialized: 100/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict
 >>> zarr.array(a, chunks=(1000, 1000), order='F')
 Array((10000, 10000), int32, chunks=(1000, 1000), order=F)
 nbytes: 381.5M; nbytes_stored: 9.5M; ratio: 40.1; initialized: 100/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: dict
 
 In the above example, Fortran order gives a better compression ratio. This
@@ -494,13 +505,13 @@ Here is an example storing an array directly into a Zip file::
 >>> z
 Array((1000, 1000), int32, chunks=(100, 100), order=C)
 nbytes: 3.8M; nbytes_stored: 319; ratio: 12539.2; initialized: 0/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: ZipStore
 >>> z[:] = 42
 >>> z
 Array((1000, 1000), int32, chunks=(100, 100), order=C)
 nbytes: 3.8M; nbytes_stored: 25.7K; ratio: 152.0; initialized: 100/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: ZipStore
 >>> import os
 >>> os.path.getsize('example.zip')
@@ -513,7 +524,7 @@ Re-open and check that data have been written::
 >>> z
 Array((1000, 1000), int32, chunks=(100, 100), order=C)
 nbytes: 3.8M; nbytes_stored: 25.7K; ratio: 152.0; initialized: 100/100
-compressor: BloscCompressor(cname='lz4', clevel=5, shuffle=1)
+compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
 store: ZipStore
 >>> z[:]
 array([[42, 42, 42, ..., 42, 42, 42],
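
The tutorial hunks above pair the renamed ``Delta`` filter with a compressor on ``np.arange`` data. Why that combination helps can be seen in a stdlib-only sketch: consecutive differences of a steadily increasing sequence are constant, so they compress far better than the raw values. The helper names ``delta_encode`` and ``delta_decode`` are hypothetical, and this is not Zarr's ``Delta`` implementation:

```python
import zlib
from array import array


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    out = array('i', values)
    # Walk backwards so out[i - 1] still holds the original value.
    for i in range(len(out) - 1, 0, -1):
        out[i] -= out[i - 1]
    return out


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = array('i', deltas)
    for i in range(1, len(out)):
        out[i] += out[i - 1]
    return out


values = array('i', range(100000))

# Compress the raw integers and the delta-encoded integers at zlib level 1.
raw_size = len(zlib.compress(values.tobytes(), 1))
delta_size = len(zlib.compress(delta_encode(values).tobytes(), 1))

assert delta_decode(delta_encode(values)) == values
print(raw_size, delta_size)
```

The filter runs before the compressor on write and after it on read, which is consistent with the updated repr in the tutorial listing ``filters`` ahead of ``compressor``.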
