|
| 1 | +zarr |
| 2 | +==== |
| 3 | + |
| 4 | +A minimal implementation of chunked, compressed, N-dimensional arrays for |
| 5 | +Python. |
| 6 | + |
| 7 | +Installation |
| 8 | +------------ |
| 9 | + |
| 10 | +Install from GitHub (requires NumPy and Cython pre-installed):: |
| 11 | + |
| 12 | + $ pip install -U git+https://github.com/alimanfoo/zarr.git@master |
| 13 | + |
| 14 | +Status |
| 15 | +------ |
| 16 | + |
| 17 | +Highly experimental, pre-alpha. Bug reports and pull requests very welcome. |
| 18 | + |
| 19 | +Design goals |
| 20 | +------------ |
| 21 | + |
| 22 | +* Chunking in multiple dimensions |
| 23 | +* Resize any dimension |
| 24 | +* Concurrent reads |
| 25 | +* Concurrent writes |
| 26 | +* Release the GIL during compression and decompression |
| 27 | + |
| 28 | +Usage |
| 29 | +----- |
| 30 | + |
| 31 | +Create an array:: |
| 32 | + |
| 33 | + >>> import numpy as np |
| 34 | + >>> import zarr |
| 35 | + >>> z = zarr.empty((10000, 1000), dtype='i4', chunks=(1000, 100)) |
| 36 | + >>> z |
| 37 | + zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), nbytes=38.1M, cbytes=0, cname=blosclz, clevel=5, shuffle=1) |
| 38 | + |
| 39 | +Fill it with some data:: |
| 40 | + |
| 41 | + >>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000) |
| 42 | + >>> z |
| 43 | + zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), nbytes=38.1M, cbytes=2.0M, cratio=19.3, cname=blosclz, clevel=5, shuffle=1) |
| 44 | + |
| 45 | +Obtain a NumPy array:: |
| 46 | + |
| 47 | + >>> z[:] |
| 48 | + array([[ 0, 1, 2, ..., 997, 998, 999], |
| 49 | + [ 1000, 1001, 1002, ..., 1997, 1998, 1999], |
| 50 | + [ 2000, 2001, 2002, ..., 2997, 2998, 2999], |
| 51 | + ..., |
| 52 | + [9997000, 9997001, 9997002, ..., 9997997, 9997998, 9997999], |
| 53 | + [9998000, 9998001, 9998002, ..., 9998997, 9998998, 9998999], |
| 54 | + [9999000, 9999001, 9999002, ..., 9999997, 9999998, 9999999]], dtype=int32) |
| 55 | + |
| 56 | +Resize the array and add more data:: |
| 57 | + |
| 58 | + >>> z.resize(20000, 1000) |
| 59 | + >>> z |
| 60 | + zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), nbytes=76.3M, cbytes=2.0M, cratio=38.5, cname=blosclz, clevel=5, shuffle=1) |
| 61 | + >>> z[10000:, :] = np.arange(10000000, dtype='i4').reshape(10000, 1000) |
| 62 | + >>> z |
| 63 | + zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), nbytes=76.3M, cbytes=4.0M, cratio=19.3, cname=blosclz, clevel=5, shuffle=1) |
| 64 | + |
| 65 | +For convenience, an ``append`` method is also available, which can be used to |
| 66 | +append data along any axis:: |
| 67 | + |
| 68 | + >>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000) |
| 69 | + >>> z = zarr.array(a, chunks=(1000, 100)) |
| 70 | + >>> z |
| 71 | + zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), nbytes=38.1M, cbytes=2.0M, cratio=19.3, cname=blosclz, clevel=5, shuffle=1) |
| 72 | + >>> z.append(a+a) |
| 73 | + >>> z |
| 74 | + zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), nbytes=76.3M, cbytes=3.6M, cratio=21.2, cname=blosclz, clevel=5, shuffle=1) |
| 75 | + >>> z.append(np.vstack([a, a]), axis=1) |
| 76 | + >>> z |
| 77 | + zarr.ext.Array((20000, 2000), int32, chunks=(1000, 100), nbytes=152.6M, cbytes=7.6M, cratio=20.2, cname=blosclz, clevel=5, shuffle=1) |
| 78 | + |
| 79 | +Tuning |
| 80 | +------ |
| 81 | + |
| 82 | +``zarr`` is designed for use in parallel computations working chunk-wise |
| 83 | +over data. Try it with `dask.array <http://dask.pydata.org/en/latest/array.html>`_. |
| 84 | + |
| 85 | +``zarr`` is optimised for accessing and storing data in contiguous slices |
| 86 | +that are the same size as, or larger than, a chunk. It is not, and never will |
| 87 | +be, optimised for single-item access. |
| 88 | + |
| 89 | +Chunk sizes >= 1M are generally good. The optimal chunk shape depends on |
| 90 | +the correlation structure in your data. |
| 91 | + |
| 92 | +Acknowledgments |
| 93 | +--------------- |
| 94 | + |
| 95 | +``zarr`` uses `c-blosc <https://github.com/Blosc/c-blosc>`_ internally for |
| 96 | +compression and decompression and borrows code heavily from |
| 97 | +`bcolz <http://bcolz.blosc.org/>`_. |
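The chunk-size guidance in the Tuning section can be checked with a little arithmetic before creating an array. The sketch below is not part of ``zarr``: ``chunk_stats`` is a hypothetical helper that uses only NumPy and the standard library. The README does not say whether "1M" counts items or bytes, so the helper reports both and leaves the comparison to the reader.

```python
import math

import numpy as np


def chunk_stats(shape, chunks, dtype):
    """Hypothetical helper (not a zarr API): summarise a chunking scheme.

    Returns a tuple (items_per_chunk, bytes_per_chunk, n_chunks), where the
    first two describe a full, uncompressed chunk.
    """
    items_per_chunk = math.prod(chunks)
    bytes_per_chunk = items_per_chunk * np.dtype(dtype).itemsize
    # Each dimension needs ceil(length / chunk_length) chunks to cover it.
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    return items_per_chunk, bytes_per_chunk, n_chunks


# The array from the Usage section: shape (10000, 1000), dtype 'i4',
# chunks (1000, 100).
items, nbytes, n_chunks = chunk_stats((10000, 1000), (1000, 100), 'i4')
print(items, nbytes, n_chunks)
```

For the Usage example this gives 100,000 items (400,000 bytes uncompressed) per chunk, spread over 100 chunks. On either reading of "1M" that is below the suggested size, so a larger chunk shape may be worth trying for real workloads.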