zarr
====

A minimal implementation of chunked, compressed, N-dimensional arrays for
Python.

* Source code: https://github.com/alimanfoo/zarr

Create an array::

    >>> import numpy as np
    >>> import zarr
    >>> z = zarr.empty((10000, 1000), dtype='i4', chunks=(1000, 100))
    >>> z
    zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
      cname: 'blosclz'; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 38.1M; cbytes: 0; initialized: 0/100
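
The ``initialized: 0/100`` line reflects the chunk grid: each dimension is
split into ``ceil(shape[i] / chunks[i])`` chunks, so a (10000, 1000) array
with (1000, 100) chunks has 10 × 10 = 100 chunks, none of which has been
written yet. A quick sketch of that arithmetic (the helper name is
illustrative, not part of the zarr API):

```python
import math

def chunk_grid_size(shape, chunks):
    """Total number of chunks for a given array shape and chunk shape.

    Illustrative helper: each dimension is split into
    ceil(shape[i] / chunks[i]) chunks.
    """
    n = 1
    for s, c in zip(shape, chunks):
        n *= math.ceil(s / c)
    return n

print(chunk_grid_size((10000, 1000), (1000, 100)))  # 10 * 10 = 100
print(chunk_grid_size((20000, 1000), (1000, 100)))  # 20 * 10 = 200
```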

Fill it with some data::

    >>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
    >>> z
    zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
      cname: 'blosclz'; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
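
The sizes in the repr can be checked by hand: ``nbytes`` is the product of
the shape times the dtype's itemsize (displayed with 1M = 2**20 bytes), and
``ratio`` is nbytes divided by the compressed size. A quick sketch; note
that the displayed ``cbytes: 2.0M`` is itself rounded, so recomputing the
ratio from it gives roughly 19.1 rather than the exact 19.3:

```python
# Illustrative arithmetic mirroring the repr shown above.
shape = (10000, 1000)
itemsize = 4  # int32 ('i4')

nbytes = shape[0] * shape[1] * itemsize  # 40_000_000 bytes
print(round(nbytes / 2**20, 1))          # 38.1 -- the "nbytes: 38.1M" figure

cbytes = 2.0 * 2**20                     # the rounded "cbytes: 2.0M" figure
print(round(nbytes / cbytes, 1))         # 19.1 -- close to the exact 19.3
```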

Obtain a NumPy array by slicing::

Resize the array and add more data::

    >>> z.resize(20000, 1000)
    >>> z
    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
      cname: 'blosclz'; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 76.3M; cbytes: 2.0M; ratio: 38.5; initialized: 100/200
    >>> z[10000:, :] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
    >>> z
    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
      cname: 'blosclz'; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 76.3M; cbytes: 4.0M; ratio: 19.3; initialized: 200/200

For convenience, an ``append()`` method is also available, which can be used
to append data along any axis::

    >>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
    >>> z = zarr.array(a, chunks=(1000, 100))
    >>> z
    zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
      cname: 'blosclz'; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
    >>> z.append(a+a)
    >>> z
    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
      cname: 'blosclz'; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 76.3M; cbytes: 3.6M; ratio: 21.2; initialized: 200/200
    >>> z.append(np.vstack([a, a]), axis=1)
    >>> z
    zarr.ext.SynchronizedArray((20000, 2000), int32, chunks=(1000, 100))
      cname: 'blosclz'; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 152.6M; cbytes: 7.6M; ratio: 20.2; initialized: 400/400
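
The shape arithmetic of ``append()`` matches NumPy concatenation: appending
along axis 0 grows the first dimension, and ``axis=1`` grows the second. A
scaled-down sketch in plain NumPy (much smaller shapes than the doctest
above, purely for illustration):

```python
import numpy as np

# Scaled-down stand-in for the doctest above.
a = np.arange(100, dtype='i4').reshape(10, 10)

z = a.copy()
z = np.concatenate([z, a + a], axis=0)              # like z.append(a+a)
print(z.shape)                                      # (20, 10)

z = np.concatenate([z, np.vstack([a, a])], axis=1)  # like z.append(..., axis=1)
print(z.shape)                                      # (20, 20)
```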

Tuning
------

``zarr`` is designed for use in parallel computations working chunk-wise
over data. Try it with `dask.array
<http://dask.pydata.org/en/latest/array.html>`_.
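
Chunk-wise computation means touching one chunk-aligned block at a time, so
each chunk only needs to be decompressed once; this is the access pattern
that engines like dask.array generate over blocked arrays. A minimal NumPy
sketch of the iteration pattern (the helper name is illustrative, not part
of the zarr API):

```python
import numpy as np

def iter_chunk_slices(shape, chunks):
    """Yield slice tuples covering a 2-D array one chunk at a time.

    Illustrative helper: chunk-wise engines such as dask.array build a
    task graph over blocks like these rather than looping serially.
    """
    for i in range(0, shape[0], chunks[0]):
        for j in range(0, shape[1], chunks[1]):
            yield (slice(i, i + chunks[0]), slice(j, j + chunks[1]))

a = np.arange(10000, dtype='i4').reshape(100, 100)
total = 0
for sl in iter_chunk_slices(a.shape, (10, 20)):
    total += int(a[sl].sum())  # process one chunk-sized block at a time
print(total)                   # 49995000, same as a whole-array a.sum()
```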

``zarr`` is optimised for accessing and storing data in contiguous slices
of the same size as or larger than chunks. It is not and will never be
optimised for single-item access.

Chunk sizes >= 1M are generally good. The optimal chunk shape will depend
on the correlation structure in your data.
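
A chunk's uncompressed size is simply the product of the chunk shape and
the dtype's itemsize, which is the number to check against the >= 1M
guideline. A small sketch (helper name is illustrative):

```python
from functools import reduce
from operator import mul

def chunk_nbytes(chunk_shape, itemsize):
    """Uncompressed size in bytes of one chunk (illustrative helper)."""
    return reduce(mul, chunk_shape, 1) * itemsize

# The (1000, 100) int32 chunks used in the examples above:
print(chunk_nbytes((1000, 100), 4))  # 400000 bytes
```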

Acknowledgments
---------------

``zarr`` uses `c-blosc <https://github.com/Blosc/c-blosc>`_ internally for
compression and decompression and borrows code heavily from
`bcolz <http://bcolz.blosc.org/>`_.