@@ -136,13 +136,13 @@ Create a persistent array (data stored on disk)

 .. code-block:: python

     >>> path = 'example.zarr'
-    >>> z = zarr.open(path, shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
+    >>> z = zarr.open(path, mode='w', shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
     >>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
     >>> z
     zarr.ext.SynchronizedPersistentArray((10000, 1000), int32, chunks=(1000, 100))
       cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
       nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
-      mode: a; path: example.zarr
+      mode: w; path: example.zarr

 There is no need to close a persistent array. Data are automatically flushed
 to disk.
@@ -152,30 +152,39 @@ If you're working with really big arrays, try the 'lazy' option

 .. code-block:: python

     >>> path = 'big.zarr'
-    >>> z = zarr.open(path, shape=(1e8, 1e7), dtype='i4', chunks=(1000, 1000), lazy=True)
+    >>> z = zarr.open(path, mode='w', shape=(1e8, 1e7), dtype='i4', chunks=(1000, 1000), lazy=True)
     >>> z
     zarr.ext.SynchronizedLazyPersistentArray((100000000, 10000000), int32, chunks=(1000, 1000))
       cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
       nbytes: 3.6P; cbytes: 0; initialized: 0/1000000000
-      mode: a; path: big.zarr
+      mode: w; path: big.zarr

 See the `persistence documentation <PERSISTENCE.rst>`_ for more details of the
 file format.

 Tuning
 ------

-``zarr`` is designed for use in parallel computations working chunk-wise
-over data. Try it with `dask.array
-<http://dask.pydata.org/en/latest/array.html>`_.
-
 ``zarr`` is optimised for accessing and storing data in contiguous slices
 of the same size or larger than chunks. It is not and will never be
 optimised for single item access.

 Chunk sizes >= 1M are generally good. Optimal chunk shape will depend on
 the correlation structure in your data.

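As an aside to the chunk-size guidance above: assuming the ">= 1M" figure refers to the uncompressed chunk size in bytes (an interpretation, not stated explicitly in the diff), you can check a candidate chunk shape with simple arithmetic — elements per chunk times the dtype's item size:

```python
import numpy as np

# Uncompressed chunk size in bytes = elements per chunk * itemsize.
itemsize = np.dtype('i4').itemsize  # 4 bytes per int32

# chunks=(1000, 100) from the first example: well under 1M
small_chunk = 1000 * 100 * itemsize   # 400,000 bytes (~0.4M)

# chunks=(1000, 1000) from the lazy example: comfortably over 1M
large_chunk = 1000 * 1000 * itemsize  # 4,000,000 bytes (~4M)

print(small_chunk, large_chunk)  # 400000 4000000
```

By this measure the `(1000, 1000)` chunks used for the big array fall in the recommended range, while `(1000, 100)` chunks sit below it.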
+``zarr`` is designed for use in parallel computations working
+chunk-wise over data. Try it with `dask.array
+<http://dask.pydata.org/en/latest/array.html>`_. If using zarr in a
+multi-threaded context, set zarr to use blosc in contextual mode::
+
+    >>> zarr.set_blosc_options(use_context=True)
+
+If using zarr in a single-threaded context, set zarr to use blosc in
+non-contextual mode, which allows blosc to use multiple threads
+internally::
+
+    >>> zarr.set_blosc_options(use_context=False, nthreads=4)
+

 Acknowledgments
 ---------------