1- .. _user-guide-performance:
1+ .. only:: doctest
2+
3+ >>> import shutil
4+ >>> shutil.rmtree('data', ignore_errors=True)
5+
6+ .. _user-guide-performance:
27
38Optimizing performance
49======================
@@ -19,42 +24,41 @@ better performance, at least when using the Blosc compression library.
1924The optimal chunk shape will depend on how you want to access the data. E.g.,
2025for a 2-dimensional array, if you only ever take slices along the first
2126dimension, then chunk across the second dimension. If you know you want to chunk
22- across an entire dimension you can use ``None`` or ``-1`` within the ``chunks``
23- argument, e.g.::
27+ across an entire dimension you can use the full size of that dimension within the
28+ ``chunks`` argument, e.g.::
2429
2530 >>> import zarr
26- >>>
27- >>> z1 = zarr.zeros((10000, 10000), chunks=(100, None), dtype='i4')
31+ >>> z1 = zarr.create_array(store={}, shape=(10000, 10000), chunks=(100, 10000), dtype='int32')
2832 >>> z1.chunks
2933 (100, 10000)
3034
3135Alternatively, if you only ever take slices along the second dimension, then
3236chunk across the first dimension, e.g.::
3337
34- >>> z2 = zarr.zeros((10000, 10000), chunks=(None, 100), dtype='i4')
38+ >>> z2 = zarr.create_array(store={}, shape=(10000, 10000), chunks=(10000, 100), dtype='int32')
3539 >>> z2.chunks
3640 (10000, 100)
3741
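To make the trade-off concrete, a quick back-of-the-envelope count (plain
arithmetic on the chunk grid, not a Zarr API) shows how many chunks a single
row or column read has to touch for the two layouts above::

    >>> import math
    >>> math.ceil(z1.shape[1] / z1.chunks[1])  # chunks touched by z1[0, :]
    1
    >>> math.ceil(z1.shape[0] / z1.chunks[0])  # chunks touched by z1[:, 0]
    100
    >>> math.ceil(z2.shape[0] / z2.chunks[0])  # chunks touched by z2[:, 0]
    1

Reading along the "wrong" axis therefore forces 100 times as many chunk reads
and decompressions for the same amount of data.
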
3842If you require reasonable performance for both access patterns then you need to
3943find a compromise, e.g.::
4044
41- >>> z3 = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
45+ >>> z3 = zarr.create_array(store={}, shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
4246 >>> z3.chunks
4347 (1000, 1000)
4448
4549If you are feeling lazy, you can let Zarr guess a chunk shape for your data by
46- providing ``chunks=True``, although please note that the algorithm for guessing
50+ providing ``chunks='auto'``, although please note that the algorithm for guessing
4751a chunk shape is based on simple heuristics and may be far from optimal. E.g.::
4852
49- >>> z4 = zarr.zeros((10000, 10000), chunks=True, dtype='i4')
53+ >>> z4 = zarr.create_array(store={}, shape=(10000, 10000), chunks='auto', dtype='int32')
5054 >>> z4.chunks
5155 (625, 625)
5256
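For reference, with the 4-byte ``int32`` items used here the guessed chunk
shape above works out to roughly 1.5 MB of uncompressed data per chunk, which
you can verify with plain arithmetic::

    >>> z4.chunks[0] * z4.chunks[1] * 4  # bytes per chunk for 4-byte items
    1562500
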
5357If you know you are always going to be loading the entire array into memory, you
54- can turn off chunks by providing ``chunks=False``, in which case there will be
55- one single chunk for the array::
58+ can turn off chunks by providing ``chunks`` equal to ``shape``, in which case there
59+ will be one single chunk for the array::
5660
57- >>> z5 = zarr.zeros((10000, 10000), chunks=False, dtype='i4')
61+ >>> z5 = zarr.create_array(store={}, shape=(10000, 10000), chunks=(10000, 10000), dtype='int32')
5862 >>> z5.chunks
5963 (10000, 10000)
6064
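Keep in mind that a single chunk only makes sense if you really will read or
write the whole array in one go: with a compressed chunked store, even a tiny
selection generally requires the entire chunk to be fetched and decompressed,
which for this array is about 400 MB of uncompressed data::

    >>> z5.shape[0] * z5.shape[1] * 4  # uncompressed bytes in the single int32 chunk
    400000000
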
@@ -70,9 +74,9 @@ ratios, depending on the correlation structure within the data. E.g.::
7074
7175 >>> import numpy as np
7276 >>>
73- >>> a = np.arange(100000000, dtype='i4').reshape(10000, 10000).T
74- >>> # TODO: replace with create_array after #2463
75- >>> c = zarr.array(a, chunks=(1000, 1000))
77+ >>> a = np.arange(100000000, dtype='int32').reshape(10000, 10000).T
78+ >>> c = zarr.create_array(store={}, shape=a.shape, chunks=(1000, 1000), dtype=a.dtype, config={'order': 'C'})
79+ >>> c[:] = a
7680 >>> c.info_complete()
7781 Type : Array
7882 Zarr format : 3
@@ -88,7 +92,8 @@ ratios, depending on the correlation structure within the data. E.g.::
8892 Storage ratio : 1.2
8993 Chunks Initialized : 100
9094 >>> with zarr.config.set({'array.order': 'F'}):
91- ... f = zarr.array(a, chunks=(1000, 1000))
95+ ... f = zarr.create_array(store={}, shape=a.shape, chunks=(1000, 1000), dtype=a.dtype)
96+ ... f[:] = a
9297 >>> f.info_complete()
9398 Type : Array
9499 Zarr format : 3
@@ -143,15 +148,14 @@ the time required to write an array with different values.::
143148 ... shape = (chunks[0] * 1024,)
144149 ... data = np.random.randint(0, 255, shape)
145150 ... dtype = 'uint8'
146- ... with zarr.config.set({"array.write_empty_chunks": write_empty_chunks}):
147- ... arr = zarr.open(
148- ... f"data/example-{write_empty_chunks}.zarr",
149- ... shape=shape,
150- ... chunks=chunks,
151- ... dtype=dtype,
152- ... fill_value=0,
153- ... mode='w'
154- ... )
151+ ... arr = zarr.create_array(
152+ ... f'data/example-{write_empty_chunks}.zarr',
153+ ... shape=shape,
154+ ... chunks=chunks,
155+ ... dtype=dtype,
156+ ... fill_value=0,
157+ ... config={'write_empty_chunks': write_empty_chunks}
158+ ... )
155159 ... # initialize all chunks
156160 ... arr[:] = 100
157161 ... result = []
@@ -208,9 +212,9 @@ to re-open any underlying files or databases upon being unpickled.
208212E.g., pickle/unpickle a local store array::
209213
210214 >>> import pickle
211- >>>
212- >>> # TODO: replace with create_array after #2463
213- >>> z1 = zarr.array(store="data/example-2", data=np.arange(100000))
215+ >>> data = np.arange(100000)
216+ >>> z1 = zarr.create_array(store='data/example-2.zarr', shape=data.shape, chunks=data.shape, dtype=data.dtype)
217+ >>> z1[:] = data
214218 >>> s = pickle.dumps(z1)
215219 >>> z2 = pickle.loads(s)
216220 >>> z1 == z2