11.. _user-guide-arrays:
22
3+
4+ .. only:: doctest
5+ >>> import shutil
6+ >>> shutil.rmtree('data', ignore_errors=True)
7+
38Working with arrays
49===================
510
@@ -10,9 +15,8 @@ Zarr has several functions for creating arrays. For example::
1015
1116 >>> import zarr
1217 >>>
13- >>> store = {}
14- >>> # TODO: replace with `create_array` after #2463
15- >>> z = zarr.create(store=store, mode="w", shape=(10000, 10000), chunks=(1000, 1000), dtype="i4")
18+ >>> store = zarr.storage.MemoryStore()
19+ >>> z = zarr.create_array(store=store, shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
1620 >>> z
1721 <Array memory://... shape=(10000, 10000) dtype=int32>
1822
@@ -79,16 +83,14 @@ main memory. Zarr arrays can also be stored on a file system, enabling
7983persistence of data between sessions. To do this, we can change the store
8084argument to point to a filesystem path::
8185
82- >>> # TODO: replace with `open_array` after #2463
83- >>> z1 = zarr.open(store='data/example-2.zarr', mode='w', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4')
86+ >>> z1 = zarr.create_array(store='data/example-1.zarr', shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
8487
8588The array above will store its configuration metadata and all compressed chunk
86- data in a directory called ``'data/example-2.zarr'`` relative to the current working
87- directory. The :func:`zarr.open` function provides a convenient way
89+ data in a directory called ``'data/example-1.zarr'`` relative to the current working
90+ directory. The :func:`zarr.create_array` function provides a convenient way
8891to create a new persistent array or continue working with an existing
89- array. Note that although the function is called "open", there is no need to
90- close an array: data are automatically flushed to disk, and files are
91- automatically closed whenever an array is modified.
92+ array. Note, there is no need to close an array: data are automatically
93+ flushed to disk, and files are automatically closed whenever an array is modified.
9294
9395Persistent arrays support the same interface for reading and writing data,
9496e.g.::
@@ -99,8 +101,7 @@ e.g.::
99101
100102Check that the data have been written and can be read again::
101103
102- >>> # TODO: replace with `open_array` after #2463
103- >>> z2 = zarr.open('data/example-2.zarr', mode='r')
104+ >>> z2 = zarr.open_array('data/example-1.zarr', mode='r')
104105 >>> np.all(z1[:] == z2[:])
105106 np.True_
106107
@@ -110,8 +111,8 @@ disk then load back into memory later, the functions
110111useful. E.g.::
111112
112113 >>> a = np.arange(10)
113- >>> zarr.save('data/example-3.zarr', a)
114- >>> zarr.load('data/example-3.zarr')
114+ >>> zarr.save('data/example-2.zarr', a)
115+ >>> zarr.load('data/example-2.zarr')
115116 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
116117
117118Please note that there are a number of other options for persistent array
@@ -125,7 +126,7 @@ Resizing and appending
125126A Zarr array can be resized, which means that any of its dimensions can be
126127increased or decreased in length. For example::
127128
128- >>> z = zarr.zeros(store="data/example-4.zarr", shape=(10000, 10000), chunks=(1000, 1000))
129+ >>> z = zarr.create_array(store='data/example-3.zarr', shape=(10000, 10000), dtype='int32', chunks=(1000, 1000))
129130 >>> z[:] = 42
130131 >>> z.shape
131132 (10000, 10000)
@@ -140,9 +141,9 @@ new array shape will be deleted from the underlying store.
140141:func:`zarr.Array.append` is provided as a convenience function, which can be
141142used to append data to any axis. E.g.::
142143
143- >>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
144- >>> # TODO: replace with create_array after #2463
145- >>> z = zarr.array(store="data/example-5", data=a, chunks=(1000, 100))
144+ >>> a = np.arange(10000000, dtype='int32').reshape(10000, 1000)
145+ >>> z = zarr.create_array(store='data/example-4.zarr', shape=a.shape, dtype=a.dtype, chunks=(1000, 100))
146+ >>> z[:] = a
146147 >>> z.shape
147148 (10000, 1000)
148149 >>> z.append(a)
@@ -157,19 +158,19 @@ used to append data to any axis. E.g.::
157158Compressors
158159-----------
159160
160- A number of different compressors can be used with Zarr. A separate package
161- called NumCodecs_ is available which provides a common interface to various
162- compressor libraries including Blosc, Zstandard, LZ4, Zlib, BZ2 and
163- LZMA. Different compressors can be provided via the ``compressor`` keyword
161+ A number of different compressors can be used with Zarr. Zarr includes Blosc,
162+ Zstandard and Gzip compressors. Additional compressors are available through
163+ a separate package called NumCodecs_ which provides various
164+ compressor libraries including LZ4, Zlib, BZ2 and LZMA.
165+ Different compressors can be provided via the ``compressors`` keyword
164166argument accepted by all array creation functions. For example::
165167
166- >>> from numcodecs import Blosc
167- >>>
168- >>> compressor = None # TODO: Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE)
169- >>> data = np.arange(100000000, dtype='i4').reshape(10000, 10000)
170- >>> # TODO: remove zarr_format and replace with create_array after #2463
171- >>> z = zarr.array(store="data/example-6.zarr", data=data, chunks=(1000, 1000), compressor=compressor, zarr_format=2)
172- >>> None # TODO: z.compressor
168+ >>> compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=3, shuffle=zarr.codecs.BloscShuffle.bitshuffle)
169+ >>> data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
170+ >>> z = zarr.create_array(store='data/example-5.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=compressors)
171+ >>> z[:] = data
172+ >>> z.metadata.codecs
173+ [BytesCodec(endian=<Endian.little: 'little'>), BloscCodec(typesize=4, cname=<BloscCname.zstd: 'zstd'>, clevel=3, shuffle=<BloscShuffle.bitshuffle: 'bitshuffle'>, blocksize=0)]
173174
174175The array above will use Blosc as the primary compressor, using the Zstandard
175176algorithm (compression level 3) internally within Blosc, and with the
@@ -181,87 +182,78 @@ which can be used to print useful diagnostics, e.g.::
181182
182183 >>> z.info
183184 Type : Array
184- Zarr format : 2
185- Data type : int32
185+ Zarr format : 3
186+ Data type : DataType.int32
186187 Shape : (10000, 10000)
187188 Chunk shape : (1000, 1000)
188189 Order : C
189190 Read-only : False
190191 Store type : LocalStore
191- Compressor : Zstd(level=0)
192+ Codecs : [{'endian': <Endian.little: 'little'>}, {'typesize': 4, 'cname': <BloscCname.zstd: 'zstd'>, 'clevel': 3, 'shuffle': <BloscShuffle.bitshuffle: 'bitshuffle'>, 'blocksize': 0}]
192193 No. bytes : 400000000 (381.5M)
193194
194195The :func:`zarr.Array.info_complete` method inspects the underlying store and
195196prints additional diagnostics, e.g.::
196197
197198 >>> z.info_complete()
198199 Type : Array
199- Zarr format : 2
200- Data type : int32
200+ Zarr format : 3
201+ Data type : DataType.int32
201202 Shape : (10000, 10000)
202203 Chunk shape : (1000, 1000)
203204 Order : C
204205 Read-only : False
205206 Store type : LocalStore
206- Compressor : Zstd(level=0)
207+ Codecs : [{'endian': <Endian.little: 'little'>}, {'typesize': 4, 'cname': <BloscCname.zstd: 'zstd'>, 'clevel': 3, 'shuffle': <BloscShuffle.bitshuffle: 'bitshuffle'>, 'blocksize': 0}]
207208 No. bytes : 400000000 (381.5M)
208- No. bytes stored : 299348444
209- Storage ratio : 1.3
209+ No. bytes stored : 9696302
210+ Storage ratio : 41.3
210211 Chunks Initialized : 100
211212
212213.. note::
213214 :func:`zarr.Array.info_complete` will inspect the underlying store and may
214215 be slow for large arrays. Use :attr:`zarr.Array.info` if detailed storage
215216 statistics are not needed.
216217
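The figures reported by ``info_complete`` can be cross-checked by hand. The following is a minimal sketch in plain Python, assuming the shape, chunk shape, item size and stored-byte count shown in the output above; it makes no Zarr calls and the variable names are illustrative only:

```python
import math

# Values copied from the ``info_complete`` output above (assumptions for this sketch).
shape = (10000, 10000)
chunks = (1000, 1000)
itemsize = 4             # bytes per int32 element
nbytes_stored = 9696302  # "No. bytes stored"

# Uncompressed size: number of elements times bytes per element.
nbytes = math.prod(shape) * itemsize

# Number of chunks in the grid: ceil(dim / chunk) along each dimension.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# Storage ratio: uncompressed bytes divided by bytes actually stored.
ratio = nbytes / nbytes_stored

print(nbytes)                    # 400000000
print(f"{nbytes / 2**20:.1f}M")  # 381.5M
print(n_chunks)                  # 100
print(f"{ratio:.1f}")            # 41.3
```

A higher storage ratio means the compressed chunks occupy a smaller fraction of the logical array size.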
217- If you don't specify a compressor, by default Zarr uses the Blosc
218- compressor. Blosc is generally very fast and can be configured in a variety of
219- ways to improve the compression ratio for different types of data. Blosc is in
220- fact a "meta-compressor", which means that it can use a number of different
221- compression algorithms internally to compress the data. Blosc also provides
222- highly optimized implementations of byte- and bit-shuffle filters, which can
223- improve compression ratios for some data. A list of the internal compression
224- libraries available within Blosc can be obtained via::
218+ If you don't specify a compressor, by default Zarr uses the Zstandard
219+ compressor.
225220
226- >>> from numcodecs import blosc
227- >>>
228- >>> blosc.list_compressors()
229- ['blosclz', 'lz4', 'lz4hc', 'zlib', 'zstd']
221+ In addition to Blosc and Zstandard, other compression libraries can also be used. For example,
222+ here is an array using Gzip compression, level 1::
230223
231- In addition to Blosc, other compression libraries can also be used. For example,
232- here is an array using Zstandard compression, level 1::
224+ >>> data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
225+ >>> z = zarr.create_array(store='data/example-6.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=zarr.codecs.GzipCodec(level=1))
226+ >>> z[:] = data
227+ >>> z.metadata.codecs
228+ [BytesCodec(endian=<Endian.little: 'little'>), GzipCodec(level=1)]
233229
234- >>> from numcodecs import Zstd
235- >>> # TODO: remove zarr_format and replace with create_array after #2463
236- >>> z = zarr.array(store="data/example-7.zarr", data=np.arange(100000000, dtype='i4').reshape(10000, 10000), chunks=(1000, 1000), compressor=Zstd(level=1), zarr_format=2)
237- >>> None # TODO: z.compressor
238-
239- Here is an example using LZMA with a custom filter pipeline including LZMA's
230+ Here is an example using LZMA from NumCodecs_ with a custom filter pipeline including LZMA's
240231built-in delta filter::
241232
242233 >>> import lzma
243- >>> from numcodecs import LZMA
234+ >>> from numcodecs.zarr3 import LZMA
244235 >>>
245236 >>> lzma_filters = [dict(id=lzma.FILTER_DELTA, dist=4), dict(id=lzma.FILTER_LZMA2, preset=1)]
246- >>> compressor = LZMA(filters=lzma_filters)
247- >>> # TODO: remove zarr_format and replace with create_array after #2463
248- >>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000), chunks=(1000, 1000), compressor=compressor, zarr_format=2)
249- >>> None # TODO: z.compressor
237+ >>> compressors = LZMA(filters=lzma_filters)
238+ >>> data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
239+ >>> z = zarr.create_array(store='data/example-7.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=compressors)
240+ >>> z.metadata.codecs
241+ [BytesCodec(endian=<Endian.little: 'little'>), _make_bytes_bytes_codec.<locals>._Codec(codec_name='numcodecs.lzma', codec_config={'id': 'lzma', 'filters': [{'id': 3, 'dist': 4}, {'id': 33, 'preset': 1}]})]
250242
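To see what the LZMA filter chain does independently of Zarr, the same ``lzma_filters`` list can be passed to Python's standard-library ``lzma`` module. This is only a sketch under stated assumptions: the sample data, the use of the ``array`` module, and the ``.xz`` container format are choices made for illustration, not a description of how NumCodecs applies the filters to chunks:

```python
import array
import lzma

# Sample data (an assumption for this sketch): 100,000 consecutive integers,
# serialized in native byte order.
values = array.array("i", range(100_000))
raw = values.tobytes()

# Byte-wise delta filter whose distance equals the item size, followed by
# LZMA2 at preset 1, mirroring the filter list in the example above.
lzma_filters = [
    dict(id=lzma.FILTER_DELTA, dist=values.itemsize),
    dict(id=lzma.FILTER_LZMA2, preset=1),
]

packed = lzma.compress(raw, format=lzma.FORMAT_XZ, filters=lzma_filters)
restored = lzma.decompress(packed)  # the .xz container records the filter chain

print(len(raw), len(packed))
assert restored == raw  # the pipeline is lossless
```

Setting the delta distance to the item size makes each byte be compared with the corresponding byte of the previous element, which tends to help on slowly varying integer data.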
251243The default compressor can be changed using Zarr's
252244:ref:`user-guide-config`, e.g.::
253245
254- >>> with zarr.config.set({'array.v2_default_compressor.numeric': 'blosc'}):
246+ >>> with zarr.config.set({'array.v2_default_compressor.numeric': {'id': 'blosc'}}):
255247 ... z = zarr.zeros(100000000, chunks=1000000, zarr_format=2)
256248 >>> z.metadata.filters
257249 >>> z.metadata.compressor
258- LZMA(format=1, check=-1, preset=None, filters=[{'id': 3, 'dist': 4}, {'id': 33, 'preset': 1}])
259- >>>
250+ Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
260251
261- To disable compression, set ``compressor=None`` when creating an array, e.g.::
252+ To disable compression, set ``compressors=None`` when creating an array, e.g.::
262253
263- >>> # TODO: remove zarr_format
264- >>> z = zarr.zeros(100000000, chunks=1000000, compressor=None, zarr_format=2)
254+ >>> z = zarr.create_array(store='data/example-8.zarr', shape=(100000000,), chunks=(1000000,), dtype='int32', compressors=None)
255+ >>> z.metadata.codecs
256+ [BytesCodec(endian=<Endian.little: 'little'>)]
265257
266258.. _user-guide-filters:
267259
@@ -281,24 +273,22 @@ mechanism for configuring filters outside of the primary compressor.
281273
282274Here is an example using a delta filter with the Blosc compressor::
283275
284- >>> from numcodecs import Blosc, Delta
276+ >>> from numcodecs.zarr3 import Delta
285277 >>>
286- >>> filters = [Delta(dtype='i4')]
287- >>> compressor = Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE)
288- >>> data = np.arange(100000000, dtype='i4').reshape(10000, 10000)
289- >>> # TODO: remove zarr_format and replace with create_array after #2463
290- >>> z = zarr.array(data, chunks=(1000, 1000), filters=filters, compressor=compressor, zarr_format=2)
278+ >>> filters = [Delta(dtype='int32')]
279+ >>> compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=1, shuffle=zarr.codecs.BloscShuffle.shuffle)
280+ >>> data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
281+ >>> z = zarr.create_array(store='data/example-9.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), filters=filters, compressors=compressors)
291282 >>> z.info
292283 Type : Array
293- Zarr format : 2
294- Data type : int32
284+ Zarr format : 3
285+ Data type : DataType.int32
295286 Shape : (10000, 10000)
296287 Chunk shape : (1000, 1000)
297288 Order : C
298289 Read-only : False
299- Store type : MemoryStore
300- Compressor : Blosc(cname='zstd', clevel=1, shuffle=SHUFFLE, blocksize=0)
301- Filters : (Delta(dtype='<i4'),)
290+ Store type : LocalStore
291+ Codecs : [{'codec_name': 'numcodecs.delta', 'codec_config': {'id': 'delta', 'dtype': 'int32'}}, {'endian': <Endian.little: 'little'>}, {'typesize': 4, 'cname': <BloscCname.zstd: 'zstd'>, 'clevel': 1, 'shuffle': <BloscShuffle.shuffle: 'shuffle'>, 'blocksize': 0}]
302292 No. bytes : 400000000 (381.5M)
303293
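The idea behind a delta filter can be illustrated without any codec library. The sketch below is a plain-Python approximation, not the NumCodecs implementation; the helper names ``delta_encode`` and ``delta_decode`` are hypothetical:

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


data = list(range(100, 110))
encoded = delta_encode(data)

print(encoded)  # [100, 1, 1, 1, 1, 1, 1, 1, 1, 1]
assert delta_decode(encoded) == data
```

Monotonic data such as ``np.arange`` turns into a long run of identical small values after delta encoding, which is why a compressor applied afterwards tends to do better.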
304294For more information about available filter codecs, see the `Numcodecs
@@ -325,8 +315,9 @@ Indexing with coordinate arrays
325315Items from a Zarr array can be extracted by providing an integer array of
326316coordinates. E.g.::
327317
328- >>> # TODO: replace with create_array after #2463
329- >>> z = zarr.array(np.arange(10) ** 2)
318+ >>> data = np.arange(10) ** 2
319+ >>> z = zarr.create_array(store='data/example-10.zarr', shape=data.shape, dtype=data.dtype)
320+ >>> z[:] = data
330321 >>> z[:]
331322 array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
332323 >>> z.get_coordinate_selection([2, 5])
@@ -341,8 +332,9 @@ Coordinate arrays can also be used to update data, e.g.::
341332For multidimensional arrays, coordinates must be provided for each dimension,
342333e.g.::
343334
344- >>> # TODO: replace with create_array after #2463
345- >>> z = zarr.array(np.arange(15).reshape(3, 5))
335+ >>> data = np.arange(15).reshape(3, 5)
336+ >>> z = zarr.create_array(store='data/example-11.zarr', shape=data.shape, dtype=data.dtype)
337+ >>> z[:] = data
346338 >>> z[:]
347339 array([[ 0, 1, 2, 3, 4],
348340 [ 5, 6, 7, 8, 9],
@@ -381,8 +373,9 @@ Indexing with a mask array
381373
382374Items can also be extracted by providing a Boolean mask. E.g.::
383375
384- >>> # TODO: replace with create_array after #2463
385- >>> z = zarr.array(np.arange(10) ** 2)
376+ >>> data = np.arange(10) ** 2
377+ >>> z = zarr.create_array(store='data/example-12.zarr', shape=data.shape, dtype=data.dtype)
378+ >>> z[:] = data
386379 >>> z[:]
387380 array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
388381 >>> sel = np.zeros_like(z, dtype=bool)
@@ -396,8 +389,9 @@ Items can also be extracted by providing a Boolean mask. E.g.::
396389
397390Here's a multidimensional example::
398391
399- >>> # TODO: replace with create_array after #2463
400- >>> z = zarr.array(np.arange(15).reshape(3, 5))
392+ >>> data = np.arange(15).reshape(3, 5)
393+ >>> z = zarr.create_array(store='data/example-13.zarr', shape=data.shape, dtype=data.dtype)
394+ >>> z[:] = data
401395 >>> z[:]
402396 array([[ 0, 1, 2, 3, 4],
403397 [ 5, 6, 7, 8, 9],
@@ -436,8 +430,9 @@ selections to be made along each dimension of an array independently. For
436430example, this allows selecting a subset of rows and/or columns from a
4374312-dimensional array. E.g.::
438432
439- >>> # TODO: replace with create_array after #2463
440- >>> z = zarr.array(np.arange(15).reshape(3, 5))
433+ >>> data = np.arange(15).reshape(3, 5)
434+ >>> z = zarr.create_array(store='data/example-14.zarr', shape=data.shape, dtype=data.dtype)
435+ >>> z[:] = data
441436 >>> z[:]
442437 array([[ 0, 1, 2, 3, 4],
443438 [ 5, 6, 7, 8, 9],
@@ -460,8 +455,9 @@ Data can also be modified, e.g.::
460455For convenience, the orthogonal indexing functionality is also available via the
461456``oindex `` property, e.g.::
462457
463- >>> # TODO: replace with create_array after #2463
464- >>> z = zarr.array(np.arange(15).reshape(3, 5))
458+ >>> data = np.arange(15).reshape(3, 5)
459+ >>> z = zarr.create_array(store='data/example-15.zarr', shape=data.shape, dtype=data.dtype)
460+ >>> z[:] = data
465461 >>> z.oindex[[0, 2], :] # select first and third rows
466462 array([[ 0, 1, 2, 3, 4],
467463 [10, 11, 12, 13, 14]])
@@ -484,8 +480,9 @@ be used for orthogonal indexing.
484480If the index contains at most one iterable, and otherwise contains only slices and integers,
485481orthogonal indexing is also available directly on the array::
486482
487- >>> # TODO: replace with create_array after #2463
488- >>> z = zarr.array(np.arange(15).reshape(3, 5))
483+ >>> data = np.arange(15).reshape(3, 5)
484+ >>> z = zarr.create_array(store='data/example-16.zarr', shape=data.shape, dtype=data.dtype)
485+ >>> z[:] = data
489486 >>> np.all(z.oindex[[0, 2], :] == z[[0, 2], :])
490487 np.True_
491488
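The difference between coordinate indexing and orthogonal indexing can be shown with nested lists. This is a minimal sketch of the two selection rules only; the helper names are hypothetical and nothing here reflects how Zarr implements them:

```python
# A 3 x 5 grid holding 0..14, matching np.arange(15).reshape(3, 5).
data = [[r * 5 + c for c in range(5)] for r in range(3)]


def coordinate_select(arr, rows, cols):
    """Pair the indices up: one item per (row, col) coordinate."""
    return [arr[r][c] for r, c in zip(rows, cols)]


def orthogonal_select(arr, rows, cols):
    """Apply each index list to its own dimension: every row with every column."""
    return [[arr[r][c] for c in cols] for r in rows]


print(coordinate_select(data, [0, 2], [1, 3]))  # [1, 13]
print(orthogonal_select(data, [0, 2], [1, 3]))  # [[1, 3], [11, 13]]
```

With the same two index lists, coordinate selection returns two items while orthogonal selection returns a 2 x 2 block.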
@@ -496,8 +493,9 @@ Zarr also supports block indexing, which allows selections of whole chunks based
496493logical indices along each dimension of an array. For example, this allows selecting
497494a subset of chunk aligned rows and/or columns from a 2-dimensional array. E.g.::
498495
499- >>> # TODO: replace with create_array after #2463
500- >>> z = zarr.array(np.arange(100).reshape(10, 10), chunks=(3, 3))
496+ >>> data = np.arange(100).reshape(10, 10)
497+ >>> z = zarr.create_array(store='data/example-17.zarr', shape=data.shape, dtype=data.dtype, chunks=(3, 3))
498+ >>> z[:] = data
501499
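A block index maps to a contiguous range of element indices along a dimension. The following sketch shows that arithmetic for the 10 x 10 array with chunks of 3 used above; ``block_slice`` is a hypothetical helper written for illustration, not part of the Zarr API:

```python
import math


def block_slice(block_index, chunk_len, dim_len):
    """Return the element slice covered by one block along a single dimension."""
    n_blocks = math.ceil(dim_len / chunk_len)
    if not 0 <= block_index < n_blocks:
        raise IndexError(f"block index {block_index} out of range for {n_blocks} blocks")
    start = block_index * chunk_len
    stop = min(start + chunk_len, dim_len)  # the last block may be partial
    return slice(start, stop)


# Nested-list stand-in for np.arange(100).reshape(10, 10).
data = [[r * 10 + c for c in range(10)] for r in range(10)]

rows = block_slice(1, chunk_len=3, dim_len=10)
print(rows)                                      # slice(3, 6, None)
print(block_slice(3, chunk_len=3, dim_len=10))   # slice(9, 10, None)
print([row[0] for row in data[rows]])            # [30, 40, 50]
```

Because 10 is not a multiple of 3, the fourth block along each dimension covers a single row or column.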
502500Retrieve items by specifying their block coordinates::
503501
@@ -531,7 +529,7 @@ For example::
531529
532530Data can also be modified. Let's start with a simple 2D array::
533531
534- >>> z = zarr.zeros((6, 6), dtype=int, chunks=2)
532+ >>> z = zarr.create_array(store='data/example-18.zarr', shape=(6, 6), dtype=int, chunks=(2, 2))
535533
536534Set data for a selection of items::
537535
@@ -562,10 +560,9 @@ Any combination of integer and slice can be used for block indexing::
562560 array([[0, 0, 7, 7],
563561 [0, 0, 7, 7]])
564562 >>>
565- >>> # TODO: replace with create_group after #2463
566- >>> root = zarr.group('data/example-12.zarr')
567- >>> foo = root.create_array(name='foo', shape=(1000, 100), chunks=(10, 10), dtype='f4')
568- >>> bar = root.create_array(name='foo/bar', shape=(100,), dtype='i4')
563+ >>> root = zarr.create_group('data/example-19.zarr')
564+ >>> foo = root.create_array(name='foo', shape=(1000, 100), chunks=(10, 10), dtype='float32')
565+ >>> bar = root.create_array(name='foo/bar', shape=(100,), dtype='int32')
569566 >>> foo[:, :] = np.random.random((1000, 100))
570567 >>> bar[:] = np.arange(100)
571568 >>> root.tree()