Commit 5b5e757

fix doctests for arrays.rst
1 parent 2dab20e commit 5b5e757

2 files changed: +98 -100 lines changed

docs/user-guide/arrays.rst

Lines changed: 97 additions & 100 deletions
@@ -1,5 +1,10 @@
 .. _user-guide-arrays:
 
+
+.. only:: doctest
+>>> import shutil
+>>> shutil.rmtree('data', ignore_errors=True)
+
 Working with arrays
 ===================
 
@@ -10,9 +15,8 @@ Zarr has several functions for creating arrays. For example::
 
 >>> import zarr
 >>>
->>> store = {}
->>> # TODO: replace with `create_array` after #2463
->>> z = zarr.create(store=store, mode="w", shape=(10000, 10000), chunks=(1000, 1000), dtype="i4")
+>>> store = zarr.storage.MemoryStore()
+>>> z = zarr.create_array(store=store, shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
 >>> z
 <Array memory://... shape=(10000, 10000) dtype=int32>
 
@@ -79,16 +83,14 @@ main memory. Zarr arrays can also be stored on a file system, enabling
 persistence of data between sessions. To do this, we can change the store
 argument to point to a filesystem path::
 
->>> # TODO: replace with `open_array` after #2463
->>> z1 = zarr.open(store='data/example-2.zarr', mode='w', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4')
+>>> z1 = zarr.create_array(store='data/example-1.zarr', shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
 
 The array above will store its configuration metadata and all compressed chunk
-data in a directory called ``'data/example-2.zarr'`` relative to the current working
-directory. The :func:`zarr.open` function provides a convenient way
+data in a directory called ``'data/example-1.zarr'`` relative to the current working
+directory. The :func:`zarr.create_array` function provides a convenient way
 to create a new persistent array or continue working with an existing
-array. Note that although the function is called "open", there is no need to
-close an array: data are automatically flushed to disk, and files are
-automatically closed whenever an array is modified.
+array. Note, there is no need to close an array: data are automatically
+flushed to disk, and files are automatically closed whenever an array is modified.
 
 Persistent arrays support the same interface for reading and writing data,
 e.g.::
@@ -99,8 +101,7 @@ e.g.::
 
 Check that the data have been written and can be read again::
 
->>> # TODO: replace with `open_array` after #2463
->>> z2 = zarr.open('data/example-2.zarr', mode='r')
+>>> z2 = zarr.open_array('data/example-1.zarr', mode='r')
 >>> np.all(z1[:] == z2[:])
 np.True_
 
@@ -110,8 +111,8 @@ disk then load back into memory later, the functions
 useful. E.g.::
 
 >>> a = np.arange(10)
->>> zarr.save('data/example-3.zarr', a)
->>> zarr.load('data/example-3.zarr')
+>>> zarr.save('data/example-2.zarr', a)
+>>> zarr.load('data/example-2.zarr')
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 
 Please note that there are a number of other options for persistent array
@@ -125,7 +126,7 @@ Resizing and appending
 A Zarr array can be resized, which means that any of its dimensions can be
 increased or decreased in length. For example::
 
->>> z = zarr.zeros(store="data/example-4.zarr", shape=(10000, 10000), chunks=(1000, 1000))
+>>> z = zarr.create_array(store='data/example-3.zarr', shape=(10000, 10000), dtype='int32',chunks=(1000, 1000))
 >>> z[:] = 42
 >>> z.shape
 (10000, 10000)
@@ -140,9 +141,9 @@ new array shape will be deleted from the underlying store.
 :func:`zarr.Array.append` is provided as a convenience function, which can be
 used to append data to any axis. E.g.::
 
->>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(store="data/example-5", data=a, chunks=(1000, 100))
+>>> a = np.arange(10000000, dtype='int32').reshape(10000, 1000)
+>>> z = zarr.create_array(store='data/example-4.zarr', shape=a.shape, dtype=a.dtype, chunks=(1000, 100))
+>>> z[:] = a
 >>> z.shape
 (10000, 1000)
 >>> z.append(a)
@@ -157,19 +158,19 @@ used to append data to any axis. E.g.::
 Compressors
 -----------
 
-A number of different compressors can be used with Zarr. A separate package
-called NumCodecs_ is available which provides a common interface to various
-compressor libraries including Blosc, Zstandard, LZ4, Zlib, BZ2 and
-LZMA. Different compressors can be provided via the ``compressor`` keyword
+A number of different compressors can be used with Zarr. Zarr includes Blosc,
+Zstandard and Gzip compressors. Additional compressors are available through
+a separate package called NumCodecs_ which provides various
+compressor libraries including LZ4, Zlib, BZ2 and LZMA.
+Different compressors can be provided via the ``compressors`` keyword
 argument accepted by all array creation functions. For example::
 
->>> from numcodecs import Blosc
->>>
->>> compressor = None # TODO: Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE)
->>> data = np.arange(100000000, dtype='i4').reshape(10000, 10000)
->>> # TODO: remove zarr_format and replace with create_array after #2463
->>> z = zarr.array(store="data/example-6.zarr", data=data, chunks=(1000, 1000), compressor=compressor, zarr_format=2)
->>> None # TODO: z.compressor
+>>> compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=3, shuffle=zarr.codecs.BloscShuffle.bitshuffle)
+>>> data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
+>>> z = zarr.create_array(store='data/example-5.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=compressors)
+>>> z[:] = data
+>>> z.metadata.codecs
+[BytesCodec(endian=<Endian.little: 'little'>), BloscCodec(typesize=4, cname=<BloscCname.zstd: 'zstd'>, clevel=3, shuffle=<BloscShuffle.bitshuffle: 'bitshuffle'>, blocksize=0)]
 
 This array above will use Blosc as the primary compressor, using the Zstandard
 algorithm (compression level 3) internally within Blosc, and with the
@@ -181,87 +182,78 @@ which can be used to print useful diagnostics, e.g.::
 
 >>> z.info
 Type : Array
-Zarr format : 2
-Data type : int32
+Zarr format : 3
+Data type : DataType.int32
 Shape : (10000, 10000)
 Chunk shape : (1000, 1000)
 Order : C
 Read-only : False
 Store type : LocalStore
-Compressor : Zstd(level=0)
+Codecs : [{'endian': <Endian.little: 'little'>}, {'typesize': 4, 'cname': <BloscCname.zstd: 'zstd'>, 'clevel': 3, 'shuffle': <BloscShuffle.bitshuffle: 'bitshuffle'>, 'blocksize': 0}]
 No. bytes : 400000000 (381.5M)
 
 The :func:`zarr.Array.info_complete` method inspects the underlying store and
 prints additional diagnostics, e.g.::
 
 >>> z.info_complete()
 Type : Array
-Zarr format : 2
-Data type : int32
+Zarr format : 3
+Data type : DataType.int32
 Shape : (10000, 10000)
 Chunk shape : (1000, 1000)
 Order : C
 Read-only : False
 Store type : LocalStore
-Compressor : Zstd(level=0)
+Codecs : [{'endian': <Endian.little: 'little'>}, {'typesize': 4, 'cname': <BloscCname.zstd: 'zstd'>, 'clevel': 3, 'shuffle': <BloscShuffle.bitshuffle: 'bitshuffle'>, 'blocksize': 0}]
 No. bytes : 400000000 (381.5M)
-No. bytes stored : 299348444
-Storage ratio : 1.3
+No. bytes stored : 9696302
+Storage ratio : 41.3
 Chunks Initialized : 100
 
 .. note::
 :func:`zarr.Array.info_complete` will inspect the underlying store and may
 be slow for large arrays. Use :attr:`zarr.Array.info` if detailed storage
 statistics are not needed.
 
-If you don't specify a compressor, by default Zarr uses the Blosc
-compressor. Blosc is generally very fast and can be configured in a variety of
-ways to improve the compression ratio for different types of data. Blosc is in
-fact a "meta-compressor", which means that it can use a number of different
-compression algorithms internally to compress the data. Blosc also provides
-highly optimized implementations of byte- and bit-shuffle filters, which can
-improve compression ratios for some data. A list of the internal compression
-libraries available within Blosc can be obtained via::
+If you don't specify a compressor, by default Zarr uses the Zstandard
+compressor.
 
->>> from numcodecs import blosc
->>>
->>> blosc.list_compressors()
-['blosclz', 'lz4', 'lz4hc', 'zlib', 'zstd']
+In addition to Blosc and Zstandard, other compression libraries can also be used. For example,
+here is an array using Gzip compression, level 1::
 
-In addition to Blosc, other compression libraries can also be used. For example,
-here is an array using Zstandard compression, level 1::
+>>> data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
+>>> z = zarr.create_array(store='data/example-6.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=zarr.codecs.GzipCodec(level=1))
+>>> z[:] = data
+>>> z.metadata.codecs
+[BytesCodec(endian=<Endian.little: 'little'>), GzipCodec(level=1)]
 
->>> from numcodecs import Zstd
->>> # TODO: remove zarr_format and replace with create_array after #2463
->>> z = zarr.array(store="data/example-7.zarr", data=np.arange(100000000, dtype='i4').reshape(10000, 10000), chunks=(1000, 1000), compressor=Zstd(level=1), zarr_format=2)
->>> None # TODO: z.compressor
-
-Here is an example using LZMA with a custom filter pipeline including LZMA's
+Here is an example using LZMA from NumCodecs_ with a custom filter pipeline including LZMA's
 built-in delta filter::
 
 >>> import lzma
->>> from numcodecs import LZMA
+>>> from numcodecs.zarr3 import LZMA
 >>>
 >>> lzma_filters = [dict(id=lzma.FILTER_DELTA, dist=4), dict(id=lzma.FILTER_LZMA2, preset=1)]
->>> compressor = LZMA(filters=lzma_filters)
->>> # TODO: remove zarr_format and replace with create_array after #2463
->>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000), chunks=(1000, 1000), compressor=compressor, zarr_format=2)
->>> None # TODO: z.compressor
+>>> compressors = LZMA(filters=lzma_filters)
+>>> data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
+>>> z = zarr.create_array(store='data/example-7.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=compressors)
+>>> z.metadata.codecs
+[BytesCodec(endian=<Endian.little: 'little'>), _make_bytes_bytes_codec.<locals>._Codec(codec_name='numcodecs.lzma', codec_config={'id': 'lzma', 'filters': [{'id': 3, 'dist': 4}, {'id': 33, 'preset': 1}]})]
 
 The default compressor can be changed by setting the value of the using Zarr's
 :ref:`user-guide-config`, e.g.::
 
->>> with zarr.config.set({'array.v2_default_compressor.numeric': 'blosc'}):
+>>> with zarr.config.set({'array.v2_default_compressor.numeric': {'id': 'blosc'}}):
 ... z = zarr.zeros(100000000, chunks=1000000, zarr_format=2)
 >>> z.metadata.filters
 >>> z.metadata.compressor
-LZMA(format=1, check=-1, preset=None, filters=[{'id': 3, 'dist': 4}, {'id': 33, 'preset': 1}])
->>>
+Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
 
-To disable compression, set ``compressor=None`` when creating an array, e.g.::
+To disable compression, set ``compressors=None`` when creating an array, e.g.::
 
->>> # TODO: remove zarr_format
->>> z = zarr.zeros(100000000, chunks=1000000, compressor=None, zarr_format=2)
+>>> z = zarr.create_array(store='data/example-8.zarr', shape=(100000000,), chunks=(1000000,), dtype='int32', compressors=None)
+>>> z.metadata.codecs
+[BytesCodec(endian=<Endian.little: 'little'>)]
 
 .. _user-guide-filters:
 
@@ -281,24 +273,22 @@ mechanism for configuring filters outside of the primary compressor.
 
 Here is an example using a delta filter with the Blosc compressor::
 
->>> from numcodecs import Blosc, Delta
+>>> from numcodecs.zarr3 import Delta
 >>>
->>> filters = [Delta(dtype='i4')]
->>> compressor = Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE)
->>> data = np.arange(100000000, dtype='i4').reshape(10000, 10000)
->>> # TODO: remove zarr_format and replace with create_array after #2463
->>> z = zarr.array(data, chunks=(1000, 1000), filters=filters, compressor=compressor, zarr_format=2)
+>>> filters = [Delta(dtype='int32')]
+>>> compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=1, shuffle=zarr.codecs.BloscShuffle.shuffle)
+>>> data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
+>>> z = zarr.create_array(store='data/example-9.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), filters=filters, compressors=compressors)
 >>> z.info
 Type : Array
-Zarr format : 2
-Data type : int32
+Zarr format : 3
+Data type : DataType.int32
 Shape : (10000, 10000)
 Chunk shape : (1000, 1000)
 Order : C
 Read-only : False
-Store type : MemoryStore
-Compressor : Blosc(cname='zstd', clevel=1, shuffle=SHUFFLE, blocksize=0)
-Filters : (Delta(dtype='<i4'),)
+Store type : LocalStore
+Codecs : [{'codec_name': 'numcodecs.delta', 'codec_config': {'id': 'delta', 'dtype': 'int32'}}, {'endian': <Endian.little: 'little'>}, {'typesize': 4, 'cname': <BloscCname.zstd: 'zstd'>, 'clevel': 1, 'shuffle': <BloscShuffle.shuffle: 'shuffle'>, 'blocksize': 0}]
 No. bytes : 400000000 (381.5M)
 
 For more information about available filter codecs, see the `Numcodecs
@@ -325,8 +315,9 @@ Indexing with coordinate arrays
 Items from a Zarr array can be extracted by providing an integer array of
 coordinates. E.g.::
 
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(np.arange(10) ** 2)
+>>> data = np.arange(10) ** 2
+>>> z = zarr.create_array(store='data/example-10.zarr', shape=data.shape, dtype=data.dtype)
+>>> z[:] = data
 >>> z[:]
 array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
 >>> z.get_coordinate_selection([2, 5])
@@ -341,8 +332,9 @@ Coordinate arrays can also be used to update data, e.g.::
 For multidimensional arrays, coordinates must be provided for each dimension,
 e.g.::
 
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(np.arange(15).reshape(3, 5))
+>>> data = np.arange(15).reshape(3, 5)
+>>> z = zarr.create_array(store='data/example-11.zarr', shape=data.shape, dtype=data.dtype)
+>>> z[:] = data
 >>> z[:]
 array([[ 0, 1, 2, 3, 4],
 [ 5, 6, 7, 8, 9],
@@ -381,8 +373,9 @@ Indexing with a mask array
 
 Items can also be extracted by providing a Boolean mask. E.g.::
 
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(np.arange(10) ** 2)
+>>> data = np.arange(10) ** 2
+>>> z = zarr.create_array(store='data/example-12.zarr', shape=data.shape, dtype=data.dtype)
+>>> z[:] = data
 >>> z[:]
 array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
 >>> sel = np.zeros_like(z, dtype=bool)
@@ -396,8 +389,9 @@ Items can also be extracted by providing a Boolean mask. E.g.::
 
 Here's a multidimensional example::
 
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(np.arange(15).reshape(3, 5))
+>>> data = np.arange(15).reshape(3, 5)
+>>> z = zarr.create_array(store='data/example-13.zarr', shape=data.shape, dtype=data.dtype)
+>>> z[:] = data
 >>> z[:]
 array([[ 0, 1, 2, 3, 4],
 [ 5, 6, 7, 8, 9],
@@ -436,8 +430,9 @@ selections to be made along each dimension of an array independently. For
 example, this allows selecting a subset of rows and/or columns from a
 2-dimensional array. E.g.::
 
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(np.arange(15).reshape(3, 5))
+>>> data = np.arange(15).reshape(3, 5)
+>>> z = zarr.create_array(store='data/example-14.zarr', shape=data.shape, dtype=data.dtype)
+>>> z[:] = data
 >>> z[:]
 array([[ 0, 1, 2, 3, 4],
 [ 5, 6, 7, 8, 9],
@@ -460,8 +455,9 @@ Data can also be modified, e.g.::
 For convenience, the orthogonal indexing functionality is also available via the
 ``oindex`` property, e.g.::
 
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(np.arange(15).reshape(3, 5))
+>>> data = np.arange(15).reshape(3, 5)
+>>> z = zarr.create_array(store='data/example-15.zarr', shape=data.shape, dtype=data.dtype)
+>>> z[:] = data
 >>> z.oindex[[0, 2], :] # select first and third rows
 array([[ 0, 1, 2, 3, 4],
 [10, 11, 12, 13, 14]])
@@ -484,8 +480,9 @@ be used for orthogonal indexing.
 If the index contains at most one iterable, and otherwise contains only slices and integers,
 orthogonal indexing is also available directly on the array::
 
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(np.arange(15).reshape(3, 5))
+>>> data = np.arange(15).reshape(3, 5)
+>>> z = zarr.create_array(store='data/example-16.zarr', shape=data.shape, dtype=data.dtype)
+>>> z[:] = data
 >>> np.all(z.oindex[[0, 2], :] == z[[0, 2], :])
 np.True_
 
@@ -496,8 +493,9 @@ Zarr also support block indexing, which allows selections of whole chunks based
 logical indices along each dimension of an array. For example, this allows selecting
 a subset of chunk aligned rows and/or columns from a 2-dimensional array. E.g.::
 
->>> # TODO: replace with create_array after #2463
->>> z = zarr.array(np.arange(100).reshape(10, 10), chunks=(3, 3))
+>>> data = np.arange(100).reshape(10, 10)
+>>> z = zarr.create_array(store='data/example-17.zarr', shape=data.shape, dtype=data.dtype, chunks=(3, 3))
+>>> z[:] = data
 
 Retrieve items by specifying their block coordinates::
 
@@ -531,7 +529,7 @@ For example::
 
 Data can also be modified. Let's start by a simple 2D array::
 
->>> z = zarr.zeros((6, 6), dtype=int, chunks=2)
+>>> z = zarr.create_array(store='data/example-18.zarr', shape=(6, 6), dtype=int, chunks=(2, 2))
 
 Set data for a selection of items::
 
@@ -562,10 +560,9 @@ Any combination of integer and slice can be used for block indexing::
 array([[0, 0, 7, 7],
 [0, 0, 7, 7]])
 >>>
->>> # TODO: replace with create_group after #2463
->>> root = zarr.group('data/example-12.zarr')
->>> foo = root.create_array(name='foo', shape=(1000, 100), chunks=(10, 10), dtype='f4')
->>> bar = root.create_array(name='foo/bar', shape=(100,), dtype='i4')
+>>> root = zarr.create_group('data/example-19.zarr')
+>>> foo = root.create_array(name='foo', shape=(1000, 100), chunks=(10, 10), dtype='float32')
+>>> bar = root.create_array(name='foo/bar', shape=(100,), dtype='int32')
 >>> foo[:, :] = np.random.random((1000, 100))
 >>> bar[:] = np.arange(100)
 >>> root.tree()
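
The new `info_complete()` output in the arrays.rst diff reports figures that can be checked by hand. The following is a minimal sketch of that arithmetic, not Zarr's own implementation. It assumes the storage ratio is the uncompressed size divided by the stored size, rounded to one decimal place, and that "M" means 2**20 bytes; both assumptions are inferred from the numbers shown. The helper name `storage_ratio` is hypothetical.

```python
# Values copied from the z.info_complete() output in the arrays.rst diff.
shape = (10000, 10000)
chunks = (1000, 1000)
itemsize = 4  # bytes per int32 element

# Uncompressed size and number of chunks in the 10 x 10 chunk grid.
nbytes = shape[0] * shape[1] * itemsize
n_chunks = (shape[0] // chunks[0]) * (shape[1] // chunks[1])


def storage_ratio(nbytes, nbytes_stored):
    """Assumed definition: uncompressed bytes / stored bytes, one decimal."""
    return round(nbytes / nbytes_stored, 1)


old_ratio = storage_ratio(nbytes, 299348444)  # stored size before the commit
new_ratio = storage_ratio(nbytes, 9696302)    # stored size after the commit
size_mib = round(nbytes / 2**20, 1)

print(nbytes, n_chunks, size_mib, old_ratio, new_ratio)
# prints: 400000000 100 381.5 1.3 41.3
```

Under these assumptions the computed values match the documented "No. bytes", "Chunks Initialized" and "Storage ratio" lines both before and after the change.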

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -71,6 +71,7 @@ test = [
 "pytest-accept",
 "moto[s3]",
 "requests",
+"rich",
 "mypy",
 "hypothesis",
 "universal-pathlib",

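The doctest setup added at the top of arrays.rst calls `shutil.rmtree('data', ignore_errors=True)`, presumably so that each run starts without stores left over from a previous run. The sketch below shows why `ignore_errors=True` makes that cleanup safe to repeat. It uses a temporary directory as a stand-in for the real `data` directory; the helper name `reset_data_dir` and the placeholder file are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def reset_data_dir(path):
    """Remove path and everything under it; do nothing if it is absent."""
    shutil.rmtree(path, ignore_errors=True)


# Temporary stand-in for the working directory used by the doctests.
workdir = Path(tempfile.mkdtemp())
data = workdir / "data"

# First call: the directory does not exist yet, and no exception is raised.
reset_data_dir(data)
missing_ok = not data.exists()

# Simulate a store left behind by an earlier doctest run.
(data / "example-1.zarr").mkdir(parents=True)
(data / "example-1.zarr" / "placeholder.txt").write_text("stale")

# Second call: the stale directory tree is removed.
reset_data_dir(data)
removed = not data.exists()

shutil.rmtree(workdir)  # clean up the temporary stand-in
print(missing_ok, removed)
```

Without `ignore_errors=True`, the first call would raise `FileNotFoundError` on a fresh checkout where `data` has never been created.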