
Commit c0a5e38 (merge)

2 parents: 2488bc2 + 9a6de6e

23 files changed: +2505 −1087 lines

docs/_static/donotdelete

Whitespace-only changes.

docs/api/storage.rst

Lines changed: 4 additions & 0 deletions
@@ -11,6 +11,10 @@ can be used as a Zarr array store.
 
 .. autoclass:: DictStore
 .. autoclass:: DirectoryStore
+.. autoclass:: TempStore
 .. autoclass:: ZipStore
 
+    .. automethod:: close
+    .. automethod:: flush
+
 .. autofunction:: migrate_1to2
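For orientation, here is a minimal usage sketch of the newly documented TempStore class (behaviour as described in the release notes below; the array shape, chunks and dtype are illustrative):

    import zarr
    from zarr.storage import TempStore

    # TempStore provides storage via a temporary directory on the local file system
    store = TempStore()
    z = zarr.zeros(shape=(1000, 1000), chunks=(100, 100), dtype='i4', store=store)
    z[:] = 42
    print(z)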

docs/release.rst

Lines changed: 17 additions & 0 deletions
@@ -3,6 +3,23 @@ Release notes
 
 * Group objects now support member deletion via ``del`` statement
   ('#65 <https://github.com/alimanfoo/zarr/issues/65>'_)
+* Added :class:`zarr.storage.TempStore` class for convenience to provide
+  storage via a temporary directory
+  (`#59 <https://github.com/alimanfoo/zarr/issues/59>`_)
+* Fixed performance issues with ``ZipStore`` class
+  (`#66 <https://github.com/alimanfoo/zarr/issues/27>`_)
+* The Blosc extension has been modified to return bytes instead of array
+  objects from compress and decompress function calls. This should
+  improve compatibility and also provides a small performance increase for
+  compressing high compression ratio data
+  (`#55 <https://github.com/alimanfoo/zarr/issues/55>`_).
+* Added ``overwrite`` keyword argument to array and group creation methods
+  on the :class:`zarr.hierarchy.Group` class
+  (`#71 <https://github.com/alimanfoo/zarr/issues/71>`_).
+* Added ``cache_metadata`` keyword argument to array creation methods.
+* The functions :func:`zarr.creation.open_array` and
+  :func:`zarr.hierarchy.open_group` now accept any store as first argument
+  (`#56 <https://github.com/alimanfoo/zarr/issues/56>`_).
 
 .. _release_2.0.1:
 
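To make the new keyword arguments concrete, a small sketch based on the release notes above (parameter names are taken from the notes; exact call signatures are assumed, not shown in this diff):

    import zarr

    root = zarr.group()

    # 'overwrite' on Group creation methods: replace an existing member in place
    a = root.create_dataset('data', shape=(100, 100), chunks=(10, 10), dtype='f8')
    a = root.create_dataset('data', shape=(50, 50), chunks=(10, 10), dtype='f8',
                            overwrite=True)

    # 'cache_metadata' on array creation: metadata is re-read from the store on access
    b = root.zeros('counts', shape=(100,), chunks=(10,), cache_metadata=False)

    # open_group / open_array now accept any store as first argument, e.g. a plain dict
    store = dict()
    g = zarr.open_group(store, mode='w')

    # Group member deletion via the del statement
    del root['counts']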

docs/spec/v2.rst

Lines changed: 1 addition & 0 deletions
@@ -442,6 +442,7 @@ Here is the same example using a Zip file as storage::
     >>> sub_grp = root_grp.create_group('foo')
     >>> a = sub_grp.create_dataset('bar', shape=(20, 20), chunks=(10, 10))
     >>> a[:] = 42
+    >>> store.close()
 
 What has been stored::
 
docs/tutorial.rst

Lines changed: 8 additions & 2 deletions
@@ -230,7 +230,7 @@ the delta filter::
     ... chunks=(1000, 1000), compressor=compressor)
     >>> z
     Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      nbytes: 381.5M; nbytes_stored: 248.9K; ratio: 1569.6; initialized: 100/100
+      nbytes: 381.5M; nbytes_stored: 248.9K; ratio: 1569.7; initialized: 100/100
       compressor: LZMA(format=1, check=-1, preset=None, filters=[{'dist': 4, 'id': 3}, {'preset': 1, 'id': 33}])
       store: dict
 
@@ -327,7 +327,7 @@ provided that all processes have access to a shared file system. E.g.::
     ... synchronizer=synchronizer)
     >>> z
     Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      nbytes: 381.5M; nbytes_stored: 326; ratio: 1226993.9; initialized: 0/100
+      nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
       compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
       store: DirectoryStore; synchronizer: ProcessSynchronizer
 
@@ -515,6 +515,7 @@ Here is an example storing an array directly into a Zip file::
       nbytes: 3.8M; nbytes_stored: 21.8K; ratio: 179.2; initialized: 100/100
       compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
       store: ZipStore
+    >>> store.close()
     >>> import os
     >>> os.path.getsize('example.zip')
     30721
@@ -536,12 +537,17 @@ Re-open and check that data have been written::
            [42, 42, 42, ..., 42, 42, 42],
            [42, 42, 42, ..., 42, 42, 42],
            [42, 42, 42, ..., 42, 42, 42]], dtype=int32)
+    >>> store.close()
 
 Note that there are some restrictions on how Zip files can be used,
 because items within a Zip file cannot be updated in place. This means
 that data in the array should only be written once and write
 operations should be aligned with chunk boundaries.
 
+Note also that the ``close()`` method must be called after writing any data to
+the store, otherwise essential records will not be written to the underlying
+zip file.
+
 The Dask project has implementations of the ``MutableMapping``
 interface for distributed storage systems, see the `S3Map
 <http://s3fs.readthedocs.io/en/latest/api.html#s3fs.mapping.S3Map>`_
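The write-then-close pattern required by the tutorial note above can be summarised with a short sketch (the file name and array parameters are illustrative, and the ``mode`` argument to ZipStore is assumed):

    import zarr
    from zarr.storage import ZipStore

    # Write once, aligned with chunk boundaries, then close the store
    store = ZipStore('example.zip', mode='w')
    z = zarr.zeros(shape=(1000, 1000), chunks=(100, 100), dtype='i4', store=store)
    z[:] = 42
    store.close()  # without this, essential Zip records are never written

    # Re-open read-only and check the data
    store = ZipStore('example.zip', mode='r')
    z = zarr.open_array(store, mode='r')
    assert int(z[0, 0]) == 42
    store.close()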
(new Jupyter notebook file)

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2.0.1'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import zarr\n",
    "zarr.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10 loops, best of 3: 110 ms per loop\n",
      "1 loop, best of 3: 235 ms per loop\n",
      "Array((100000000,), int64, chunks=(200000,), order=C)\n",
      "  nbytes: 762.9M; nbytes_stored: 11.2M; ratio: 67.8; initialized: 500/500\n",
      "  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)\n",
      "  store: dict\n"
     ]
    }
   ],
   "source": [
    "z = zarr.empty(shape=100000000, chunks=200000, dtype='i8')\n",
    "data = np.arange(100000000, dtype='i8')\n",
    "%timeit z[:] = data\n",
    "%timeit z[:]\n",
    "print(z)\n",
    "assert np.all(z[:] == data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 loop, best of 3: 331 ms per loop\n",
      "1 loop, best of 3: 246 ms per loop\n",
      "Array((100000000,), float64, chunks=(200000,), order=C)\n",
      "  nbytes: 762.9M; nbytes_stored: 724.8M; ratio: 1.1; initialized: 500/500\n",
      "  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)\n",
      "  store: dict\n"
     ]
    }
   ],
   "source": [
    "z = zarr.empty(shape=100000000, chunks=200000, dtype='f8')\n",
    "data = np.random.normal(size=100000000)\n",
    "%timeit z[:] = data\n",
    "%timeit z[:]\n",
    "print(z)\n",
    "assert np.all(z[:] == data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2.0.2.dev0+dirty'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import sys\n",
    "sys.path.insert(0, '..')\n",
    "import zarr\n",
    "zarr.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10 loops, best of 3: 92.7 ms per loop\n",
      "1 loop, best of 3: 230 ms per loop\n",
      "Array((100000000,), int64, chunks=(200000,), order=C)\n",
      "  nbytes: 762.9M; nbytes_stored: 11.2M; ratio: 67.8; initialized: 500/500\n",
      "  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)\n",
      "  store: dict\n"
     ]
    }
   ],
   "source": [
    "z = zarr.empty(shape=100000000, chunks=200000, dtype='i8')\n",
    "data = np.arange(100000000, dtype='i8')\n",
    "%timeit z[:] = data\n",
    "%timeit z[:]\n",
    "print(z)\n",
    "assert np.all(z[:] == data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 loop, best of 3: 338 ms per loop\n",
      "1 loop, best of 3: 253 ms per loop\n",
      "Array((100000000,), float64, chunks=(200000,), order=C)\n",
      "  nbytes: 762.9M; nbytes_stored: 724.8M; ratio: 1.1; initialized: 500/500\n",
      "  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)\n",
      "  store: dict\n"
     ]
    }
   ],
   "source": [
    "z = zarr.empty(shape=100000000, chunks=200000, dtype='f8')\n",
    "data = np.random.normal(size=100000000)\n",
    "%timeit z[:] = data\n",
    "%timeit z[:]\n",
    "print(z)\n",
    "assert np.all(z[:] == data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
