Commit 778283f

fixes for pypi
1 parent 66a3bf2 commit 778283f

3 files changed, 103 additions and 115 deletions

README.md

Lines changed: 0 additions & 111 deletions
This file was deleted.

README.rst

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+zarr
+====
+
+A minimal implementation of chunked, compressed, N-dimensional arrays for
+Python.
+
+Installation
+------------
+
+Install from GitHub (requires NumPy and Cython pre-installed)::
+
+    $ pip install -U git+https://github.com/alimanfoo/zarr.git@master
+
+Status
+------
+
+Highly experimental, pre-alpha. Bug reports and pull requests very welcome.
+
+Design goals
+------------
+
+* Chunking in multiple dimensions
+* Resize any dimension
+* Concurrent reads
+* Concurrent writes
+* Release the GIL during compression and decompression
+
+Usage
+-----
+
+Create an array::
+
+    >>> import numpy as np
+    >>> import zarr
+    >>> z = zarr.empty((10000, 1000), dtype='i4', chunks=(1000, 100))
+    >>> z
+    zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), nbytes=38.1M, cbytes=0, cname=blosclz, clevel=5, shuffle=1)
+
+Fill it with some data::
+
+    >>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
+    >>> z
+    zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), nbytes=38.1M, cbytes=2.0M, cratio=19.3, cname=blosclz, clevel=5, shuffle=1)
+
+Obtain a NumPy array::
+
+    >>> z[:]
+    array([[      0,       1,       2, ...,     997,     998,     999],
+           [   1000,    1001,    1002, ...,    1997,    1998,    1999],
+           [   2000,    2001,    2002, ...,    2997,    2998,    2999],
+           ...,
+           [9997000, 9997001, 9997002, ..., 9997997, 9997998, 9997999],
+           [9998000, 9998001, 9998002, ..., 9998997, 9998998, 9998999],
+           [9999000, 9999001, 9999002, ..., 9999997, 9999998, 9999999]], dtype=int32)
+
+Resize the array and add more data::
+
+    >>> z.resize(20000, 1000)
+    >>> z
+    zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), nbytes=76.3M, cbytes=2.0M, cratio=38.5, cname=blosclz, clevel=5, shuffle=1)
+    >>> z[10000:, :] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
+    >>> z
+    zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), nbytes=76.3M, cbytes=4.0M, cratio=19.3, cname=blosclz, clevel=5, shuffle=1)
+
+For convenience, an ``append`` method is also available, which can be used to
+append data to any axis::
+
+    >>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
+    >>> z = zarr.array(a, chunks=(1000, 100))
+    >>> z
+    zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), nbytes=38.1M, cbytes=2.0M, cratio=19.3, cname=blosclz, clevel=5, shuffle=1)
+    >>> z.append(a+a)
+    >>> z
+    zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), nbytes=76.3M, cbytes=3.6M, cratio=21.2, cname=blosclz, clevel=5, shuffle=1)
+    >>> z.append(np.vstack([a, a]), axis=1)
+    >>> z
+    zarr.ext.Array((20000, 2000), int32, chunks=(1000, 100), nbytes=152.6M, cbytes=7.6M, cratio=20.2, cname=blosclz, clevel=5, shuffle=1)
+
+Tuning
+------
+
+``zarr`` is designed for use in parallel computations working chunk-wise
+over data. Try it with `dask.array <http://dask.pydata.org/en/latest/array.html>`_.
+
+``zarr`` is optimised for accessing and storing data in contiguous slices
+of the same size as or larger than a chunk. It is not and will never be
+optimised for single item access.
+
+Chunk sizes >= 1M are generally good. Optimal chunk shape will depend on
+the correlation structure in your data.
+
+Acknowledgments
+---------------
+
+``zarr`` uses `c-blosc <https://github.com/Blosc/c-blosc>`_ internally for
+compression and decompression and borrows code heavily from
+`bcolz <http://bcolz.blosc.org/>`_.
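
The Tuning notes in the README talk about chunk-wise access and chunk sizes. As a rough illustration only, the chunk grid arithmetic can be sketched in plain Python without zarr. The helper names `chunk_grid`, `chunk_nbytes` and `iter_chunk_slices` are hypothetical and not part of zarr; the shapes are the ones used in the README's Usage section, and "1M" is assumed here to mean 2**20 bytes.

```python
import itertools
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    # -(-s // c) is integer ceiling division.
    return tuple(-(-s // c) for s, c in zip(shape, chunks))


def chunk_nbytes(chunks, itemsize):
    """Uncompressed size in bytes of one full chunk."""
    return math.prod(chunks) * itemsize


def iter_chunk_slices(shape, chunks):
    """Yield one tuple of slices per chunk, clipped to the array bounds."""
    ranges = [range(0, s, c) for s, c in zip(shape, chunks)]
    for starts in itertools.product(*ranges):
        yield tuple(
            slice(start, min(start + c, s))
            for start, c, s in zip(starts, chunks, shape)
        )


# Values from the README example; 'i4' items are 4 bytes each.
shape, chunks, itemsize = (10000, 1000), (1000, 100), 4

print(chunk_grid(shape, chunks))                   # (10, 10)
print(chunk_nbytes(chunks, itemsize))              # 400000
print(round(math.prod(shape) * itemsize / 2**20, 1))  # 38.1
print(next(iter_chunk_slices(shape, chunks)))
```

Under these assumptions the README's example chunk shape holds 400,000 bytes uncompressed, which is below the suggested 1M, so a larger chunk shape may be worth trying for real workloads. The slices from `iter_chunk_slices` are the kind of chunk-aligned selection the Tuning section recommends over single-item access.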

setup.py

Lines changed: 6 additions & 4 deletions
@@ -26,17 +26,17 @@


 extra_compile_args = []
-if re.match("i.86|x86|AMD", platform.machine()) is not None:
+if re.match('i.86|x86|AMD', platform.machine()) is not None:
     # always enable SSE2 for AMD/Intel machines
     extra_compile_args.append('-DSHUFFLE_SSE2_ENABLED')

 is_32bit = ctypes.sizeof(ctypes.c_voidp) == 4
 if is_32bit:
     if os.name == 'posix':
-        extra_compile_args.append("-msse2")
+        extra_compile_args.append('-msse2')
     elif os.name == 'nt':
         # this is currently broken for windows
-        extra_compile_args.append("/arch:sse2")
+        extra_compile_args.append('/arch:sse2')


 import numpy as np
@@ -56,7 +56,7 @@
 description = 'A minimal implementation of chunked, compressed, ' \
               'N-dimensional arrays for Python.'

-with open('README.md') as f:
+with open('README.rst') as f:
     long_description = f.read()

 setup(
@@ -75,6 +75,8 @@
         'setuptools-scm>1.5.4'
     ],
     ext_modules=ext_modules,
+    package_dir={'': '.'},
+    packages=['zarr', 'zarr.tests'],
     classifiers=[
         'Development Status :: 2 - Pre-Alpha',
         'Intended Audience :: Developers',
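
To make the flag selection in the first `setup.py` hunk easier to reason about, here is a minimal sketch that wraps the same checks in a pure function so they can be exercised for machines other than the build host. The function name `sse2_compile_args` and its parameters are hypothetical and do not appear in the commit. The flat arrangement of the two checks (machine check first, then an independent 32-bit check) is an assumption of this sketch.

```python
import ctypes
import os
import platform
import re


def sse2_compile_args(machine, pointer_size, os_name):
    """Return extra compiler flags for the given platform description.

    Hypothetical helper; the checks are assumed to be independent of
    each other rather than nested.
    """
    args = []
    if re.match('i.86|x86|AMD', machine) is not None:
        # AMD/Intel machine names: enable the SSE2 shuffle code path.
        args.append('-DSHUFFLE_SSE2_ENABLED')
    if pointer_size == 4:
        # 32-bit builds get an explicit SSE2 compiler switch.
        if os_name == 'posix':
            args.append('-msse2')
        elif os_name == 'nt':
            args.append('/arch:sse2')
    return args


# Flags for the machine running this script.
print(sse2_compile_args(platform.machine(),
                        ctypes.sizeof(ctypes.c_void_p),
                        os.name))

# Flags for a few explicit platform descriptions.
print(sse2_compile_args('x86_64', 8, 'posix'))  # ['-DSHUFFLE_SSE2_ENABLED']
print(sse2_compile_args('i686', 4, 'posix'))    # ['-DSHUFFLE_SSE2_ENABLED', '-msse2']
print(sse2_compile_args('aarch64', 8, 'posix')) # []
```

Keeping the decision in a function that takes the platform values as arguments means the logic can be checked without needing a 32-bit or Windows machine at hand.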
