Commit a780691

funkey authored and jakirkham committed
Add N5 Support (#309)
* Add N5Store for paths ending in '.n5'
* Add N5 chunk headers via N5ChunkWrapper as a codec
* Add compressor support to N5ChunkWrapper
* Invert axis order for dimensions and blockSize N5 attributes
* Convert raw chunks to big endian in N5ChunkWrapper
* Consolidate compressor conversion according to N5 spec 1.0.0
  https://github.com/saalfeldlab/n5/blob/45cb33ef6bd97fe6e872765a0bcfcc9e2a04591e/README.md
* Always change byte-order to big endian in N5ChunkWrapper
* Cleanup unused code fragments in n5.py
* Transparently inject N5ChunkWrapper in array meta data
* Map array attributes fill_value, order, and filters
* Use numpy's tobytes for raw encoding in N5ChunkWrapper
* Add gzip compressor support to N5Storage
  This requires numcodecs GZip support to be compatible with the N5 standard. Added with 410af66be0ea470d77b923e516d63cf5238114db to numcodecs.
* Support N5's partial chunks at dataset boundaries
* Add unit test stub for N5Store
* Comply with zarr meta key names in N5Store
* Overwrite __delitem__ and listdir in N5Store
* Allow fill_value==None in N5Store
* Allow compressor_config to be None in N5Store
* Support older np.byteswap API
* Expose N5Store in zarr top-level package
* Refactor attribute conversion between zarr and N5
* Delegate __getitem__ to parent class for unrecognized keys
* Fix runtime errors in N5Store
* Invert coords only for chunk keys in N5Store
* Add no cover pragmas to N5Store
* Use _load_n5_attrs where possible in N5Store
* Remove conditional that is always true in N5Store
* Add tests for N5Store
* Add N5Store entry to release notes
* Ensure str type for json.loads in N5Store
* Fix reading of partial N5 chunks
* Bump the Numcodecs requirement to 0.6.1
* Enable testing of GZip as well
* Change `_ensure_str` to `ensure_str`
  The included function doesn't use an `_` prefix. So this corrects that issue.
* Switch from `buffer_copy` to `ndarray_copy`
  The former has been replaced by the latter as of Numcodecs 0.6.0. That said, this is technically internal spec. So maybe we shouldn't rely on it here if we can avoid it. For now this just fixes the import/usage, but we may want to replace this with a different strategy long term.
* Raise more informative TypeError for unsupported dtypes in N5
* Skip test_structured_array_* for N5
* Fix flake8 errors for N5 store and related tests
* Use `struct.pack` to convert integers to `bytes`
  As Python 2 lacks `int`'s `to_bytes` method, this adds a workaround that uses `struct.pack` to the same effect. This strategy works equally well on Python 3.
* Use `struct.unpack` to convert `bytes` to integers
  As Python 2 lacks `int`'s `from_bytes` method, this adds a workaround that uses `struct.unpack` to the same effect. This strategy works equally well on Python 3.
* Optionally use LZMA if it is available
  LZMA is Python 3 only. That said, we do use a backport package for Python 2 support. However it is a little tricky to install (particularly on Windows and/or with pip), so we cannot always guarantee that we have LZMA let alone that we can test it. While it would be nice to figure out how to ensure LZMA is available in all of our testing environments, for now simply try to test LZMA when it is available and ignore it otherwise.
* Ensure `str` before using `json.loads`
  In Python 3.6+, `json.loads` will ensure data is handled with the proper encoding even if it is in a binary format like a `bytes` object. Python 2.7's `str` is `bytes` so this already works there. However Python 3 versions pre-3.6 need a properly decoded `str`. Thus we fix the test to use `ensure_str` to provide a `str` for use with `json.loads` to fix this issue.
* Ensure JSON output is the same for Python 2/3
  There are some stylistic differences between how Python 2 and Python 3 choose to construct the JSON representation of given data. These include various things like whether keys are sorted, whether it uses ASCII encoding, how it handles indentation. One detail was missed in this process, which is how separators are handled. As we already have a function to control how we write out JSON, go ahead and use it to handle JSON written by N5.
* Fix remaining coveralls tests
* Add N5 paragraph to tutorial
* Fix typo in zarr/tests/test_storage.py
  Co-Authored-By: funkey <[email protected]>
* Move N5 containers entry to 2.3.0
* Add author info to release entry
* Add experimental notes to N5 support
* Drop `mode` from `N5Store`
* Test N5 without filters and with
  Make sure that an `AssertionError` is raised when N5 is provided a filter.
* Test that using Fortran order with N5 errors
* Test structured array cases raise TypeError on N5
  The N5 spec does not support structured arrays. Instead of simply skipping structured array tests, check that they correctly raise a `TypeError` when trying to instantiate N5 arrays with structured dtypes.
* Test object arrays raise with N5
  The N5 spec does not support object arrays. Instead of just skipping the tests related to object arrays in N5, this tests that N5 raises appropriate errors when they occur.
* Add API docs for N5
* Drop unneeded call to `close`
  The `N5Store` does not have a `close` method and has no need of one (as files auto-close). So drop the line from the N5 tutorial section that uses `close`.
* Drop storage tests integrated into array tests
  These tests have been broken out and integrated into Zarr's Array tests in `master`. As such there is no need to keep them in Store tests or in this PR. Thus we remove them here.
* Drop unused imports (reported by flake8)
* Convert user-facing assertions to errors
  This converts many assertions that the `N5Store` raised to errors. In particular deals with assertions that are not actually just used for validation, but are propagating some message to the user. Update the associated error tests as well.
* Use non-deprecated `.warning(...)` instead
  As the `.warn(...)` method of the logger is deprecated, switch to `.warning` instead.
* Use warnings instead of logging
  As we are only using the logger to issue warnings, just issue warnings instead and drop the logger. Picked `RuntimeWarning`s for these two warnings as they are in regard to "dubious runtime behavior". Namely using compressors that are not explicitly supported by many N5 implementations to write N5 files.
* Drop unused `logging` import
* Check N5Store RuntimeWarnings for some compressors
  Make sure that `N5Store` is raising `RuntimeWarnings` for some compressors that are not widely supported.
* Limit `RuntimeWarning` check to array creation
* Test that N5Store raises for certain attributes
  As N5 stores attributes and metadata in the same JSON object, there are some keys that are simply not allowed to be attributes as they would conflict with the metadata (and possibly corrupt the data if they were written in). The `N5Store` correctly raises for these cases, but this was not being tested previously. Here we add a test to cover this case.
* Test `fill_values` with N5Store-backed arrays
  Check that valid `fill_values` work with N5Store-backed arrays and that invalid `fill_values` raise.
* Convert Blosc blocksize check back to an assert
  We are explicitly setting the `blocksize` to `0` internally. So this assertion is not a user-facing error. It would also be difficult to test from the API. As we set the `blocksize` to `0` in a different function, it does make sense to `assert` this here.
* Handle loading and storing XZ data for N5
  Numcodecs' LZMA compressor is able to handle XZ compressed data as a special case. This maps XZ compression in N5 to Zarr's LZMA compression with the proper arguments as well as maps Zarr's LZMA support back to N5's XZ support if the right options are set.
* Test XZ support and update LZMA for N5Store arrays
  Adds some tests for XZ support in N5 using Numcodecs' LZMA compressor with specific options. Also tests LZMA options that are not currently supported by N5 to ensure they still warn and are handled correctly.
* Remove LZ4 support for N5
  There is not a clear mapping between LZ4 in the Zarr and N5 implementations at present. So this drops LZ4 from the supported codecs in `N5Store` for now.
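Several of the items above (chunk headers, big-endian conversion, inverted axis order, and the `struct.pack` / `struct.unpack` workaround) fit together in the N5 block header. The following is a minimal sketch of that idea, not the code from this commit: the function names `pack_header` and `unpack_header` are hypothetical, and the layout assumed here is the default-mode header described in the N5 README (uint16 mode, uint16 number of dimensions, then one uint32 per dimension, all big endian).

```python
import struct


def pack_header(chunk_shape):
    """Build a default-mode (mode 0) N5 block header for a C-order chunk shape.

    Hypothetical helper for illustration; assumes the header layout from the
    N5 README: uint16 mode, uint16 ndim, then ndim uint32 sizes, big endian.
    """
    # N5 lists dimensions in the opposite order to Zarr's C-order shape.
    n5_shape = tuple(reversed(chunk_shape))
    ndim = len(n5_shape)
    # struct.pack works on both Python 2 and 3, unlike int.to_bytes.
    return struct.pack(">HH", 0, ndim) + struct.pack(">%dI" % ndim, *n5_shape)


def unpack_header(buf):
    """Parse a default-mode header; return (mode, C-order shape, header length)."""
    mode, ndim = struct.unpack(">HH", buf[:4])
    end = 4 + 4 * ndim
    n5_shape = struct.unpack(">%dI" % ndim, buf[4:end])
    return mode, tuple(reversed(n5_shape)), end


header = pack_header((100, 50))
print(header.hex())
print(unpack_header(header))
```

Reading the shape back from the header is also what makes partial chunks at dataset boundaries recoverable: the stored block can be smaller than the nominal chunk, and the header records its actual extent.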
1 parent 4ed5171 commit a780691

10 files changed: +1100 -12 lines changed


docs/api.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ API reference
     api/core
     api/hierarchy
     api/storage
+    api/n5
     api/convenience
     api/codecs
     api/attrs

docs/api/n5.rst

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+N5 (``zarr.n5``)
+================
+.. automodule:: zarr.n5
+
+.. autoclass:: N5Store

docs/release.rst

Lines changed: 5 additions & 0 deletions
@@ -34,6 +34,11 @@ Enhancements
   MongoDB database to be used as the backing store for an array or group.
   By :user:`Joe Hamman <jhamman>`, :issue:`299`, :issue:`372`.
 
+* **New storage class for N5 containers**. The :class:`zarr.n5.N5Store` has been
+  added, which uses :class:`zarr.storage.NestedDirectoryStore` to support
+  reading and writing from and to N5 containers.
+  By :user:`Jan Funke <funkey>` and :user:`John Kirkham <jakirkham>`
+
 Bug fixes
 ~~~~~~~~~
 

docs/tutorial.rst

Lines changed: 14 additions & 0 deletions
@@ -746,6 +746,20 @@ with `MongoDB <https://www.mongodb.com/>`_ (an oject oriented NoSQL database). T
 respectively require the `redis <https://redis-py.readthedocs.io>`_ and
 `pymongo <https://api.mongodb.com/python/current/>`_ packages to be installed.
 
+For compatibility with the `N5<https://github.com/saalfeldlab/n5`_ data format, Zarr also provides
+an N5 backend (this is currently an experimental feature). Similar to the zip storage class, an
+:class:`zarr.n5.N5Store` can be instantiated directly::
+
+    >>> store = zarr.N5Store('data/example.n5')
+    >>> root = zarr.group(store=store)
+    >>> z = root.zeros('foo/bar', shape=(1000, 1000), chunks=(100, 100), dtype='i4')
+    >>> z[:] = 42
+
+For convenience, the N5 backend will automatically be chosen when the filename
+ends with `.n5`::
+
+    >>> root = zarr.open('data/example.n5', mode='w')
+
 Distributed/cloud storage
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 

zarr/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -15,5 +15,6 @@
 from zarr.convenience import (open, save, save_array, save_group, load, copy_store,
                               copy, copy_all, tree, consolidate_metadata,
                               open_consolidated)
+from zarr.n5 import N5Store
 from zarr.errors import CopyError, MetadataError, PermissionError
 from zarr.version import version as __version__

zarr/creation.py

Lines changed: 3 additions & 0 deletions
@@ -9,6 +9,7 @@
 from zarr.core import Array
 from zarr.storage import (DirectoryStore, init_array, contains_array, contains_group,
                           default_compressor, normalize_storage_path, ZipStore)
+from zarr.n5 import N5Store
 from numcodecs.registry import codec_registry
 from zarr.errors import err_contains_array, err_contains_group, err_array_not_found
 
@@ -132,6 +133,8 @@ def normalize_store_arg(store, clobber=False, default=dict):
         if store.endswith('.zip'):
             mode = 'w' if clobber else 'a'
             return ZipStore(store, mode=mode)
+        elif store.endswith('.n5'):
+            return N5Store(store)
         else:
             return DirectoryStore(store)
     else:
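
The hunk above selects a store purely from the path suffix. As a rough, self-contained illustration of that dispatch (not the library code), the sketch below uses a hypothetical function `pick_store_kind` that returns the store class name and constructor keyword arguments instead of building real zarr stores, so it runs without zarr installed.

```python
def pick_store_kind(path, clobber=False):
    """Mirror the suffix dispatch shown in the diff for string paths.

    Hypothetical stand-in: returns (class name, kwargs) rather than
    instantiating ZipStore / N5Store / DirectoryStore.
    """
    if path.endswith('.zip'):
        # ZipStore needs an explicit mode: overwrite when clobbering, else append.
        return 'ZipStore', {'mode': 'w' if clobber else 'a'}
    elif path.endswith('.n5'):
        # N5Store takes no mode argument in this commit.
        return 'N5Store', {}
    else:
        return 'DirectoryStore', {}


for p in ('data/example.zip', 'data/example.n5', 'data/example.zarr',
          'data/example.N5', 'data/example.n5/'):
    print(p, '->', pick_store_kind(p))
```

Because the check is a plain `str.endswith`, it is case sensitive and does not match a path with a trailing slash; under this logic such paths fall through to the directory store.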
