
Commit 2c13b95

Implement Zarr V3 protocol (#898)
* add v3 store classes: define the StoreV3 class and create v3 versions of most existing stores. Add a test_storage_v3.py with test classes inheriting from their v2 counterparts. Only the subset of methods whose behavior differs in v3 was overridden.
* add TODO comment to meta.py
* fix flake8 errors
* follow zarr v3 spec when dealing with extension data types
* fixes to v3 dtype handling
* flake8 cleanup
* remove duplicate lines in Metadata2.encode_array_metadata
* fix fields in array metadata: zarr_version should not be in the array metadata, only in the base store metadata; compressor should be absent when there is no compression
* fix encode/decode of codec metadata: classmethods adapted from zarrita code
* add missing level to Zlib in _decode_codec_metadata
* add extensions entry to v3 array metadata
* dimension_separator should not be in the array metadata for v3
* update Attributes, adding StoreV3 support; avoid pytest error about missing fixture; fix flake8 error related to zarr_version fixture
* add StoreV3 support to core Array object
* update hexdigests
* handle additional codecs that were not implemented in zarrita; update hexdigests
* fix
* fix hexdigests
* fix indentation
* add StoreV3 support to Group, open_group, etc.
* add StoreV3 support to creation routines
* handle dimension_separator appropriately in open_array: specifically, infer the dimension_separator from the store if possible
* TST: add tests for open_array and dimension_separator
* only allow a Codec, not a plain str, as compressor during array initialization
* add StoreV3 support to most convenience routines; consolidated metadata functions haven't been updated yet
* set convenience routines default to zarr_version=None: this infers the version from the store if it is a BaseStore; otherwise it uses 2 for backwards compatibility
* adjust test: dimension_separator key was removed from v3 metadata
* add underscores to imported test classes in test_storage_v3.py; avoids running these tests a second time when this file is run
* add underscore to imported TestArrayWithPath in test_core_v3.py; prevents this test class from being run a second time
* refactor _valid_keys and add tests; test _ensure_store(None)
* move KVStoreV3 logic from StoreV3.__eq__ to KVStoreV3.__eq__
* expand tests for _ensure_store
* test exception for v2 store input to _get_hierarchy_metadata
* test exception for init_array with path=None
* remove unneeded checks from Attributes: the store can reject invalid v3 keys, and _update_nosync calls _get_nosync, which adds the 'attributes' key if missing
* test __repr__ of LazyLoader
* test load of individual array
* add simple test case for zarr.tree convenience method
* add tests for copy_store with a V3 store class
* test raising of exception on initialization with mismatched store and chunk_store protocol versions
* add key validation on __setitem__ in v3 stores; enable missing test_hierarchy for v3 stores. This required fixes to several of the rename and rmdir methods of the V3 stores
* fix core V3 tests now that keys are validated on __setitem__
* pep8 in storage_v3 tests
* flake8 in test_convenience.py
* pep8
* fix test_attrs.py: validate_key requires the attr key to start with meta/ or data/ in v3
* fix SQLiteStore: changes to rmdir were intended for SQLiteStoreV3, not SQLiteStore
* fix failing hierarchy test
* update ZipStore tests to make sure they all run on V3
* add default rmdir implementation to all StoreV3 classes; without these, rmdir can be overridden by the other V2 class in the MRO
* fix test_sync.py
* all rmdir methods for StoreV3 classes need to remove associated metadata
* avoid warning from test_entropy.py
* pep8 fixes
* greatly reduce code duplication in test_storage_v3.py; instead, add a v3 code path to existing test methods in test_storage.py
* remove redundant test_hexdigest methods: only need to define expected() for each class; reduce redundant code in test_core_v3.py
* move test_core_v3.py functions back into test_core.py
* typing fixes for mypy
* can assume self.keys() exists since BaseStore inherits from MutableMapping
* refactor rmdir methods for v3 and improve coverage
* improve coverage of core.py
* improve coverage of convenience.py
* expand info tests: needed to also test with a size > 10**12 to improve coverage
* expand tests of Array.view
* improve coverage of creation.py
* improve coverage of hierarchy.py
* improve coverage of meta.py
* pep8
* skip FSStoreV3 test when fsspec is not installed
* test raising of PermissionError for setter on views
* remove redundant check (_normalize_store_arg will already raise here)
* improve coverage and fix bugs in normalize_store_arg
* improve coverage of storage.py; remove redundant getsize methods
* pep8
* fix StoreV3 tests
* fix duplicate zarr_fsstore entry
* fix rename
* remove debug statements
* fix typo
* skip unavailable NumPy dtypes
* pep8
* mypy fixes
* remove redundant check (already done above)
* remove KeyError check: list_prefix only returns keys that exist
* coverage fixes
* implement ConsolidatedMetadataStoreV3; parametrize test_consolidate_metadata, which removes the need for a separate test_consolidated_with_chunk_store
* expand ConsolidatedMetadataStoreV3 tests; update _ensure_store to disallow mismatched Store versions
* remove debug statement
* fix tests: restore clobber=True
* test error path in consolidate_metadata
* add pragma: no cover for lines in test_meta.py that will only be visited on some architectures
* flake8 fixes
* flake8
* ENH: add ABSStoreV3
* flake8
* fix ABSStore.rmdir test coverage
* always use / in path
* remove remaining use of clobber argument in new tests
* remove NestedDirectoryStoreV3: no need for this class, as DirectoryStoreV3 with a / chunk separator can be used instead
* flake8
* remove rmdir_abs: use the rmdir method of the ABSStore parent class in ABSStoreV3
* define meta_root and data_root variables: these define the root path for metadata and data, respectively
* move _valid_key_characters to be a StoreV3 class field
* make _get_hierarchy_metadata strictly require 'zarr.json'; still use a default set of metadata in the __init__ method of the Group or Array classes. Add a _get_metadata_suffix helper that defaults to '.json' if metadata is not present.
* ignore type checks for _get_metadata_suffix
* remove unneeded if/else in Array and Hierarchy class __init__: default metadata is already added by Metadata3.encode_hierarchy_metadata when meta=None
* remove unused import
* define DEFAULT_ZARR_VERSION so the default can later be changed more easily from 2 to 3
* add test_get_hierarchy_metadata to test the v3 _get_hierarchy_metadata helper
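The key-validation items above (v3 keys are checked on __setitem__ and must start with meta/ or data/) can be illustrated with a small standalone sketch. This is not zarr's StoreV3 code: `validate_v3_key` and `ValidatingStore` are hypothetical names, and the exact character set, the allowance for a root `zarr.json` key, and the ban on a trailing `/` are assumptions modeled on the v3 draft spec.

```python
import string
from collections.abc import MutableMapping

# Assumption: allowed key characters are a-z, A-Z, 0-9 and the symbols "/.-_".
VALID_KEY_CHARACTERS = frozenset(string.ascii_letters + string.digits + "/.-_")


def validate_v3_key(key):
    """Raise ValueError if `key` is not an acceptable v3 store key (sketch)."""
    if not isinstance(key, str) or set(key) - VALID_KEY_CHARACTERS:
        raise ValueError(f"invalid characters in key: {key!r}")
    # Assumption: keys live under meta/ or data/, or are the root 'zarr.json'.
    if not (key.startswith(("meta/", "data/")) or key == "zarr.json"):
        raise ValueError(f"key must start with 'meta/' or 'data/': {key!r}")
    # Assumption: a key names a document, never a "directory".
    if key.endswith("/"):
        raise ValueError(f"key may not end with '/': {key!r}")


class ValidatingStore(MutableMapping):
    """Hypothetical in-memory store that validates keys on every write,
    following the same pattern as ABSStoreV3.__setitem__ in the diff."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        validate_v3_key(key)  # reject bad keys before anything is stored
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Subclassing MutableMapping means every write path (item assignment, `update`, `setdefault`) funnels through the validating `__setitem__`.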
1 parent 3f8a309 commit 2c13b95

22 files changed (+5383, -1359 lines)

zarr/_storage/absstore.py

Lines changed: 55 additions & 1 deletion
@@ -3,7 +3,7 @@
 import warnings
 from numcodecs.compat import ensure_bytes
 from zarr.util import normalize_storage_path
-from zarr._storage.store import Store
+from zarr._storage.store import _get_metadata_suffix, data_root, meta_root, Store, StoreV3
 
 __doctest_requires__ = {
     ('ABSStore', 'ABSStore.*'): ['azure.storage.blob'],
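The new import pulls in `meta_root`, `data_root` and `_get_metadata_suffix`. A rough sketch of how such values could combine into store keys follows. It assumes the roots are `'meta/root/'` and `'data/root/'` and that the suffix is read from a `metadata_key_suffix` field of the `zarr.json` entry; neither detail is shown in this diff, and `get_metadata_suffix`, `array_meta_key` and `chunk_prefix` are hypothetical helpers.

```python
import json

# Assumed root prefixes; the diff only shows that they are concatenated
# directly with a path, so they are expected to end with '/'.
META_ROOT = "meta/root/"
DATA_ROOT = "data/root/"


def get_metadata_suffix(store):
    """Return the metadata key suffix recorded in the store's 'zarr.json'
    entry, falling back to '.json' when that entry is absent (sketch)."""
    try:
        raw = store["zarr.json"]
    except KeyError:
        return ".json"
    # Assumption: the entry is a JSON document with a 'metadata_key_suffix' field.
    return json.loads(raw).get("metadata_key_suffix", ".json")


def array_meta_key(store, path):
    """Key of the array metadata document for `path`, which sits beside
    (not inside) the node's subtree, e.g. 'meta/root/foo/bar.array.json'."""
    return META_ROOT + path.strip("/") + ".array" + get_metadata_suffix(store)


def chunk_prefix(path):
    """Prefix under which the chunks of the array at `path` are stored."""
    return DATA_ROOT + path.strip("/") + "/"
```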
@@ -209,3 +209,57 @@ def getsize(self, path=None):
 
     def clear(self):
         self.rmdir()
+
+
+class ABSStoreV3(ABSStore, StoreV3):
+
+    def list(self):
+        return list(self.keys())
+
+    def __eq__(self, other):
+        return (
+            isinstance(other, ABSStoreV3) and
+            self.client == other.client and
+            self.prefix == other.prefix
+        )
+
+    def __setitem__(self, key, value):
+        self._validate_key(key)
+        super().__setitem__(key, value)
+
+    def rmdir(self, path=None):
+
+        if not path:
+            # Currently allowing clear to delete everything as in v2
+
+            # If we disallow an empty path then we will need to modify
+            # TestABSStoreV3 to have the create_store method use a prefix.
+            ABSStore.rmdir(self, '')
+            return
+
+        meta_dir = meta_root + path
+        meta_dir = meta_dir.rstrip('/')
+        ABSStore.rmdir(self, meta_dir)
+
+        # remove data folder
+        data_dir = data_root + path
+        data_dir = data_dir.rstrip('/')
+        ABSStore.rmdir(self, data_dir)
+
+        # remove metadata files
+        sfx = _get_metadata_suffix(self)
+        array_meta_file = meta_dir + '.array' + sfx
+        if array_meta_file in self:
+            del self[array_meta_file]
+        group_meta_file = meta_dir + '.group' + sfx
+        if group_meta_file in self:
+            del self[group_meta_file]
+
+    # TODO: adapt the v2 getsize method to work for v3
+    # For now, calling the generic keys-based _getsize
+    def getsize(self, path=None):
+        from zarr.storage import _getsize  # avoid circular import
+        return _getsize(self, path)
+
+
+ABSStoreV3.__doc__ = ABSStore.__doc__
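`ABSStoreV3.rmdir` removes a node in three steps: the `meta/` subtree, the `data/` subtree, and finally the node's own `.array` or `.group` metadata document, which lives next to the subtree rather than inside it. The sketch below replays those steps on a plain dict so the key layout is easy to see. `rmdir_v3` is a hypothetical function, and the `'meta/root/'`/`'data/root/'` prefixes and the `metadata_key_suffix` lookup in `zarr.json` are assumptions, not taken from this diff.

```python
import json

META_ROOT = "meta/root/"  # assumed prefix values
DATA_ROOT = "data/root/"


def _metadata_suffix(store):
    # Fall back to '.json' when there is no 'zarr.json' entry (assumed field name).
    if "zarr.json" not in store:
        return ".json"
    return json.loads(store["zarr.json"]).get("metadata_key_suffix", ".json")


def rmdir_v3(store, path=None):
    """Remove the node at `path` and everything below it from a dict store."""
    if not path:
        store.clear()  # empty path: delete everything, as in v2
        return
    path = path.strip("/")
    meta_dir = META_ROOT + path
    data_dir = DATA_ROOT + path
    # Steps 1 and 2: delete both subtrees. Matching on prefix + '/' keeps
    # rmdir('foo') from also deleting a sibling such as 'foobar'.
    for key in list(store):
        if key.startswith(meta_dir + "/") or key.startswith(data_dir + "/"):
            del store[key]
    # Step 3: delete the node's own metadata document, if present.
    sfx = _metadata_suffix(store)
    for kind in (".array", ".group"):
        store.pop(meta_dir + kind + sfx, None)
```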
