99
1010Specification URI:
1111 https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html
12-
12+
1313Editors:
1414 * Alistair Miles (`@alimanfoo <https://github.com/alimanfoo>`_), Wellcome Sanger Institute
1515 * Jonathan Striebel (`@jstriebel <https://github.com/jstriebel>`_), Scalable Minds
@@ -462,7 +462,7 @@ This follows the
462462 representation is the ``UTF-8`` encoded Unicode string.
463463
464464.. note::
465- The prefix ``__zarr`` is reserved for core zarr data, and extensions
465+ The prefix ``__zarr`` is reserved for core zarr data, and extensions
466466 can use other files and folders starting with ``__``.
467467
468468
@@ -529,10 +529,10 @@ Core data types
529529 - 8-byte little endian
530530 * - ``float16 `` (optionally supported)
531531 - IEEE 754 half-precision floating point: sign bit, 5 bits exponent, 10 bits mantissa
532- - 2-byte little endian IEEE 754 binary16
532+ - 2-byte little endian IEEE 754 binary16
533533 * - ``float32 ``
534534 - IEEE 754 single-precision floating point: sign bit, 8 bits exponent, 23 bits mantissa
535- - 4-byte little endian IEEE 754 binary32
535+ - 4-byte little endian IEEE 754 binary32
536536 * - ``float64 ``
537537 - IEEE 754 double-precision floating point: sign bit, 11 bits exponent, 52 bits mantissa
538538 - 8-byte little endian IEEE 754 binary64
@@ -647,17 +647,8 @@ that chunk, where "%" is the modulo operator. For example, if a
647647is contained within the chunk at grid index (1, 7, 2) and has coordinates
648648(2, 10, 100) within that chunk.
649649
650- The identifier for chunk with grid index (``k ``, ``j ``, ``i ``, ...) is
651- formed by taking the initial prefix ``c ``, and appending for each dimension:
652-
653- - the ``separator `` character specified within the ``chunk_grid `` metadata object (see
654- the section on `Array metadata `_ below), followed by,
655-
656- - the ASCII decimal string representation of the chunk index within that dimension.
657-
658- For example, in a 3 dimensional array, with a separator of ``/ `` the identifier
659- for the chunk at grid index (1, 23, 45) is the string "c/1/23/45". With a
660- separator of ``. ``, the identifier is the string "c.1.23.45".
650+ The store key corresponding to a given grid cell is determined based on the
651+ `chunk_key_encoding`_ member of the `Array metadata`_.
661652
662653Note that this specification does not consider the case where the
663654chunk grid and the array space are not aligned at the origin vertices
@@ -668,14 +659,6 @@ origin element of the array may occur at an arbitrary position within
668659any chunk, which is required to allow arrays to be extended by an
669660arbitrary length in a "negative" direction along any dimension.
670661
671- .. note :: A main difference with spec v2 is that the default chunk separator
672- changed from ``. `` to ``/ ``, as in N5. This decreases the maximum number of
673- items in hierarchical stores like directory stores.
674-
675- .. note :: Arrays may have 0 dimensions (when for example representing scalars),
676- in which case the coordinate of a chunk is the empty tuple, and the chunk key
677- will consist of the string ``c ``.
678-
679662.. note :: Chunks at the border of an array always have the full chunk size, even when
680663 the array only covers parts of it. For example, having an array with ``"shape": [30, 30] `` and
681664 ``"chunk_shape": [16, 16] ``, the chunk ``0,1 `` would also contain unused values for the indices
@@ -863,7 +846,7 @@ mandatory names:
863846 if provided, its value must be one or a list of the data type identifiers
864847 defined in this specification or an extension. Fallback extension datatypes
865848 are specified as an object with ``name `` and (optionally) ``configuration ``.
866-
849+
867850 If an implementation does not recognise the extension or specific data type,
868851 but a ``fallback `` is present, then the implementation may proceed using the
869852 first known ``fallback `` value as the data type. For fixed-sized data types,
@@ -883,10 +866,10 @@ mandatory names:
883866 as defined in this specification, then the value must be an object with the
884867 names ``name `` and ``configuration ``. The value of ``name `` must be the
885868 string ``"regular" ``, and the value of ``configuration `` an object with the
886- names ``chunk_shape `` and `` separator ``. ``chunk_shape `` must be an array of
869+ member ``chunk_shape ``. ``chunk_shape `` must be an array of
887870 integers providing the lengths of the chunk along each dimension of the
888- array. `` separator `` must be either `` "/" `` or `` "." ``. For example,
889- ``{"type": "regular", "configuration": {"chunk_shape": [2, 5], "separator":"/" }} ``
871+ array. For example,
872+ ``{"name": "regular", "configuration": {"chunk_shape": [2, 5]}}``
890873 means a regular grid where the chunks have length 2 along the first
891874 dimension and length 5 along the second dimension.
892875
@@ -895,6 +878,71 @@ mandatory names:
895878 must be a string referring to a v3 chunk grid specification. The
896879 ``configuration `` is optional and defined by the extension.
897880
881+ ``chunk_key_encoding``
882+ ^^^^^^^^^^^^^^^^^^^^^^
883+
884+ The mapping from chunk grid cell coordinates to keys in the underlying
885+ store.
886+
887+ The value must be an object with required string member ``name``, specifying
888+ the encoding type, and optional member ``configuration`` specifying encoding
889+ type-dependent parameters; the ``configuration`` value must be an object if
890+ it is specified.
891+
892+ The following encodings are defined:
893+
894+ - ``default``
895+
896+ The ``configuration`` object may contain one optional member,
897+ ``separator``, which must be either ``"/"`` or ``"."``. If not specified,
898+ ``separator`` defaults to ``"/"``.
899+
900+ The key for a chunk with grid index (``k``, ``j``, ``i``, ...) is
901+ formed by taking the initial prefix ``c``, and appending for each dimension:
902+
903+ - the ``separator`` character, followed by,
904+
905+ - the ASCII decimal string representation of the chunk index within that dimension.
906+
907+ For example, in a 3 dimensional array, with a separator of ``/`` the identifier
908+ for the chunk at grid index (1, 23, 45) is the string ``"c/1/23/45"``. With a
909+ separator of ``.``, the identifier is the string ``"c.1.23.45"``.
910+
911+ .. note:: A main difference with spec v2 is that the default chunk separator
912+ changed from ``.`` to ``/``, as in N5. This decreases the maximum number of
913+ items in hierarchical stores like directory stores.
914+
915+ .. note:: Arrays may have 0 dimensions (when for example representing scalars),
916+ in which case the coordinate of a chunk is the empty tuple, and the chunk key
917+ will consist of the string ``c``.
918+
919+ - ``v2``
920+
921+ The ``configuration`` object may contain one optional member,
922+ ``separator``, which must be either ``"/"`` or ``"."``. If not specified,
923+ ``separator`` defaults to ``"."``.
924+
925+ The identifier for a chunk with at least one dimension is formed by
926+ concatenating for each dimension:
927+
928+ - the ASCII decimal string representation of the chunk index within that
929+ dimension, followed by
930+
931+ - the ``separator`` character, except that it is omitted for the last
932+ dimension.
933+
934+ For example, in a 3 dimensional array, with a separator of ``.`` the identifier
935+ for the chunk at grid index (1, 23, 45) is the string ``"1.23.45"``. With a
936+ separator of ``/``, the identifier is the string ``"1/23/45"``.
937+
938+ For chunk grids with 0 dimensions, the single chunk has the key ``"0"``.
939+
940+ .. note::
941+
942+ This encoding is intended only to allow existing v2 arrays to be
943+ converted to v3 without having to rename chunks. It is not recommended
944+ to be used when writing new arrays.
945+
898946``fill_value``
899947^^^^^^^^^^^^^^
900948
@@ -1006,8 +1054,13 @@ compressed using gzip compression prior to storage::
10061054 "chunk_grid": {
10071055 "name": "regular",
10081056 "configuration": {
1009- "chunk_shape": [1000, 100],
1010- "separator" : "/"
1057+ "chunk_shape": [1000, 100]
1058+ }
1059+ },
1060+ "chunk_key_encoding": {
1061+ "name": "default",
1062+ "configuration": {
1063+ "separator": "/"
10111064 }
10121065 },
10131066 "codecs": [{
@@ -1035,15 +1088,20 @@ above, but using a (currently made up) extension data type::
10351088 "data_type": {
10361089 "name": "datetime",
10371090 "configuration": {
1038- "unit": "ns"
1091+ "unit": "ns"
10391092 },
10401093 "fallback": "int64"
10411094 },
10421095 "chunk_grid": {
10431096 "name": "regular",
10441097 "configuration": {
1045- "chunk_shape": [1000, 100],
1046- "separator" : "/"
1098+ "chunk_shape": [1000, 100]
1099+ }
1100+ },
1101+ "chunk_key_encoding": {
1102+ "name": "default",
1103+ "configuration": {
1104+ "separator": "/"
10471105 }
10481106 },
10491107 "codecs": [{
@@ -1056,14 +1114,14 @@ above, but using a (currently made up) extension data type::
10561114 }
10571115
10581116.. note::
1059-
1117+
10601118 Comparison with zarr spec v2:
1061-
1119+
10621120 - ``dtype`` has been renamed to ``data_type``,
1063- - ``chunks`` has been renamed to ``chunk_grid``,
1121+ - ``chunks`` has been replaced with ``chunk_grid``,
1122+ - ``dimension_separator`` has been replaced with ``chunk_key_encoding``,
10641123 - ``order`` has been replaced by the :ref:`transpose <transpose-codec-v1>` codec,
1065- - the separate ``filters`` and ``compressor`` fields been combined into the single ``codecs`` field,
1066- - ``zarr_format`` is now a string URL rather than a number.
1124+ - the separate ``filters`` and ``compressor`` fields have been combined into the single ``codecs`` field.
10671125
10681126
10691127Group metadata
@@ -1551,12 +1609,13 @@ Extension points
15511609Different types of extensions can exist and they can be grouped as follows:
15521610
15531611=========== ======================= ================================================
1554- level extension metadata
1612+ level extension metadata
15551613=========== ======================= ================================================
1556- array data type `data_type `_
1557- array chunk grid `chunk_grid `_
1558- array codecs `codecs `_
1559- array storage transformer `storage_transformers (array) `_
1614+ array data type `data_type `_
1615+ array chunk grid `chunk_grid `_
1616+ array chunk key encoding `chunk_key_encoding `_
1617+ array codecs `codecs `_
1618+ array storage transformer `storage_transformers (array) `_
15601619=========== ======================= ================================================
15611620
15621621If such extension points are used by groups or arrays, they are required, except
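
The two chunk key encodings introduced by this change (``default`` and ``v2``) can be sketched in a few lines of Python. This is only an illustrative sketch of the rules as worded in the diff above, not code from any Zarr implementation: the function name ``encode_chunk_key`` and its parameters are hypothetical, and the ``name`` and ``separator`` arguments are assumed to correspond to the ``name`` member and the optional ``configuration.separator`` member of ``chunk_key_encoding``.

```python
from typing import Optional, Sequence

# Separators permitted by both encodings, and the per-encoding default
# that applies when "separator" is absent from "configuration".
_ALLOWED_SEPARATORS = ("/", ".")
_DEFAULT_SEPARATOR = {"default": "/", "v2": "."}


def encode_chunk_key(
    index: Sequence[int],
    name: str = "default",
    separator: Optional[str] = None,
) -> str:
    """Map a chunk grid index to a store key (hypothetical helper)."""
    if name not in _DEFAULT_SEPARATOR:
        raise ValueError(f"unknown chunk key encoding: {name!r}")
    if separator is None:
        separator = _DEFAULT_SEPARATOR[name]
    if separator not in _ALLOWED_SEPARATORS:
        raise ValueError(f"separator must be '/' or '.', got {separator!r}")

    # ASCII decimal representation of the chunk index in each dimension.
    parts = [str(int(i)) for i in index]

    if name == "default":
        # Prefix "c", then separator + index for every dimension.
        # A 0-dimensional array therefore yields just "c".
        return "c" + "".join(separator + p for p in parts)

    # "v2": indices joined by the separator, with no prefix.
    # A 0-dimensional chunk grid has the single key "0".
    return separator.join(parts) if parts else "0"


if __name__ == "__main__":
    print(encode_chunk_key((1, 23, 45)))
    print(encode_chunk_key((1, 23, 45), name="v2"))
```

The separator default is resolved per encoding rather than once globally, because the text gives ``"/"`` for ``default`` and ``"."`` for ``v2``.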