@@ -144,26 +144,28 @@ Questions that still need to be resolved
144144We solicit feedback on the following areas during the RFC period of this first
145145draft.
146146
147- - https://github.com/zarr-developers/zarr-specs/issues/72 to potentially split large metadata documents.
148- - extensions and ``must_understand = True `` might be too restrictive. Work a draft implementation with extensions and
149- see how far we can go. List of extensions to implement:
150-
151- - Boolean
152- - Complex
153- - Datetime
154- - Named dimensions
155- - Awkward arrays
156-
147+ - Should core metadata and user attributes be stored together or in separate documents,
148+ potentially to split large metadata documents? (`GH72 <https://github.com/zarr-developers/zarr-specs/issues/72>`_)
149+ - extensions and ``must_understand = True `` might be too restrictive. Work a
150+ draft implementation with extensions and
151+ see how far we can go. Possible list of extensions to implement:
152+
153+ - Boolean
154+ - Complex
155+ - Datetime
156+ - Named dimensions
157+ - Awkward arrays
158+
157159 See https://github.com/zarr-developers/zarr-specs/issues/89 for discussion on
158- the topic.
160+ the topic.
159161
160162 - Node name case sensitivity: Node names are now case sensitive. This may
161163 make store implementations more complicated, as the backend might not be (like some
162164 specific filesystems / object stores), and we may want to recommend a standard
163165 escaping mechanism in those cases. https://github.com/zarr-developers/zarr-specs/issues/57
164166
165167 - Node name character set: Same as above but unlike the previous point where we
166- solicit feedback on wither store implementation should support full unicode.
168+ solicit feedback on whether store implementations should support full Unicode.
167169 https://github.com/zarr-developers/zarr-specs/issues/56
168170
169171 - Should named dimensions be part of the core metadata spec ? https://github.com/zarr-developers/zarr-specs/issues/73
@@ -256,7 +258,7 @@ conceptual model underpinning the Zarr protocol.
256258 dimension has an integer length. This specification only considers
257259 the case where the lengths of all dimensions are finite. However,
258260 `protocol extensions `_ may be defined which allow a dimension to have
259- infinite or variable length.
261+ an infinite or variable length.
260262
261263.. _shape :
262264
@@ -277,8 +279,8 @@ conceptual model underpinning the Zarr protocol.
277279 identified by a tuple of integer coordinates, one for each
278280 dimension _ of the array _. If all dimensions _ of an array _ have
279281 finite length, then the number of elements in the array _ is given
280- by the product of the dimension _ lengths. An array _ element may be
281- empty, or it may have a value .
282+ by the product of the dimension _ lengths. An array _ may not have
283+ been fully initialized .
282284
283285.. _data type :
284286
@@ -301,7 +303,7 @@ conceptual model underpinning the Zarr protocol.
301303 hyperrectangle defined by a tuple of intervals, one for each
302304 dimension _ of the array _. The chunk shape is the tuple of interval
303305 lengths, and the chunk size (i.e., number of elements _ contained
304- within the chunk) is the product of its interval lengths.
306+ within the chunk) is the product of its interval lengths.
305307
306308 The chunk shape elements are non-zero when the corresponding dimensions of
307309 the arrays are of non-zero length.
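
The two product rules above (number of elements in an array, and chunk size) can be sketched in a few lines of Python. The function names and the example shapes are illustrative assumptions, not part of the specification.

```python
import math


def num_elements(shape):
    """Number of elements of an array whose dimensions all have finite length."""
    # The product of the dimension lengths; math.prod(()) is 1, so a
    # zero-dimensional (scalar) array holds exactly one element.
    return math.prod(shape)


def chunk_size(chunk_shape):
    """Number of elements within one chunk: the product of its interval lengths."""
    return math.prod(chunk_shape)


# Hypothetical example shapes, chosen only for illustration.
array_shape = (10000, 1000)
chunk_shape = (1000, 100)
print(num_elements(array_shape))  # 10000000
print(chunk_size(chunk_shape))    # 100000
```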
@@ -402,11 +404,11 @@ node names:
402404Node names are case sensitive, e.g., the names "foo" and "FOO" are **not **
403405identical.
404406
405- .. note:
406- The Zarr core development team recognises that restricting the set
407+ .. note::
408+ The Zarr core development team recognises that restricting the set
407409 of allowed characters creates an impediment and bias against users
408410 of different languages. We are actively discussing whether the full
409- Unicode character set could be allowed and what technical issues
411+ Unicode character set could be allowed and what technical issues
410412 this would entail. If you have experience or views please comment on
411413 `issue #56 <https://github.com/zarr-developers/zarr-specs/issues/56>`_.
412414
@@ -418,7 +420,7 @@ A data type describes the set of possible binary values that an array
418420element may take, along with some information about how the values
419421should be interpreted.
420422
421- This protocol defines a limited set of data types to represent Boolean
423+ This protocol defines a limited set of data types to represent boolean
422424values, integers, and floating point numbers. Protocol
423425extensions may define additional data types. All of the data types
424426defined here have a fixed size, in the sense that all values require
@@ -494,6 +496,18 @@ Core data types
494496 - unsigned integer
495497 - 8
496498 - little-endian
499+ * - ``>u2 ``
500+ - unsigned integer
501+ - 2
502+ - big-endian
503+ * - ``>u4 ``
504+ - unsigned integer
505+ - 4
506+ - big-endian
507+ * - ``>u8 ``
508+ - unsigned integer
509+ - 8
510+ - big-endian
497511 * - ``<f2 ``
498512 - half precision float: sign bit, 5 bits exponent, 10 bits mantissa
499513 - 2
@@ -528,24 +542,24 @@ Floating point types correspond to basic binary interchange formats as
528542defined by IEEE 754-2008.
529543
530544In addition to these base types, an implementation should also handle the
531- raw/opaque pass through type designated by the lowercase letter ``r `` followed
532- by the number of bits, multiple of 8. For example, ``r8 ``, ``r16 ``, ``r24 ``
533- should be understood as fallback types of respectively 1, 2, and 3 bytes long .
545+ raw/opaque pass-through type designated by the lower-case letter ``r `` followed
546+ by the number of bits, which must be a multiple of 8. For example, ``r8 ``, ``r16 ``, and ``r24 ``
547+ should be understood as fall-back types of 1, 2, and 3 bytes in length, respectively.
534548
535- Zarr v3.0 is limited to types length that are multiple of 8 bits but may open
536- other values in later version of this specification.
549+ Zarr v3 is limited to type sizes that are a multiple of 8 bits but may support
550+ other type sizes in later versions of this specification.
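
As a rough illustration of the raw type naming rule, the following sketch maps a raw type string such as ``r24`` to its length in bytes. The helper name ``raw_type_nbytes`` and the choice to raise ``ValueError`` on invalid input are assumptions made for this example; the specification does not prescribe an API.

```python
import re

# "r" followed by a decimal number of bits, e.g. "r8", "r16", "r24".
_RAW_TYPE = re.compile(r"r(\d+)")


def raw_type_nbytes(data_type):
    """Return the size in bytes of a raw/opaque type string (hypothetical helper)."""
    match = _RAW_TYPE.fullmatch(data_type)
    if match is None:
        raise ValueError(f"not a raw type: {data_type!r}")
    nbits = int(match.group(1))
    # The number of bits must be a (non-zero) multiple of 8.
    if nbits == 0 or nbits % 8 != 0:
        raise ValueError(f"raw type size must be a multiple of 8 bits: {data_type!r}")
    return nbits // 8


for name in ("r8", "r16", "r24"):
    print(name, raw_type_nbytes(name))
```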
537551
538552
539553.. note ::
540554
541555 We are explicitly looking for more feedback and prototypes of code using the ``r* ``,
542- raw bits, for various endianess and wether the spec coudl be made clearer.
556+ raw bits, for various endianness and whether the spec could be made clearer.
543557
544558.. note ::
545559
546- currently only fixed size elements are supported as a core data type.
560+ Currently only fixed size elements are supported as a core data type.
547561 There are many requests for variable length element encoding. There are many
548- way to encode variable length and we want to keep flexibility. While we seem
562+ ways to encode variable length and we want to keep flexibility. While we seem
549563 to agree that for random access the most likely contender is to have two
550564 arrays, one with the actual variable length data and one with fixed size
551565 (pointer + length) to the variable size data we do not want to commit to such
@@ -626,10 +640,10 @@ is contained within the chunk at grid index (1, 7, 2) and has coordinates
626640
627641The identifier for chunk with grid index (``i ``, ``j ``, ``k ``, ...) is
628642formed by joining together ASCII string representations of each index
629- using a separator and prefixed with the string ` 'c' ` . The default value for the separator is the slash
630- character (by default ``/ ``) , but this may be configured by providing a `` separator ``
631- value within the ``chunk_grid `` metadata object, see the section on
632- `Array metadata `_ below.
643+ using a separator and prefixed with the character `` c `` . The default value for
644+ the separator is the slash character ``/ ``, but this may be configured by
645+ providing a `` separator `` value within the ``chunk_grid `` metadata object ( see
646+ the section on `Array metadata `_ below) .
633647
634648For example, in a 3 dimensional array, the identifier for the chunk at
635649grid index (1, 23, 45) is the string "c1/23/45".
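
The rule for forming a chunk identifier can be sketched as follows. The function name ``chunk_identifier`` is an assumption for illustration; only the ``c`` prefix, the joining rule, and the default ``/`` separator come from the text above.

```python
def chunk_identifier(grid_index, separator="/"):
    """Build a chunk identifier from a tuple of grid indices (illustrative sketch)."""
    # Join the decimal ASCII representation of each index with the separator
    # and prefix the result with the character "c".
    return "c" + separator.join(str(i) for i in grid_index)


print(chunk_identifier((1, 23, 45)))  # c1/23/45
print(chunk_identifier(()))           # c
```

With an empty grid index, as for a zero-dimensional array, the join yields the empty string and the identifier is just ``c``.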
@@ -643,14 +657,14 @@ origin vertex of the array may occur at an arbitrary position within
643657any chunk, which is required to allow arrays to be extended by an
644658arbitrary length in a "negative" direction along any dimension.
645659
646- .. note :: A main difference with spec v2 is the default chunk separator
647- changed from ``. `` to ``/ `` this help with compatibility with N5 as well as
648- decrease the maximum number of items in hierarchical stores like directory
660+ .. note :: A main difference from spec v2 is that the default chunk separator
661+ changed from ``. `` to ``/ ``. This helps with compatibility with N5 and
662+ decreases the maximum number of items in hierarchical stores like directory
649663 stores.
650664
651665.. note :: Arrays may have 0 dimension (when for example representing scalars),
652666 in which case the coordinate of a chunk is the empty tuple, and the chunk key
653- will consist of the string `'c' `
667+ will consist of the string `` c ``.
654668
655669Chunk memory layouts
656670====================
@@ -769,12 +783,12 @@ name/value pairs. This section also defines how metadata documents are
769783encoded for storage.
770784
771785
772- Only the top level metadata document ``zarr.json `` is guarantied to be json, and
773- can be used to defined other format to array-level and group-level metadata
774- document; in the case where non-json metadata document are use in a zarr
775- hierarchy the following sections on group and array level metadata are
776- non-normative; but other metadata format as expected to define some equivalence
777- relations with the JSON documents.
786+ Only the top level metadata document ``zarr.json `` is guaranteed to be of JSON
787+ type, and can be used to define other formats for array-level and group-level
788+ metadata documents. In the case where non-JSON metadata documents are used in a
789+ Zarr hierarchy, the following sections on group and array level metadata are
790+ non-normative, but other metadata formats are expected to define some
791+ equivalence relations with the JSON documents.
778792
779793
780794Entry point metadata
@@ -827,8 +841,8 @@ containing the following names:
827841
828842 .. note ::
829843
830- This suffix is used is used to allow non hierarchy
831- browsing and edditign by non-zarr-aware tools.
844+ This suffix is used to allow non hierarchy
845+ browsing and editing by non-zarr-aware tools.
832846
833847``extensions ``
834848
@@ -914,7 +928,8 @@ following mandatory names:
914928 does not recognise the extension, but a ``fallback `` is present,
915929 then the implementation may proceed using the ``fallback `` value
916930 as the data type. For fallback types that do not correspond to base
917- known types, extensions can fallback on on a raw number of bytes using
931+ known types, extensions can fallback on a raw number of bytes using
932+ the raw type (``r* ``).
918933
919934``chunk_grid ``
920935
@@ -977,12 +992,13 @@ following mandatory names:
977992 the binary (starting with ``0b ``) or hexadecimal value (starting with
978993 ``0x ``) is accepted. This string must include all leading or trailing
979994 zeroes necessary to match the given type size. The string values ``"NaN" ``,
980- ``"+Infinity" `` and ``"-Infinity" `` are also understood for floating point datatypes.
995+ ``"+Infinity" `` and ``"-Infinity" `` are also understood for floating point
996+ data types.
981997
982998``extensions ``
983999
9841000 See the top level metadata extension section for the time being.
985-
1001+
9861002
9871003``attributes ``
9881004
@@ -995,11 +1011,12 @@ The following names are optional:
9951011
9961012 Specifies a codec to be used for encoding and decoding chunks. The
9971013 value must be an object containing the name ``codec `` whose value
998- is a URI that identifies a codec and dereferences to a human
999- readable representation of the codec specification. The codec
1000- object may also contain a ``configuration `` name whose value is
1001- defined by the corresponding codec specification. When the key for this is
1002- absent, this signor fies that no compressor has been used.
1014+ is a URI that identifies a codec and dereferences to a human-readable
1015+ representation of the codec specification. The codec
1016+ object may also contain a ``configuration `` object which consists of the
1017+ parameter names and values as defined by the corresponding codec
1018+ specification. When the ``compressor `` name is absent, this means that no
1019+ compressor is used.
10031020
10041021
10051022All other names within the array metadata object are reserved for
@@ -1193,9 +1210,9 @@ operations:
11931210 For example, if a store contains the keys "a/b", "a/c/d" and
11941211 "e/f/g", then ``list_prefix("a/") `` would return "a/b" and "a/c/d".
11951212
1196- Note behavior of ``list_prefix `` is undefined if ``prefix `` does not ends
1197- with a trailing slash ``/ `` and store can assume there is as least one key
1198- that stars with prefix.
1213+ Note: the behavior of ``list_prefix `` is undefined if ``prefix `` does not end
1214+ with a trailing slash ``/ `` and the store can assume there is at least one key
1215+ that starts with `` prefix `` .
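
A minimal sketch of ``list_prefix`` over an in-memory collection of keys, using the example keys given above. Representing the store as a plain list and rejecting prefixes without a trailing slash are assumptions of this sketch; the specification only says the behaviour is undefined in that case.

```python
def list_prefix(keys, prefix):
    """Return all keys that start with the given prefix (illustrative sketch)."""
    # Behaviour for a prefix without a trailing "/" is undefined by the spec;
    # this sketch chooses to reject such input.
    if not prefix.endswith("/"):
        raise ValueError("prefix must end with a trailing slash")
    return sorted(key for key in keys if key.startswith(prefix))


store_keys = ["a/b", "a/c/d", "e/f/g"]
print(list_prefix(store_keys, "a/"))  # ['a/b', 'a/c/d']
```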
11991216
12001217``list_dir `` - Retrieve all keys and prefixes with a given prefix and
12011218which do not contain the character "/" after the given prefix.
@@ -1296,7 +1313,7 @@ For an array at a non-root hierarchy path `P`, the metadata key for
12961313the array metadata document is formed by concatenating "meta/root",
12971314`P `, ".array", and the metadata key suffix.
12981315
1299- The data key for array chunks is formed by concatenating "data", `P `,
1316+ The data key for array chunks is formed by concatenating "data/root ", `P `,
13001317"/", and the chunk identifier as defined by the chunk grid layout.
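
The two concatenation rules above can be sketched as small helper functions. The names ``array_meta_key`` and ``data_key`` mirror those used later in this document, but their signatures here are assumptions, as are the example path ``/foo/bar`` and the metadata key suffix ``.json``.

```python
META_ROOT = "meta/root"
DATA_ROOT = "data/root"


def array_meta_key(path, suffix=".json"):
    """Metadata key for an array at a non-root hierarchy path such as "/foo/bar".

    The default suffix ".json" is an assumption made for this example.
    """
    return META_ROOT + path + ".array" + suffix


def data_key(path, chunk_id):
    """Data key for one chunk of the array at the given non-root path."""
    return DATA_ROOT + path + "/" + chunk_id


print(array_meta_key("/foo/bar"))        # meta/root/foo/bar.array.json
print(data_key("/foo/bar", "c1/23/45"))  # data/root/foo/bar/c1/23/45
```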
13011318
13021319To get the path ``P `` from a metadata key, remove the trailing
@@ -1398,16 +1415,16 @@ Let "+" be the string concatenation operator.
13981415**Store element values in an array **
13991416
14001417 To store an element in an array at path `P ` and coordinate (`i `, `j `,
1401- ...) perform ``set(data_key(P, i, j, ...), value) ``, where
1418+ ...), perform ``set(data_key(P, i, j, ...), value) ``, where
14021419 `value ` is the serialisation of the corresponding chunk, encoded
14031420 according to the information in the array metadata stored under
14041421 the key ``array_meta_key(P) ``.
14051422
14061423**Retrieve element values in an array **
14071424
14081425 To retrieve an element in an array at path `P ` and coordinate (`i `,
1409- `j `, ...) perform ``get(data_key(P, i, j, ...), value) ``, where
1410- ` value ` is the serialisation of the corresponding chunk, encoded
1426+ `j `, ...), perform ``get(data_key(P, i, j, ...)) ``. The returned
1427+ value is the serialisation of the corresponding chunk, encoded
14111428 according to the array metadata stored at ``array_meta_key(P) ``.
14121429
14131430**Discover children of a group **
@@ -1437,27 +1454,27 @@ Let "+" be the string concatenation operator.
14371454**Discover all nodes in a hierarchy **
14381455
14391456 To discover all nodes in a hierarchy, one can call
1440- ``list ("meta/") ``. All keys represent either explicit group or
1457+ ``list_prefix ("meta/root /") ``. All keys represent either explicit groups or
14411458 arrays. All intermediate prefixes ending in a ``/ `` are implicit
14421459 groups.
14431460
14441461**Delete a group or array **
14451462
14461463 To delete an array at path `P `:
14471464 - delete the metadata document for the array, ``delete(array_meta_key(P)) ``
1448- - delete all data keys which prefix have path pointing to this to this array,
1449- ``delete_prefix("data/root" + P + "/") ``
1465+ - delete all data keys whose prefix is the path pointing to this array,
1466+ ``delete_prefix("data/root" + P + "/") ``
14501467
14511468 To delete an implicit group at path `P `:
14521469 - delete all nodes under this group - it should be sufficient to
1453- perform ``delete_prefix("meta/root" + P + "/") `` and
1454- ``delete_prefix("data/root" + P + "/") ``.
1470+ perform ``delete_prefix("meta/root" + P + "/") `` and
1471+ ``delete_prefix("data/root" + P + "/") ``.
14551472
14561473 To delete an explicit group at path `P `:
14571474 - delete the metadata document for the group, ``delete(group_meta_key(P)) ``
14581475 - delete all nodes under this group - it should be sufficient to
1459- perform ``delete_prefix("meta/root" + P + "/") `` and
1460- ``delete_prefix("data/root" + P + "/") ``.
1476+ perform ``delete_prefix("meta/root" + P + "/") `` and
1477+ ``delete_prefix("data/root" + P + "/") ``.
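
The deletion steps for an array and for an implicit group can be sketched against a toy in-memory store. The ``MemoryStore`` class, the function names, and the ``.json`` metadata key suffix are assumptions for illustration; only the ``delete`` and ``delete_prefix`` operations and the key layout come from the text.

```python
class MemoryStore:
    """Toy dict-backed store exposing only the operations used below."""

    def __init__(self, items=None):
        self._items = dict(items or {})

    def delete(self, key):
        # Deleting a missing key is treated as a no-op in this sketch.
        self._items.pop(key, None)

    def delete_prefix(self, prefix):
        for key in [k for k in self._items if k.startswith(prefix)]:
            del self._items[key]

    def keys(self):
        return sorted(self._items)


def delete_array(store, path, meta_suffix=".json"):
    """Delete the array at hierarchy path `path` (e.g. "/g/a")."""
    store.delete("meta/root" + path + ".array" + meta_suffix)
    store.delete_prefix("data/root" + path + "/")


def delete_implicit_group(store, path):
    """Delete every node under the implicit group at `path`."""
    store.delete_prefix("meta/root" + path + "/")
    store.delete_prefix("data/root" + path + "/")


store = MemoryStore({
    "meta/root/g/a.array.json": b"{}",
    "data/root/g/a/c0/0": b"\x00",
    "meta/root/g/b.array.json": b"{}",
    "data/root/g/b/c0": b"\x00",
})
delete_array(store, "/g/a")
print(store.keys())  # ['data/root/g/b/c0', 'meta/root/g/b.array.json']
```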
14611478
14621479 Note that store implementations may decide to reify implicit groups
14631480 and thus protocol implementations should attempt to delete the
@@ -1484,16 +1501,15 @@ Let "+" be the string concatenation operator.
14841501Protocol extensions
14851502===================
14861503
1487- Many types of extensions can exists for a Zarr Protocol , they can be regrouped
1488- in mostly 2 categories:
1504+ Many types of extensions can exist for the Zarr protocol. They can mostly be grouped
1505+ into two categories:
14891506
1490- - Core Datatypes Extensions – for example adding ability store fixed size
1491- types like complex and datetime in chunks. These are directly declared in the
1492- array metadata ``data_type `` keys.
1493- - Arrays Extensions – Non rectilinear grids, and
1494- variable length types.
1507+ - Core data type extensions – for example adding the ability to store fixed size
1508+ types such as complex or datetime in chunks. These are directly declared in the
1509+ array metadata ``data_type `` key.
1510+ - Array extensions – non-rectilinear grids and variable length types.
14951511
1496- There are no group extensions as as Zarr v3.0
1512+ There are no group extensions in Zarr v3.0.
14971513
14981514See https://github.com/zarr-developers/zarr-specs/issues/49 for a list of potential extensions
14991515