Commit a15892b

Merge pull request #104 from davidbrochart/fix_v3
2 parents e8c0b45 + 4729d5d commit a15892b

1 file changed: +90 -74 lines changed

docs/protocol/core/v3.0.rst

Lines changed: 90 additions & 74 deletions
Original file line number | Diff line number | Diff line change
@@ -144,26 +144,28 @@ Questions that still need to be resolved
144144
We solicit feedback on the following area during the RFC period of this first
145145
draft.
146146

147-
- https://github.com/zarr-developers/zarr-specs/issues/72 to potentially split large metadata documents.
148-
- extensions and ``must_understand = True`` might be too restrictive. Work a draft implementation with extensions and
149-
see how far we can go. List of extensions to implement:
150-
151-
- Boolean
152-
- Complex
153-
- Datetime
154-
- Named dimensions
155-
- Awkward arrays
156-
147+
- Should core metadata and user attributes be stored together or separate documents? ([GH72](https://github.com/zarr-developers/zarr-specs/issues/72))
148+
large metadata documents.
149+
- extensions and ``must_understand = True`` might be too restrictive. Work a
150+
draft implementation with extensions and
151+
see how far we can go. Possible list of extensions to implement:
152+
153+
- Boolean
154+
- Complex
155+
- Datetime
156+
- Named dimensions
157+
- Awkward arrays
158+
157159
See https://github.com/zarr-developers/zarr-specs/issues/89 for discussion on
158-
the topic.
160+
the topic.
159161

160162
- Node name case sensitivity: The node name is now case sensitive, this may
161163
make store implementation more complicated as backed might not be (like some
162164
specific filesystem / object store), and we may want to recommend a standard
163165
escaping mechanism in those case. https://github.com/zarr-developers/zarr-specs/issues/57
164166

165167
- Node name character set: Same as above but unlike the previous point where we
166-
solicit feedback on wither store implementation should support full unicode.
168+
solicit feedback on wither store implementation should support full unicode.
167169
https://github.com/zarr-developers/zarr-specs/issues/56
168170

169171
- Should named dimensions be part of the core metadata spec ? https://github.com/zarr-developers/zarr-specs/issues/73
@@ -256,7 +258,7 @@ conceptual model underpinning the Zarr protocol.
256258
dimension has an integer length. This specification only considers
257259
the case where the lengths of all dimensions are finite. However,
258260
`protocol extensions`_ may be defined which allow a dimension to have
259-
infinite or variable length.
261+
an infinite or variable length.
260262

261263
.. _shape:
262264

@@ -277,8 +279,8 @@ conceptual model underpinning the Zarr protocol.
277279
identified by a tuple of integer coordinates, one for each
278280
dimension_ of the array_. If all dimensions_ of an array_ have
279281
finite length, then the number of elements in the array_ is given
280-
by the product of the dimension_ lengths. An array_ element may be
281-
empty, or it may have a value.
282+
by the product of the dimension_ lengths. An array_ may not have
283+
been fully initialized.
282284

283285
.. _data type:
284286

@@ -301,7 +303,7 @@ conceptual model underpinning the Zarr protocol.
301303
hyperrectangle defined by a tuple of intervals, one for each
302304
dimension_ of the array_. The chunk shape is the tuple of interval
303305
lengths, and the chunk size (i.e., number of elements_ contained
304-
within the chunk) is the product of its interval lengths.
306+
within the chunk) is the product of its interval lengths.
305307

306308
The chunk shape elements are non-zero when the corresponding dimensions of
307309
the arrays are of non-zero length.
@@ -402,11 +404,11 @@ node names:
402404
Node names are case sensitive, e.g., the names "foo" and "FOO" are **not**
403405
identical.
404406

405-
.. note:
406-
The Zarr core development team recognises that restricting the set
407+
.. note:
408+
The Zarr core development team recognises that restricting the set
407409
of allowed characters creates an impediment and bias against users
408410
of different languages. We are actively discussing whether the full
409-
Unicode character set could be allowed and what technical issues
411+
Unicode character set could be allowed and what technical issues
410412
this would entail. If you have experience or views please comment on
411413
`issue #56 <https://github.com/zarr-developers/zarr-specs/issues/56>`_.
412414
@@ -418,7 +420,7 @@ A data type describes the set of possible binary values that an array
418420
element may take, along with some information about how the values
419421
should be interpreted.
420422

421-
This protocol defines a limited set of data types to represent Boolean
423+
This protocol defines a limited set of data types to represent boolean
422424
values, integers, and floating point numbers. Protocol
423425
extensions may define additional data types. All of the data types
424426
defined here have a fixed size, in the sense that all values require
@@ -494,6 +496,18 @@ Core data types
494496
- unsigned integer
495497
- 8
496498
- little-endian
499+
* - ``>u2``
500+
- unsigned integer
501+
- 2
502+
- big-endian
503+
* - ``>u4``
504+
- unsigned integer
505+
- 4
506+
- big-endian
507+
* - ``>u8``
508+
- unsigned integer
509+
- 8
510+
- big-endian
497511
* - ``<f2``
498512
- half precision float: sign bit, 5 bits exponent, 10 bits mantissa
499513
- 2
@@ -528,24 +542,24 @@ Floating point types correspond to basic binary interchange formats as
528542
defined by IEEE 754-2008.
529543

530544
Additionally to these base types, an implementation should also handle the
531-
raw/opaque pass through type designated by the lowercase letter ``r`` followed
532-
by the number of bits, multiple of 8. For example, ``r8``, ``r16``, ``r24``
533-
should be understood as fallback types of respectively 1, 2, and 3 bytes long.
545+
raw/opaque pass-through type designated by the lower-case letter ``r`` followed
546+
by the number of bits, multiple of 8. For example, ``r8``, ``r16``, and ``r24``
547+
should be understood as fall-back types of respectively 1, 2, and 3 byte length.
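To illustrate the ``r*`` naming rule in this hunk, here is a minimal Python sketch that maps a raw type name to its size in bytes. The helper name ``raw_type_nbytes`` is hypothetical and not part of the specification; the only rule it encodes is the one stated above (lower-case ``r`` followed by a number of bits that is a multiple of 8).

```python
import re


def raw_type_nbytes(data_type):
    """Hypothetical helper: return the byte size of a raw type such as "r16"."""
    match = re.fullmatch(r"r(\d+)", data_type)
    if match is None:
        raise ValueError(f"not a raw type name: {data_type!r}")
    nbits = int(match.group(1))
    # The text above limits raw types to bit counts that are a multiple of 8.
    if nbits == 0 or nbits % 8 != 0:
        raise ValueError(f"bit count must be a positive multiple of 8: {nbits}")
    return nbits // 8


print(raw_type_nbytes("r8"))   # 1
print(raw_type_nbytes("r16"))  # 2
print(raw_type_nbytes("r24"))  # 3
```

Rejecting names such as ``r12`` is a choice made for this sketch; the diff itself only says that later versions of the specification may support other sizes.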
534548

535-
Zarr v3.0 is limited to types length that are multiple of 8 bits but may open
536-
other values in later version of this specification.
549+
Zarr v3 is limited to type sizes that are a multiple of 8 bits but may support
550+
other type sizes in later versions of this specification.
537551

538552

539553
.. note::
540554

541555
We are explicitely looking for more feedback and prototypes of code using the ``r*``,
542-
raw bits, for various endianess and wether the spec coudl be made clearer.
556+
raw bits, for various endianness and whether the spec could be made clearer.
543557

544558
.. note::
545559

546-
currently only fixed size elements are supported as a core data type.
560+
Currently only fixed size elements are supported as a core data type.
547561
There are many request for variable length element encoding. There are many
548-
way to encode variable length and we want to keep flexibility. While we seem
562+
ways to encode variable length and we want to keep flexibility. While we seem
549563
to agree that for random access the most likely contender is to have two
550564
arrays, one with the actual variable length data and one with fixed size
551565
(pointer + length) to the variable size data we do not want to commit to such
@@ -626,10 +640,10 @@ is contained within the chunk at grid index (1, 7, 2) and has coordinates
626640

627641
The identifier for chunk with grid index (``i``, ``j``, ``k``, ...) is
628642
formed by joining together ASCII string representations of each index
629-
using a separator and prefixed with the string `'c'`. The default value for the separator is the slash
630-
character (by default ``/``), but this may be configured by providing a ``separator``
631-
value within the ``chunk_grid`` metadata object, see the section on
632-
`Array metadata`_ below.
643+
using a separator and prefixed with the character ``c``. The default value for
644+
the separator is the slash character lt ``/``, but this may be configured by
645+
providing a ``separator`` value within the ``chunk_grid`` metadata object (see
646+
the section on `Array metadata`_ below).
633647

634648
For example, in a 3 dimensional array, the identifier for the chunk at
635649
grid index (1, 23, 45) is the string "c1/23/45".
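The identifier rule described in this hunk can be sketched in a few lines of Python. The function name ``chunk_identifier`` and its ``separator`` parameter are illustrative assumptions, not names defined by the specification.

```python
def chunk_identifier(grid_index, separator="/"):
    """Hypothetical helper: build a chunk identifier from a grid index tuple."""
    # Prefix with "c", then join the decimal ASCII form of each index.
    return "c" + separator.join(str(i) for i in grid_index)


print(chunk_identifier((1, 23, 45)))       # c1/23/45
print(chunk_identifier((1, 23, 45), "."))  # c1.23.45
print(chunk_identifier(()))                # c
```

The last call shows the zero-dimensional case, where the grid index is the empty tuple and only the ``c`` prefix remains.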
@@ -643,14 +657,14 @@ origin vertex of the array may occur at an arbitrary position within
643657
any chunk, which is required to allow arrays to be extended by an
644658
arbitrary length in a "negative" direction along any dimension.
645659

646-
.. note:: A main difference with spec v2 is the default chunk separator
647-
changed from ``.`` to ``/`` this help with compatibility with N5 as well as
648-
decrease the maximum number of items in hierarchical stores like directory
660+
.. note:: A main difference with spec v2 is that the default chunk separator
661+
changed from ``.`` to ``/``. This helps with compatibility with N5 as well as
662+
decreases the maximum number of items in hierarchical stores like directory
649663
stores.
650664

651665
.. note:: Arrays may have 0 dimension (when for example representing scalars),
652666
in which case the coordinate of a chunk is the empty tuple, and the chunk key
653-
will consist of the string `'c'`
667+
will consist of the string ``c``.
654668

655669
Chunk memory layouts
656670
====================
@@ -769,12 +783,12 @@ name/value pairs. This section also defines how metadata documents are
769783
encoded for storage.
770784

771785

772-
Only the top level metadata document ``zarr.json`` is guarantied to be json, and
773-
can be used to defined other format to array-level and group-level metadata
774-
document; in the case where non-json metadata document are use in a zarr
775-
hierarchy the following sections on group and array level metadata are
776-
non-normative; but other metadata format as expected to define some equivalence
777-
relations with the JSON documents.
786+
Only the top level metadata document ``zarr.json`` is guaranteed to be of JSON
787+
type, and can be used to define other formats for array-level and group-level
788+
metadata documents. In the case where non-JSON metadata documents are used in a
789+
Zarr hierarchy, the following sections on group and array level metadata are
790+
non-normative, but other metadata formats are expected to define some
791+
equivalence relations with the JSON documents.
778792

779793

780794
Entry point metadata
@@ -827,8 +841,8 @@ containing the following names:
827841

828842
.. note::
829843

830-
This suffix is used is used to allow non hierarchy
831-
browsing and edditign by non-zarr-aware tools.
844+
This suffix is used to allow non hierarchy
845+
browsing and editing by non-zarr-aware tools.
832846

833847
``extensions``
834848

@@ -914,7 +928,8 @@ following mandatory names:
914928
does not recognise the extension, but a ``fallback`` is present,
915929
then the implementation may proceed using the ``fallback`` value
916930
as the data type. For fallback types that do not correspond to base
917-
known types, extensions can fallback on on a raw number of bytes using
931+
known types, extensions can fallback on a raw number of bytes using
932+
the raw type (``r*``).
918933

919934
``chunk_grid``
920935

@@ -977,12 +992,13 @@ following mandatory names:
977992
the binary (starting with ``0b``) or hexadecimal value (starting with
978993
``0x``) is accepted. This string must include all leading or trailing
979994
zeroes necessary to match the given type size. The string values ``"NaN"``,
980-
``"+Infinity"`` and ``"-Infinity"`` are also understood for floating point datatypes.
995+
``"+Infinity"`` and ``"-Infinity"`` are also understood for floating point
996+
data types.
981997

982998
``extensions``
983999

9841000
See the top level metadata extension section for the time being.
985-
1001+
9861002

9871003
``attributes``
9881004

@@ -995,11 +1011,12 @@ The following names are optional:
9951011

9961012
Specifies a codec to be used for encoding and decoding chunks. The
9971013
value must be an object containing the name ``codec`` whose value
998-
is a URI that identifies a codec and dereferences to a human
999-
readable representation of the codec specification. The codec
1000-
object may also contain a ``configuration`` name whose value is
1001-
defined by the corresponding codec specification. When the key for this is
1002-
absent, this signor fies that no compressor has been used.
1014+
is a URI that identifies a codec and dereferences to a human-readable
1015+
representation of the codec specification. The codec
1016+
object may also contain a ``configuration`` object which consists of the
1017+
parameter names and values as defined by the corresponding codec
1018+
specification. When the ``compressor`` name is absent, this means that no
1019+
compressor is used.
10031020

10041021

10051022
All other names within the array metadata object are reserved for
@@ -1193,9 +1210,9 @@ operations:
11931210
For example, if a store contains the keys "a/b", "a/c/d" and
11941211
"e/f/g", then ``list_prefix("a/")`` would return "a/b" and "a/c/d".
11951212

1196-
Note behavior of ``list_prefix`` is undefined if ``prefix`` does not ends
1197-
with a trailing slash ``/`` and store can assume there is as least one key
1198-
that stars with prefix.
1213+
Note: the behavior of ``list_prefix`` is undefined if ``prefix`` does not end
1214+
with a trailing slash ``/`` and the store can assume there is at least one key
1215+
that starts with ``prefix``.
11991216

12001217
``list_dir`` - Retrieve all keys and prefixes with a given prefix and
12011218
which do not contain the character "/" after the given prefix.
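As a rough illustration of the two listing operations in this hunk, the following Python sketch works on a plain set of keys. The function signatures and the ``(keys, prefixes)`` return shape of ``list_dir`` are assumptions made for this example; the specification text shown here only describes the behaviour in prose.

```python
def list_prefix(keys, prefix):
    """Return all keys that start with `prefix` (expected to end with "/")."""
    return sorted(k for k in keys if k.startswith(prefix))


def list_dir(keys, prefix):
    """Return (keys, prefixes) found directly under `prefix`."""
    direct_keys, child_prefixes = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # A further "/" follows: report only the child prefix.
            child_prefixes.add(prefix + head + "/")
        else:
            direct_keys.add(key)
    return sorted(direct_keys), sorted(child_prefixes)


store_keys = {"a/b", "a/c/d", "e/f/g"}
print(list_prefix(store_keys, "a/"))  # ['a/b', 'a/c/d']
print(list_dir(store_keys, "a/"))     # (['a/b'], ['a/c/'])
```

The sketch does not guard against a ``prefix`` without a trailing slash, since the note above leaves that behaviour undefined.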
@@ -1296,7 +1313,7 @@ For an array at a non-root hierarchy path `P`, the metadata key for
12961313
the array metadata document is formed by concatenating "meta/root",
12971314
`P`, ".array", and the metadata key suffix.
12981315

1299-
The data key for array chunks is formed by concatenating "data", `P`,
1316+
The data key for array chunks is formed by concatenating "data/root", `P`,
13001317
"/", and the chunk identifier as defined by the chunk grid layout.
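The key construction described here, including the corrected ``data/root`` prefix, can be sketched as plain string concatenation. This is a minimal sketch under two assumptions that are not stated in this hunk: that the path ``P`` starts with ``/`` (for example ``/foo/bar``), and that the metadata key suffix is ``.json``.

```python
def array_meta_key(path, suffix=".json"):
    """Metadata key for an array at a non-root path (suffix is an assumption)."""
    return "meta/root" + path + ".array" + suffix


def data_key(path, chunk_id):
    """Data key for one chunk of the array at a non-root path."""
    return "data/root" + path + "/" + chunk_id


print(array_meta_key("/foo/bar"))        # meta/root/foo/bar.array.json
print(data_key("/foo/bar", "c1/23/45"))  # data/root/foo/bar/c1/23/45
```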
13011318

13021319
To get the path ``P`` from a metadata key, remove the trailing
@@ -1398,16 +1415,16 @@ Let "+" be the string concatenation operator.
13981415
**Store element values in an array**
13991416

14001417
To store element in an array at path `P` and coordinate (`i`, `j`,
1401-
...) perform ``set(data_key(P, i, j, ...), value)``, where
1418+
...), perform ``set(data_key(P, i, j, ...), value)``, where
14021419
`value` is the serialisation of the corresponding chunk, encoded
14031420
according to the information in the array metadata stored under
14041421
the key ``array_meta_key(P)``.
14051422

14061423
**Retrieve element values in an array**
14071424

14081425
To retrieve element in an array at path `P` and coordinate (`i`,
1409-
`j`, ...) perform ``get(data_key(P, i, j, ...), value)``, where
1410-
`value` is the serialisation of the corresponding chunk, encoded
1426+
`j`, ...), perform ``get(data_key(P, i, j, ...), value)``. The returned
1427+
value is the serialisation of the corresponding chunk, encoded
14111428
according to the array metadata stored at ``array_meta_key(P)``.
14121429

14131430
**Discover children of a group**
@@ -1437,27 +1454,27 @@ Let "+" be the string concatenation operator.
14371454
**Discover all nodes in a hierarchy**
14381455

14391456
To discover all nodes in a hierarchy, one can call
1440-
``list("meta/")``. All keys represent either explicit group or
1457+
``list_prefix("meta/root/")``. All keys represent either explicit group or
14411458
arrays. All intermediate prefixes ending in a ``/`` are implicit
14421459
groups.
14431460

14441461
**Delete a group or array**
14451462

14461463
To delete an array at path `P`:
14471464
- delete the metadata document for the array, ``delete(array_meta_key(P))``
1448-
- delete all data keys which prefix have path pointing to this to this array,
1449-
``delete_prefix("data/root" + P + "/")``
1465+
- delete all data keys which prefix have path pointing to this array,
1466+
``delete_prefix("data/root" + P + "/")``
14501467

14511468
To delete an implicit group at path `P`:
14521469
- delete all nodes under this group - it should be sufficient to
1453-
perform ``delete_prefix("meta/root" + P + "/")`` and
1454-
``delete_prefix("data/root" + P + "/")``.
1470+
perform ``delete_prefix("meta/root" + P + "/")`` and
1471+
``delete_prefix("data/root" + P + "/")``.
14551472

14561473
To delete an explicit group at path `P`:
14571474
- delete the metadata document for the group, ``delete(group_meta_key(P))``
14581475
- delete all nodes under this group - it should be sufficient to
1459-
perform ``delete_prefix("meta/root" + P + "/")`` and
1460-
``delete_prefix("data/root" + P + "/")``.
1476+
perform ``delete_prefix("meta/root" + P + "/")`` and
1477+
``delete_prefix("data/root" + P + "/")``.
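The deletion steps listed above can be sketched against a toy in-memory store. ``MemoryStore`` and the helper functions are hypothetical names for illustration, and the ``.json`` metadata key suffix is an assumption; only the key patterns and the ``delete`` / ``delete_prefix`` operations come from the text.

```python
class MemoryStore(dict):
    """Toy in-memory key/value store (hypothetical, for illustration only)."""

    def delete(self, key):
        self.pop(key, None)

    def delete_prefix(self, prefix):
        # Copy the matching keys first so the dict is not mutated while iterating.
        for key in [k for k in self if k.startswith(prefix)]:
            del self[key]


def delete_array(store, path, suffix=".json"):
    """Delete the array metadata document, then every chunk under the array."""
    store.delete("meta/root" + path + ".array" + suffix)
    store.delete_prefix("data/root" + path + "/")


def delete_implicit_group(store, path):
    """Delete all nodes under an implicit group."""
    store.delete_prefix("meta/root" + path + "/")
    store.delete_prefix("data/root" + path + "/")


store = MemoryStore({
    "meta/root/g/a.array.json": b"{}",
    "data/root/g/a/c0/0": b"\x00",
    "data/root/g/ab/c0/0": b"\x00",
})
delete_array(store, "/g/a")
print(sorted(store))  # ['data/root/g/ab/c0/0']
```

The surviving key shows why the trailing ``/`` in the prefix matters: without it, deleting ``/g/a`` would also remove the chunks of the sibling array ``/g/ab``.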
14611478

14621479
Note that store implementation may decide to reify implicit groups
14631480
and thus protocol implementation should attempt to delete the
@@ -1484,16 +1501,15 @@ Let "+" be the string concatenation operator.
14841501
Protocol extensions
14851502
===================
14861503

1487-
Many types of extensions can exists for a Zarr Protocol, they can be regrouped
1488-
in mostly 2 categories:
1504+
Many types of extensions can exist for a Zarr protocol, they can be regrouped
1505+
in mostly two categories:
14891506

1490-
- Core Datatypes Extensions – for example adding ability store fixed size
1491-
types like complex and datetime in chunks. These are directly declared in the
1492-
array metadata ``data_type`` keys.
1493-
- Arrays Extensions – Non rectilinear grids, and
1494-
variable length types.
1507+
- Core data type extensions – for example adding the ability to store fixed size
1508+
types such as complex or datetime in chunks. These are directly declared in the
1509+
array metadata ``data_type`` key.
1510+
- Arrays extensions – non rectilinear grids, and variable length types.
14951511

1496-
There are no group extensions as as Zarr v3.0
1512+
There are no group extensions in Zarr v3.0.
14971513

14981514
See https://github.com/zarr-developers/zarr-specs/issues/49 for a list of potential extensions
14991515
