Commit b38f4a2

Merge 'ryans-review' into merge
2 parents 803e6d7 + 72e7f0a commit b38f4a2

1 file changed

+52
-45
lines changed

docs/protocol/core/v3.0.rst

Lines changed: 52 additions & 45 deletions
@@ -144,11 +144,11 @@ Questions that still need to be resolved
 We solicit feedback on the following area during the RFC period of this first
 draft.

-- Should core metadata and user attributes be stored together or separate documents? ([GH72](https://github.com/zarr-developers/zarr-specs/issues/72))
-large metadata documents.
+- Should core metadata and user attributes be stored together or separate documents?
+(See https://github.com/zarr-developers/zarr-specs/issues/72)
 - extensions and ``must_understand = True`` might be too restrictive. Work a
-draft implementation with extensions and
-see how far we can go. Possible list of extensions to implement:
+We propose to develop a draft implementation with extensions and
+see how far we can go. A possible list of extensions to include:

 - Boolean
 - Complex
@@ -159,17 +159,18 @@ draft.
 See https://github.com/zarr-developers/zarr-specs/issues/89 for discussion on
 the topic.

-- Node name case sensitivity: The node name is now case sensitive, this may
+- Node name case sensitivity: The node name is now case sensitive. This may
 make store implementation more complicated as some backends might not be
 (like some specific filesystem / object store), and we may want to
 recommend a standard escaping mechanism in those cases.
 https://github.com/zarr-developers/zarr-specs/issues/57

-- Node name character set: Same as above but unlike the previous point where we
+- Node name character set: We
 solicit feedback on whether store implementation should support full unicode.
 https://github.com/zarr-developers/zarr-specs/issues/56

-- Should named dimensions be part of the core metadata spec? https://github.com/zarr-developers/zarr-specs/issues/73
+- Should named dimensions be part of the core metadata spec?
+https://github.com/zarr-developers/zarr-specs/issues/73


 Document conventions
@@ -394,7 +395,7 @@ node names:

 * must not be the empty string ("")

-* must consist only of characters in the sets ``a-z``, ``A-Z``, ``0-9``,
+* must use only characters in the sets ``a-z``, ``A-Z``, ``0-9``,
 ``-_.``

 * must not be a string composed only of period characters, e.g. "." or
@@ -563,7 +564,7 @@ other type sizes in later versions of this specification.
 ways to encode variable length and we want to keep flexibility. While we seem
 to agree that for random access the most likely contender is to have two
 arrays, one with the actual variable length data and one with fixed size
-(pointer + length) to the variable size data we do not want to commit to such
+(pointer + length) to the variable size data, we do not want to commit to such
 a structure.


@@ -594,21 +595,21 @@ A regular grid is a type of grid where an array is divided into chunks
 such that each chunk is a hyperrectangle of the same shape. The
 dimensionality of the grid is the same as the dimensionality of the
 array. Each chunk in the grid can be addressed by a tuple of positive
-integers (`i`, `j`, `k`, ...) corresponding to the indices of the
+integers (`k`, `j`, `i`, ...) corresponding to the indices of the
 chunk along each dimension.

-The origin vertex of a chunk has coordinates in the array space (`i` *
-`dx`, `j` * `dy`, `k` * `dz`, ...) where (`dx`, `dy`, `dz`, ...) are
-the grid spacings along each dimension, also known as the chunk
-shape. Thus the origin vertex of the chunk at grid index (0, 0, 0,
+The origin vertex of a chunk has coordinates in the array space (`k` *
+`dz`, `j` * `dy`, `i` * `dx`, ...) where (`dz`, `dy`, `dx`, ...) are
+the chunk sizes along each dimension.
+Thus the origin vertex of the chunk at grid index (0, 0, 0,
 ...) is at coordinate (0, 0, 0, ...) in the array space, i.e., the
 grid is aligned with the origin of the array. If the length of any
 array dimension is not perfectly divisible by the chunk length along
 the same dimension, then the grid will overhang the edge of the array
 space.

-The shape of the chunk grid will be (ceil(`x` / `dx`), ceil(`y` /
-`dy`), ceil(`z` / `dz`), ...) where (`x`, `y`, `z`, ...) is the array
+The shape of the chunk grid will be (ceil(`z` / `dz`), ceil(`y` /
+`dy`), ceil(`x` / `dx`), ...) where (`z`, `y`, `x`, ...) is the array
 shape, "/" is the division operator and "ceil" is the ceiling
 function. For example, if a 3 dimensional array has shape (10, 200,
 3000), and has chunk shape (5, 20, 400), then the shape of the chunk
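Reviewer sketch (not part of the commit): the grid-shape rule in the hunk above can be checked with a few lines of Python. The function names `chunk_grid_shape` and `overhangs` are hypothetical and chosen only for illustration. Integer ceiling division is used instead of floating-point `ceil`, which is an implementation choice and not something the spec text prescribes.

```python
def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension: ceil(n / d) per dimension."""
    if len(array_shape) != len(chunk_shape):
        raise ValueError("array and chunk shapes must have the same dimensionality")
    # -(-n // d) is ceiling division on non-negative integers, avoiding floats.
    return tuple(-(-n // d) for n, d in zip(array_shape, chunk_shape))


def overhangs(array_shape, chunk_shape):
    """True for each dimension where the grid extends past the array edge."""
    return tuple(n % d != 0 for n, d in zip(array_shape, chunk_shape))


# Example taken from the spec text: array shape (10, 200, 3000), chunk shape (5, 20, 400).
print(chunk_grid_shape((10, 200, 3000), (5, 20, 400)))  # (2, 10, 8)
print(overhangs((10, 200, 3000), (5, 20, 400)))         # (False, False, True)
```

The result matches the grid shape (2, 10, 8) and the third-dimension overhang listed in the spec's own example table.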
@@ -628,18 +629,18 @@ dimension.
 - (2, 10, 8)
 - The grid does overhang the edge of the array on the 3rd dimension.

-An element of an array with coordinates (`a`, `b`, `c`, ...) will
-occur within the chunk at grid index (`a` // `dx`, `b` // `dy`, `c` //
-`dz`, ...), where "//" is the floor division operator. The element
-will have coordinates (`a` % `dx`, `b` % `dy`, `c` % `dz`, ...) within
+An element of an array with coordinates (`c`, `b`, `a`, ...) will
+occur within the chunk at grid index (`c` // `dz`, `b` // `dy`, `a` //
+`dx`, ...), where "//" is the floor division operator. The element
+will have coordinates (`c` % `dz`, `b` % `dy`, `a` % `dx`, ...) within
 that chunk, where "%" is the modulo operator. For example, if a
 3 dimensional array has shape (10, 200, 3000), and has chunk shape
 (5, 20, 400), then the element of the array with coordinates (7, 150, 900)
 is contained within the chunk at grid index (1, 7, 2) and has coordinates
 (2, 10, 100) within that chunk.


-The identifier for chunk with grid index (``i``, ``j``, ``k``, ...) is
+The identifier for chunk with grid index (``k``, ``j``, ``i``, ...) is
 formed by joining together ASCII string representations of each index
 using a separator and prefixed with the character ``c``. The default value for
 the separator is the slash character, ``/``, but this may be configured by
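Reviewer sketch (not part of the commit): the element-to-chunk mapping and the chunk identifier described in the hunk above, written out in Python. `locate_element` and `chunk_identifier` are hypothetical names. The `separator` parameter stands in for the configurable separator the text mentions; how it is actually configured is outside this hunk and is not modelled here.

```python
def locate_element(coords, chunk_shape):
    """Return (grid index of the containing chunk, coordinates within that chunk)."""
    grid_index = tuple(c // d for c, d in zip(coords, chunk_shape))  # floor division
    within_chunk = tuple(c % d for c, d in zip(coords, chunk_shape))  # modulo
    return grid_index, within_chunk


def chunk_identifier(grid_index, separator="/"):
    """Join the ASCII decimal form of each index and prefix the result with 'c'."""
    return "c" + separator.join(str(i) for i in grid_index)


# Example from the spec text: chunk shape (5, 20, 400), element at (7, 150, 900).
grid_index, within_chunk = locate_element((7, 150, 900), (5, 20, 400))
print(grid_index)                    # (1, 7, 2)
print(within_chunk)                  # (2, 10, 100)
print(chunk_identifier(grid_index))  # c1/7/2
```

Because the index tuple is processed position by position, the renaming from (`i`, `j`, `k`) to (`k`, `j`, `i`) in this commit changes only the labels, not the arithmetic.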
@@ -693,10 +694,10 @@ organised into a sequence such that the last dimension of the array is
 the fastest changing dimension, also known as "row-major" order. This
 layout is only applicable to arrays with fixed size data types.

-For example, for a two-dimensional array with chunk shape (`dx`, `dy`),
+For example, for a two-dimensional array with chunk shape (`dy`, `dx`),
 the binary values for a given chunk are taken from chunk elements in
-the order (0, 0), (0, 1), (0, 2), ..., (`dx` - 1, `dy` - 3), (`dx` - 1, `dy` -
-2), (`dx` - 1, `dy` - 1).
+the order (0, 0), (0, 1), (0, 2), ..., (`dy` - 1, `dx` - 3), (`dy` - 1, `dx` -
+2), (`dy` - 1, `dx` - 1).

 F contiguous memory layout
 --------------------------
@@ -707,10 +708,10 @@ is the fastest changing dimension, also known as "column-major"
 order. This layout is only applicable to arrays with fixed size data
 types.

-For example, for a two-dimensional array with chunk shape (`dx`,
-`dy`), the binary values for a given chunk are taken from chunk
-elements in the order (0, 0), (1, 0), (2, 0), ..., (`dx` - 3, `dy` -
-1), (`dx` - 2, `dy` - 1), (`dx` - 1, `dy` - 1).
+For example, for a two-dimensional array with chunk shape (`dy`,
+`dx`), the binary values for a given chunk are taken from chunk
+elements in the order (0, 0), (1, 0), (2, 0), ..., (`dy` - 3, `dx` -
+1), (`dy` - 2, `dx` - 1), (`dy` - 1, `dx` - 1).


 Chunk encoding
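Reviewer sketch (not part of the commit): the element orderings for the C and F contiguous layouts in the two hunks above can be enumerated with the standard library alone. `c_order` and `f_order` are hypothetical helper names, and the small chunk shape (2, 3) is an illustrative value not taken from the spec.

```python
from itertools import product


def c_order(chunk_shape):
    """Row-major order: the last dimension changes fastest."""
    return list(product(*(range(n) for n in chunk_shape)))


def f_order(chunk_shape):
    """Column-major order: the first dimension changes fastest."""
    # Iterate with the dimensions reversed, then flip each index tuple back.
    reversed_ranges = (range(n) for n in reversed(chunk_shape))
    return [idx[::-1] for idx in product(*reversed_ranges)]


print(c_order((2, 3)))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(f_order((2, 3)))  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Both sequences start at (0, 0) and end at the last element of the chunk, as in the spec examples; they differ only in which index is incremented first.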
@@ -988,7 +989,7 @@ following mandatory names:
 then said extension is responsible for interpreting the value of
 ``fill_value`` and return a suitable type that can be used.

-For core ``data_type`` which ``fill_value`` are not permitted in JSON or
+For core data types for which fill values are not permitted in JSON or
 for which decimal representation could be lossy, a string representing of
 the binary (starting with ``0b``) or hexadecimal value (starting with
 ``0x``) is accepted. This string must include all leading or trailing
@@ -1004,7 +1005,13 @@ following mandatory names:
 ``attributes``

 The value must be an object. The object may contain any name/value
-pairs.
+pairs. Intended to allow storage of arbitrary user metadata
+
+
+.. note:: The question of whether core metadata and user attributes should be
+stored together or in separate documents is a topic of ongoing discussion.
+(See https://github.com/zarr-developers/zarr-specs/issues/72.)
+

 The following names are optional:

@@ -1084,11 +1091,11 @@ chunking as above, but using an extension data type::

 .. note::
 comparison with spec v2,
-``dtype`` have been renamed to ``data_type``,
-``chunks`` have been renamed to ``chunk_grid``,
-``order`` have been renamed to ``chunk_memory_layout``,
-``filters`` have been removed,
-``zarr_format`` have been removed,
+``dtype`` has been renamed to ``data_type``,
+``chunks`` has been renamed to ``chunk_grid``,
+``order`` has been renamed to ``chunk_memory_layout``,
+``filters`` has been removed,
+``zarr_format`` has been removed,


 Group metadata
@@ -1110,7 +1117,7 @@ For example, the JSON document below defines an explicit group::

 .. note::

-Groups cannot have extensions attached to them as of spec v3.0 Allowing
+Groups cannot have extensions attached to them as of spec v3.0. Allowing
 groups to have extensions would force any implementation to sequentially
 traverse the store hierarchy in order to check for extensions, which would
 defeat the purpose of a flat namespace and concurrent access.
@@ -1119,7 +1126,7 @@ For example, the JSON document below defines an explicit group::

 .. note::

-A group does not need a metadata document to exists, see implicit groups.
+A group does not need a metadata document to exist. (See implicit groups.)



@@ -1376,8 +1383,8 @@ concatenating "data/root/" and the chunk identifier.
 - Chunk grid indices
 - Data key
 * - `/foo/baz`
-- `(0, 0)`
-- `data/root/foo/baz/c0/0`
+- `(1, 0)`
+- `data/root/foo/baz/c1/0`



@@ -1389,8 +1396,8 @@ Let `P` be an arbitrary hierarchy path.
 Let ``array_meta_key(P)`` be the array metadata key for `P`. Let
 ``group_meta_key(P)`` be the group metadata key for `P`.

-Let ``data_key(P, i, j, ...)`` be the data key for `P` for the chunk
-with grid coordinates (`i`, `j`, ...).
+Let ``data_key(P, j, i ...)`` be the data key for `P` for the chunk
+with grid coordinates (`j`, `i`, ...).

 Let "+" be the string concatenation operator.

@@ -1424,16 +1431,16 @@ Let "+" be the string concatenation operator.

 **Store element values in an array**

-To store element in an array at path `P` and coordinate (`i`, `j`,
-...), perform ``set(data_key(P, i, j, ...), value)``, where
+To store element in an array at path `P` and coordinate (`j`, `i`,
+...), perform ``set(data_key(P, j, i, ...), value)``, where
 `value` is the serialisation of the corresponding chunk, encoded
 according to the information in the array metadata stored under
 the key ``array_meta_key(P)``.

 **Retrieve element values in an array**

 To retrieve element in an array at path `P` and coordinate (`i`,
-`j`, ...), perform ``get(data_key(P, i, j, ...), value)``. The returned
+`j`, ...), perform ``get(data_key(P, j, i, ...), value)``. The returned
 value is the serialisation of the corresponding chunk, encoded
 according to the array metadata stored at ``array_meta_key(P)``.

@@ -1507,7 +1514,7 @@ in mostly two categories:
 - Core data type extensions – for example adding the ability to store fixed size
 types such as complex or datetime in chunks. These are directly declared in the
 array metadata ``data_type`` key.
-- Arrays extensions – non rectilinear grids, and variable length types.
+- Array extensions – non rectilinear grids, and variable length types.

 There are no group extensions in Zarr v3.0.
