@@ -269,12 +269,10 @@ The following figure illustrates the first part of the terminology:
269269*Data type *
270270
271271 A data type defines the set of possible values that an array _ may
272- contain, and a default binary representation (i.e., sequence of bytes) for
273- each possible value. For example, the 32-bit signed
274- integer data type defines binary representations for all integers
275- in the range −2,147,483,648 to 2,147,483,647. This specification
276- only defines a limited set of data types, but extensions
277- may define other data types.
272+ contain. For example, the 32-bit signed integer data type defines binary
273+ representations for all integers in the range −2,147,483,648 to
274+ 2,147,483,647. This specification only defines a limited set of data types,
275+ but extensions may define other data types.
278276
279277.. _chunk :
280278.. _chunks :
@@ -655,6 +653,17 @@ mandatory names:
655653 the data type will be chosen. However, the default fill value that is
656654 chosen MUST be recorded in the metadata.
657655
656+ ``codecs``
657+ ^^^^^^^^^^
658+
659+ Specifies a list of codecs to be used for encoding and decoding chunks. The
660+ value must be an array of objects, each containing a ``name`` member whose
661+ value is a string referring to a v3 codec specification. A codec object may
662+ also contain a ``configuration`` object, which consists of the parameter
663+ names and values defined by the corresponding codec specification. Since an
664+ ``array -> bytes`` codec must be specified, the list cannot be
665+ empty.
666+
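The structural rules for ``codecs`` stated above can be sketched as a small check. This is a minimal illustration, not part of the specification: the function name ``validate_codecs_field`` is hypothetical, and the codec names and configuration keys in the example are assumptions whose real definitions belong to the individual codec specifications.

```python
def validate_codecs_field(codecs):
    """Check the structural rules stated for the ``codecs`` metadata member."""
    if not isinstance(codecs, list):
        raise ValueError("codecs must be an array (list) of objects")
    if not codecs:
        # An array -> bytes codec is mandatory, so an empty list is invalid.
        raise ValueError("codecs must not be empty")
    for index, codec in enumerate(codecs):
        if not isinstance(codec, dict):
            raise ValueError(f"codecs[{index}] must be an object")
        if not isinstance(codec.get("name"), str):
            raise ValueError(f"codecs[{index}] needs a string 'name' member")
        if "configuration" in codec and not isinstance(codec["configuration"], dict):
            raise ValueError(f"codecs[{index}]['configuration'] must be an object")
    return codecs


# Hypothetical metadata fragment; names and parameters are illustrative only.
example = [
    {"name": "endian", "configuration": {"endian": "little"}},
    {"name": "gzip", "configuration": {"level": 1}},
]
validate_codecs_field(example)
```

The check only covers the shape of the member. Whether a given ``name`` refers to a known codec, and whether its ``configuration`` is valid, is left to the codec in question.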
658667The following members are optional:
659668
660669``attributes ``
@@ -673,17 +682,6 @@ The following members are optional:
673682 A proposal to specify metadata conventions (ZEP 4) is being discussed in
674683 https://github.com/zarr-developers/zeps/pull/28.
675684
676- ``codecs ``
677- ^^^^^^^^^^
678-
679- Specifies a list of codecs to be used for encoding and decoding chunks. The
680- value must be an array of objects, each object containing a member with
681- ``name `` whose value is a string referring to a v3 codec specification. The
682- codec object may also contain a ``configuration `` object which consists of
683- the parameter names and values as defined by the corresponding codec
684- specification. An absent ``codecs `` member is equivalent to specifying an
685- empty list of codecs.
686-
687685``storage_transformers ``
688686^^^^^^^^^^^^^^^^^^^^^^^^
689687
@@ -936,52 +934,36 @@ Core data types
936934
937935 * - Identifier
938936 - Numerical type
939- - Default binary representation
940937 * - ``bool ``
941938 - Boolean
942- - Single byte, with false encoded as ``\\x00 `` and true encoded as ``\\x01 ``.
943939 * - ``int8 ``
944940 - Integer in ``[-2^7, 2^7-1] ``
945- - 1 byte two's complement
946941 * - ``int16 ``
947942 - Integer in ``[-2^15, 2^15-1] ``
948- - 2-byte little endian two's complement
949943 * - ``int32 ``
950944 - Integer in ``[-2^31, 2^31-1] ``
951- - 4-byte little endian two's complement
952945 * - ``int64 ``
953946 - Integer in ``[-2^63, 2^63-1] ``
954- - 8-byte little endian two's complement
955947 * - ``uint8 ``
956948 - Integer in ``[0, 2^8-1] ``
957- - 1 byte
958949 * - ``uint16 ``
959950 - Integer in ``[0, 2^16-1] ``
960- - 2-byte little endian
961951 * - ``uint32 ``
962952 - Integer in ``[0, 2^32-1] ``
963- - 4-byte little endian
964953 * - ``uint64 ``
965954 - Integer in ``[0, 2^64-1] ``
966- - 8-byte little endian
967955 * - ``float16 `` (optionally supported)
968956 - IEEE 754 half-precision floating point: sign bit, 5 bits exponent, 10 bits mantissa
969- - 2-byte little endian IEEE 754 binary16
970957 * - ``float32 ``
971958 - IEEE 754 single-precision floating point: sign bit, 8 bits exponent, 23 bits mantissa
972- - 4-byte little endian IEEE 754 binary32
973959 * - ``float64 ``
974960 - IEEE 754 double-precision floating point: sign bit, 11 bits exponent, 52 bits mantissa
975- - 8-byte little endian IEEE 754 binary64
976961 * - ``complex64 ``
977962 - real and imaginary components are each IEEE 754 single-precision floating point
978- - 2 consecutive 4-byte little endian IEEE 754 binary32 values
979963 * - ``complex128 ``
979964 - real and imaginary components are each IEEE 754 double-precision floating point
981- - 2 consecutive 8-byte little endian IEEE 754 binary64 values
982965 * - ``r* `` (Optional)
983966 - raw bits, use for extension type fallbacks
984- - variable, given by ``* ``, is limited to be a multiple of 8.
985967
986968In addition to these base types, an implementation should also handle the
987969raw/opaque pass-through type designated by the lower-case letter ``r`` followed
@@ -991,11 +973,6 @@ should be understood as fall-back types of respectively 1, 2, and 3 byte length.
991973Zarr v3 is limited to type sizes that are a multiple of 8 bits but may support
992974other type sizes in later versions of this specification.
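As an illustration of the ``r*`` naming rule described above, the following sketch maps a raw type identifier to its byte length. The helper name ``raw_type_byte_length`` is hypothetical, and rejecting leading zeros in the bit count is an assumption the text does not address.

```python
import re

# "r" followed by a positive bit count without leading zeros (assumption).
_RAW_TYPE = re.compile(r"r([1-9][0-9]*)")


def raw_type_byte_length(identifier):
    """Return the size in bytes of a raw type identifier such as ``r16``."""
    match = _RAW_TYPE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} is not a raw type identifier")
    bits = int(match.group(1))
    if bits % 8 != 0:
        # Zarr v3 limits type sizes to multiples of 8 bits.
        raise ValueError(f"{identifier!r}: bit count must be a multiple of 8")
    return bits // 8


sizes = [raw_type_byte_length(name) for name in ("r8", "r16", "r24")]
```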
993975
994- .. note ::
995-
996- While the default binary representation is little endian, the :ref: `endian
997- codec<endian-codec-v1>` may be specified to use big endian encoding instead.
998-
999976.. note ::
1000977
1001978 We are explicitly looking for more feedback and prototypes of code using the ``r* ``,
@@ -1111,7 +1088,7 @@ the chain of codecs_ specified by the ``codecs`` metadata field.
11111088Codecs
11121089------
11131090
1114- An array _ may be associated with a list of *codecs *. Each codec specifies a
1091+ An array_ has an associated list of *codecs*. Each codec specifies a
11151092bidirectional transform (an *encode* transform and a *decode* transform).
11161093
11171094Each codec has an *encoded representation* and a *decoded representation*;
@@ -1142,14 +1119,9 @@ array`` codecs are not supported, it follows that the list of codecs must be of
11421119the following form:
11431120
11441121- zero or more ``array -> array`` codecs; followed by
1145- - at most one ``array -> bytes`` codec; followed by
1122+ - exactly one ``array -> bytes`` codec; followed by
11461123- zero or more ``bytes -> bytes`` codecs.
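The required form of the codec list, as revised by this change, can be checked mechanically. The sketch below assumes each codec has already been classified by kind. The function name ``check_codec_order`` and the string constants are illustrative and not defined by the specification.

```python
ARRAY_TO_ARRAY = "array -> array"
ARRAY_TO_BYTES = "array -> bytes"
BYTES_TO_BYTES = "bytes -> bytes"

# Position of each codec kind within a valid chain.
_STAGE = {ARRAY_TO_ARRAY: 0, ARRAY_TO_BYTES: 1, BYTES_TO_BYTES: 2}


def check_codec_order(kinds):
    """Raise ValueError unless ``kinds`` has the required form."""
    for kind in kinds:
        if kind not in _STAGE:
            raise ValueError(f"unknown codec kind: {kind!r}")
    stages = [_STAGE[kind] for kind in kinds]
    if stages != sorted(stages):
        raise ValueError("codecs are not in array->array, array->bytes, bytes->bytes order")
    if stages.count(_STAGE[ARRAY_TO_BYTES]) != 1:
        raise ValueError("exactly one array -> bytes codec is required")
    return True


check_codec_order([ARRAY_TO_ARRAY, ARRAY_TO_BYTES, BYTES_TO_BYTES])
```

Because exactly one ``array -> bytes`` codec is required, an empty list is rejected by the same check.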
11471124
1148- If no ``array -> bytes `` codec is specified, then the default byte
1149- representation for the data type of the array is used. For all data types
1150- currently defined by the core spec, that is equivalent to the ``endian `` codec
1151- with an endianness of ``little ``.
1152-
11531125Logically, a codec ``c `` must define three properties:
11541126
11551127- ``c.compute_encoded_representation_type(decoded_representation_type) ``, a
@@ -1224,27 +1196,6 @@ codec in the chain must first be determined as follows:
12241196 If ``compute_encoded_representation_type `` fails because of an incompatible
12251197 decoded representation, an implementation should indicate an error.
12261198
1227- .. _default-array-byte-string-conversion :
1228-
1229- Conversion between multi-dimensional array and byte string representations
1230- --------------------------------------------------------------------------
1231-
1232- Some codecs operate directly on multi-dimensional arrays of elements,
1233- e.g. encoding a 3-d array as a multi-channel jpeg image. Other codecs operate
1234- at the byte level, e.g. gzip compression. If a codec that operates at the byte
1235- level receives as input an array that is not a 1-dimensional uint8 array, it may
1236- convert the input array to a byte string by concatenating the default binary
1237- representations of each element in lexicographical order (C order). Similarly,
1238- if a codec that expects a multi-dimensional array as input instead receives a
1239- byte string, it may decode each element in lexicographical order according to
1240- the default binary representation of each element.
1241-
1242- .. note ::
1243-
1244- To encode elements in a different order than the default lexicographical
1245- order (C order/row major), the :ref: `transpose codec<transpose-codec-v1> ` may
1246- be specified.
1247-
12481199.. _encoding_procedure :
12491200
12501201Encoding procedure
@@ -1260,11 +1211,9 @@ the following procedure:
126012112. For each codec ``codecs[i] `` in ``codecs ``, ``EC[i+1] :=
12611212 codecs[i].encode(EC[i]) ``.
12621213
1263- 3. The final encoded chunk representation ``EC_final `` is always a byte string.
1264- If ``EC[codecs.length] `` is a byte string, then ``EC_final :=
1265- EC[codecs.length] ``. Otherwise, ``EC_final `` is
1266- :ref: `converted<default-array-byte-string-conversion> ` from
1267- ``EC[codecs.length] ``.
1214+ 3. The final encoded chunk representation is ``EC_final :=
1215+ EC[codecs.length]``. This is always a byte string because the list of
1216+ codecs must include an ``array -> bytes`` codec.
12681217
126912184. ``EC_final `` is written to the store _.
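The encoding steps can be sketched as follows. This is a toy illustration under stated assumptions: ``EC[0]`` is taken to be the in-memory chunk, the chunk is a flat list of 32-bit integers, the store is a plain dictionary, and the classes ``LittleEndianInt32`` and ``Zlib`` are hypothetical stand-ins for an ``array -> bytes`` codec and a ``bytes -> bytes`` codec.

```python
import struct
import zlib


class LittleEndianInt32:
    """Toy ``array -> bytes`` codec: each int becomes 4 little-endian bytes."""

    def encode(self, values):
        return struct.pack(f"<{len(values)}i", *values)


class Zlib:
    """Toy ``bytes -> bytes`` codec."""

    def encode(self, data):
        return zlib.compress(data)


def encode_chunk(chunk, codecs, store, key):
    ec = [chunk]  # EC[0], assumed to be the in-memory chunk
    for codec in codecs:
        # EC[i+1] := codecs[i].encode(EC[i])
        ec.append(codec.encode(ec[-1]))
    ec_final = ec[len(codecs)]
    if not isinstance(ec_final, bytes):
        raise TypeError("the codec chain must end in a byte string")
    store[key] = ec_final  # write EC_final to the store
    return ec_final


store = {}
encode_chunk([1, 2, 3], [LittleEndianInt32(), Zlib()], store, "c/0")
```

The ``isinstance`` check is redundant for a valid codec list and is included only to make the byte-string guarantee visible.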
12701219
@@ -1278,9 +1227,7 @@ the following procedure:
12781227
127912281. The encoded chunk representation ``EC_final `` is read from the store _.
12801229
1281- 2. If ``codecs[codecs.length] `` is a byte string, ``EC[codecs.length] :=
1282- EC_final ``. Otherwise, ``EC[codecs.length] `` is
1283- :ref: `converted<default-array-byte-string-conversion> ` from ``EC_final ``.
1230+ 2. ``EC[codecs.length] := EC_final``.
12841231
128512323. For each codec ``codecs[i]`` in ``codecs``, iterating in reverse order,
12861233 ``EC[i] := codecs[i].decode(EC[i+1], decoded_representation[i])``.
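The decoding steps mirror the encoding procedure and can be sketched in the same toy setting. The assumptions are the same: a dictionary acts as the store, the classes ``LittleEndianInt32`` and ``Zlib`` are hypothetical, and ``EC[0]`` is taken to be the decoded chunk. The dictionaries passed as ``decoded_representation`` are an invented stand-in for whatever an implementation uses to describe a representation type.

```python
import struct
import zlib


class LittleEndianInt32:
    """Toy ``array -> bytes`` codec; decoding needs the element count."""

    def decode(self, data, decoded_representation):
        count = decoded_representation["count"]
        return list(struct.unpack(f"<{count}i", data))


class Zlib:
    """Toy ``bytes -> bytes`` codec; the representation argument is unused."""

    def decode(self, data, decoded_representation):
        return zlib.decompress(data)


def decode_chunk(store, key, codecs, decoded_representation):
    ec = [None] * (len(codecs) + 1)
    # Steps 1 and 2: read EC_final and assign it to EC[codecs.length].
    ec[len(codecs)] = store[key]
    # Step 3: apply the codecs in reverse order.
    for i in reversed(range(len(codecs))):
        ec[i] = codecs[i].decode(ec[i + 1], decoded_representation[i])
    return ec[0]  # assumed to be the decoded chunk


store = {"c/0": zlib.compress(struct.pack("<3i", 1, 2, 3))}
chunk = decode_chunk(
    store,
    "c/0",
    [LittleEndianInt32(), Zlib()],
    [{"count": 3}, {"kind": "bytes"}],
)
```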
@@ -1808,6 +1755,16 @@ All notable and possibly implementation-affecting changes to this specification
18081755are documented in this section, grouped by the specification status and ordered
18091756by time.
18101757
1758+ Changes after Provisional Acceptance
1759+ ------------------------------------
1760+
1761+ - It is now required to specify an ``array -> bytes`` codec in the ``codecs``
1762+ array metadata field. `PR #249
1763+ <https://github.com/zarr-developers/zarr-specs/pull/249>`_
1764+ - The representation of fill values for floating point numbers was changed to
1765+ avoid ambiguity. `PR #236
1766+ <https://github.com/zarr-developers/zarr-specs/pull/236>`_
1767+
18111768Draft Changes
18121769--------------------------
18131770