@@ -1089,6 +1089,36 @@ each of these two representations are defined to be either:
10891089- a multi-dimensional array of some shape and data type, or
10901090- a byte string.
10911091
1092+ Based on the input and output representations for the encode transform,
1093+ codecs can be classified as one of three kinds:
1094+
1095+ - ``array -> array``
1096+ - ``array -> bytes``
1097+ - ``bytes -> bytes``
1098+
1099+ .. note::
1100+
1101+    ``bytes -> array`` codecs are not currently allowed, due to the lack of a
1102+    clear use case. Such a codec would take an array that has already been
1103+    encoded as a byte string and transform it back into an array, which would
1104+    later have to be transformed back into a byte string.
1105+
1106+ If multiple codecs are specified for an array, each codec is applied
1107+ sequentially; when encoding, the encoded output of codec ``i`` serves as the
1108+ decoded input of codec ``i+1``, and similarly when decoding, the decoded output
1109+ of codec ``i+1`` serves as the encoded input to codec ``i``. Since
1110+ ``bytes -> array`` codecs are not supported, it follows that the list of codecs
1111+ must be of the following form:
1112+
1113+ - zero or more ``array -> array`` codecs; followed by
1114+ - at most one ``array -> bytes`` codec; followed by
1115+ - zero or more ``bytes -> bytes`` codecs.
1116+
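The ordering rule above can be checked mechanically. The following is a minimal sketch in Python, not part of the specification; the names ``CodecKind`` and ``is_valid_codec_chain`` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class CodecKind(Enum):
    """The three codec kinds, named after their encode transform."""
    ARRAY_TO_ARRAY = "array -> array"
    ARRAY_TO_BYTES = "array -> bytes"
    BYTES_TO_BYTES = "bytes -> bytes"


# Position of each kind within a well-formed codec list (hypothetical helper).
_STAGE = {
    CodecKind.ARRAY_TO_ARRAY: 0,
    CodecKind.ARRAY_TO_BYTES: 1,
    CodecKind.BYTES_TO_BYTES: 2,
}


def is_valid_codec_chain(kinds):
    """Return True if `kinds` is zero or more array -> array codecs, then at
    most one array -> bytes codec, then zero or more bytes -> bytes codecs."""
    kinds = list(kinds)
    stages = [_STAGE[kind] for kind in kinds]
    # Stages must never go backwards along the list.
    in_order = all(a <= b for a, b in zip(stages, stages[1:]))
    return in_order and kinds.count(CodecKind.ARRAY_TO_BYTES) <= 1
```

An empty list is accepted by this sketch, as is a list that omits the ``array -> bytes`` codec entirely.
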
1117+ If no ``array -> bytes`` codec is specified, then the default byte
1118+ representation for the data type of the array is used. For all data types
1119+ currently defined by the core spec, that is equivalent to the ``endian`` codec
1120+ with an endianness of ``little``.
1121+
10921122Logically, a codec ``c`` must define three properties:
10931123
10941124- ``c.compute_encoded_representation_type(decoded_representation_type)``, a
@@ -1106,11 +1136,31 @@ Logically, a codec ``c`` must define three properties:
11061136- ``c.decode(encoded_value, decoded_representation_type)``, a procedure that
11071137 computes the decoded representation, and is used when reading an array.
11081138
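As an illustration of how ``encode`` and ``decode`` compose when several codecs are listed, here is a minimal Python sketch. It is a simplification under stated assumptions: the two codec classes are hypothetical stand-ins, the ``decoded_representation_type`` argument is omitted, and an array is modelled as a flat list of 32-bit integers.

```python
import struct
import zlib


class LittleEndianInt32:
    """Hypothetical array -> bytes codec for a flat list of int32 values."""

    def encode(self, decoded_value):
        return struct.pack(f"<{len(decoded_value)}i", *decoded_value)

    def decode(self, encoded_value):
        count = len(encoded_value) // 4
        return list(struct.unpack(f"<{count}i", encoded_value))


class Zlib:
    """Hypothetical bytes -> bytes codec backed by the standard zlib module."""

    def encode(self, decoded_value):
        return zlib.compress(decoded_value)

    def decode(self, encoded_value):
        return zlib.decompress(encoded_value)


def encode_chain(codecs, value):
    # The encoded output of codec i is the decoded input of codec i+1.
    for codec in codecs:
        value = codec.encode(value)
    return value


def decode_chain(codecs, value):
    # Decoding walks the same list in reverse order.
    for codec in reversed(codecs):
        value = codec.decode(value)
    return value
```

With ``codecs = [LittleEndianInt32(), Zlib()]``, calling ``decode_chain(codecs, encode_chain(codecs, data))`` should return the original list.
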
1109- If more than one codec is specified for an array, each codec is applied
1110- sequentially; when encoding, the encoded output of codec ``i`` serves as the
1111- decoded input of codec ``i+1``, and similarly when decoding, the decoded
1112- output of codec ``i+1`` serves as the encoded input to codec ``i``.
1139+ Implementations MAY support partial decoding for certain codecs
1140+ (e.g. sharding, blosc). Logically, partial decoding may be defined in terms
1141+ of an additional operation:
1142+
1143+ - ``c.partial_decode(input_handle, decoded_representation_type,
1144+   decoded_regions)``, where:
1145+
1146+   - ``input_handle`` provides an interface for requesting partial reads of
1147+     the encoded representation and itself supports the same
1148+     ``partial_decode`` interface;
1149+   - ``decoded_representation_type`` is the same as for ``c.decode``;
1150+   - ``decoded_regions`` specifies the regions of the decoded representation
1151+     that must be returned.
1152+
1153+ If the decoded representation is a multi-dimensional array, then
1154+ ``decoded_regions`` specifies a subset of the array's domain. If the
1155+ decoded representation is a byte string, then ``decoded_regions``
1156+ specifies a list of byte ranges.
1157+
1158+ .. note::
11131159
1160+    If ``partial_decode`` is not supported by a particular codec, it can
1161+    always be implemented in terms of ``decode`` by simply decoding in full
1162+    and then satisfying any ``decoded_regions`` requests directly from the
1163+    cached decoded representation.
11141164
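The fallback described in the note can be sketched as follows for a codec whose decoded representation is a byte string. This is an illustrative Python sketch only: the function name is hypothetical, byte ranges are assumed to be ``(offset, length)`` pairs, and the fully read encoded value is passed in directly instead of an ``input_handle``.

```python
import zlib


def partial_decode_via_decode(decode, encoded_value, byte_ranges):
    """Emulate partial decoding with a full decode.

    `decode` is any callable mapping the encoded bytes to the decoded bytes.
    `byte_ranges` is a list of (offset, length) pairs, an assumed encoding of
    `decoded_regions` for a byte-string representation.
    """
    # Decode once in full, then serve every requested range from that result.
    decoded = decode(encoded_value)
    return [decoded[offset:offset + length] for offset, length in byte_ranges]


# Usage with zlib standing in for a bytes -> bytes codec.
encoded = zlib.compress(b"hello world")
parts = partial_decode_via_decode(zlib.decompress, encoded, [(0, 5), (6, 5)])
```

The sketch performs no partial reads at all, so it saves no I/O; it only shows that the ``partial_decode`` contract can be met by any codec that supports ``decode``.
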
11151165Determination of encoded representations
11161166----------------------------------------