But Zarr data types and Numpy data types are also very different in one key respect:
Zarr arrays are designed to be persisted to storage and later read, possibly by Zarr implementations in different programming languages.
So in addition to defining a memory layout for array elements, each Zarr data type defines a procedure for
reading and writing that data type to Zarr array metadata, and also reading and writing **instances** of that data type to
array metadata.

Data types in Zarr version 2
@@ -35,11 +35,11 @@ Thus the JSON identifer for a Numpy-compatible data type is just the Numpy ``str
>>> dtype_meta
<i8

.. note::
   The ``<`` character in the data type metadata encodes the `endianness <https://numpy.org/doc/2.2/reference/generated/numpy.dtype.byteorder.html>`_, or "byte order", of the data type. Following Numpy's example,
   Zarr version 2 data types associate each data type with an endianness where applicable. Zarr version 3 data types do not store endianness information.

In addition to defining a representation of the data type itself (which in the example above was just a simple string ``"<i8"``), Zarr also
defines a metadata representation of scalars associated with that data type. Integers are stored as ``JSON`` numbers,
as are floats, with the caveat that ``NaN``, positive infinity, and negative infinity are stored as special strings.
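The scalar encoding described above can be sketched with the standard library alone. The helper names ``encode_float`` and ``decode_float`` are hypothetical and are not part of Zarr-Python; the strings ``"NaN"``, ``"Infinity"``, and ``"-Infinity"`` are assumed here to be the special strings in question, following the fill value encoding in the Zarr V2 specification.

```python
from __future__ import annotations

import json
import math

# Assumed special strings for non-finite floats (Zarr V2 fill value encoding).
_SPECIAL = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}


def encode_float(value: float) -> float | str:
    """Hypothetical helper: turn a float into a JSON-compatible value."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return value


def decode_float(data: float | int | str) -> float:
    """Hypothetical helper: invert ``encode_float``."""
    if isinstance(data, str):
        if data not in _SPECIAL:
            raise ValueError(f"not a recognized float string: {data!r}")
        return _SPECIAL[data]
    return float(data)


# Finite floats stay JSON numbers; non-finite floats become strings.
document = json.dumps({"fill_value": encode_float(-math.inf)})
restored = decode_float(json.loads(document)["fill_value"])
```

Plain ``json.dumps`` would otherwise emit bare ``NaN`` or ``Infinity`` tokens, which are not valid JSON, so an explicit encoding step like this one is needed before writing metadata.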
@@ -52,105 +52,105 @@ Data types in Zarr version 3
Data types in Zarr-Python
-------------------------

Zarr-Python supports two different Zarr formats, and those two formats specify data types in rather different ways:
data types in Zarr version 2 are encoded as Numpy-compatible strings, while data types in Zarr version 3 are encoded as either strings or ``JSON`` objects,
and the Zarr V3 data types don't have any associated endianness information, unlike Zarr V2 data types.

If that wasn't enough, we want Zarr-Python to support data types beyond what's available in Numpy. So it's crucial that we have a
model of array data types that can adapt to the differences between Zarr V2 and V3 and doesn't over-fit to Numpy.

Here are the operations we need to perform on data types in Zarr-Python:

* Round-trip native data types to fields in array metadata documents.
  For example, the Numpy data type ``np.dtype('>i2')`` should be saved as ``{..., "dtype" : ">i2"}`` in Zarr V2 metadata.

  In Zarr V3 metadata, the same Numpy data type would be saved as ``{..., "data_type": "int16", "codecs": [..., {"name": "bytes", "configuration": {"endian": "big"}}, ...]}``

* Define a default fill value. This is not mandated by the Zarr specifications, but it's convenient for users
  to have a useful default. For numeric types like integers and floats the default can be statically set to 0, but for
  parametric data types like fixed-length strings the default can only be generated after the data type has been parametrized at runtime.

* Round-trip scalars to the ``fill_value`` field in Zarr V2 and V3 array metadata documents. The Zarr V2 and V3 specifications
  define how scalars of each data type should be stored as JSON in array metadata documents, and in principle each data type
  can define this encoding separately.

* Do all of the above for *user-defined data types*. Zarr-Python should support data types added as extensions, so we cannot
  hard-code the list of data types. We need to ensure that users can easily (or easily enough) define a Python object
  that models their custom data type and register this object with Zarr-Python, so that the above operations all succeed for their
  custom data type.
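To make the first operation in the list above concrete, here is a minimal sketch of how one Numpy-style data type string could map onto the two metadata formats. The helpers ``v2_metadata`` and ``v3_metadata`` and the lookup tables are hypothetical, cover only a handful of numeric types, and are not Zarr-Python's implementation.

```python
# Hypothetical lookup tables covering a few numeric types only.
V3_NAMES = {
    ("i", 1): "int8", ("i", 2): "int16", ("i", 4): "int32", ("i", 8): "int64",
    ("u", 1): "uint8", ("u", 2): "uint16", ("u", 4): "uint32", ("u", 8): "uint64",
    ("f", 4): "float32", ("f", 8): "float64",
}
ENDIAN = {"<": "little", ">": "big"}


def v2_metadata(dtype_str: str) -> dict:
    """Zarr V2 stores the Numpy-style string, byte order included."""
    return {"dtype": dtype_str}


def v3_metadata(dtype_str: str) -> dict:
    """Zarr V3 stores a name without byte order; byte order moves to the bytes codec."""
    byteorder, kind, size = dtype_str[0], dtype_str[1], int(dtype_str[2:])
    bytes_codec: dict = {"name": "bytes"}
    if byteorder in ENDIAN:  # "|" means byte order does not apply
        bytes_codec["configuration"] = {"endian": ENDIAN[byteorder]}
    return {"data_type": V3_NAMES[(kind, size)], "codecs": [bytes_codec]}


v2_fragment = v2_metadata(">i2")
v3_fragment = v3_metadata(">i2")
```

Under these assumptions, ``v2_fragment`` carries ``">i2"`` directly, while ``v3_fragment`` splits the same information into ``"int16"`` plus an ``"endian": "big"`` codec setting.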

To achieve these goals, Zarr-Python uses a class called :class:`zarr.core.dtype.DTypeWrapper` to wrap native data types. Each data type
supported by Zarr-Python is modeled by a subclass of ``DTypeWrapper``, which has the following structure:

(attribute) ``dtype_cls``
^^^^^^^^^^^^^^^^^^^^^^^^^
The ``dtype_cls`` attribute is a **class variable** that is bound to a class that can produce
an instance of a native data type. For example, on the ``DTypeWrapper`` used to model the boolean
data type, the ``dtype_cls`` attribute is bound to the Numpy bool data type class: ``np.dtypes.BoolDType``.
This attribute is used when we need to create an instance of the native data type, for example when
defining a Numpy array that will contain Zarr data.

It might seem odd that ``DTypeWrapper.dtype_cls`` binds to a *class* that produces a native data type instead of an instance of that native data type --
why not have a ``DTypeWrapper.dtype`` attribute that binds to ``np.dtypes.BoolDType()``? The reason ``DTypeWrapper``
doesn't wrap a concrete data type instance is that data type instances may have endianness information, but Zarr V3
data types do not. To model Zarr V3 data types, we need endianness to be an **instance variable** which is
defined when creating an instance of the ``DTypeWrapper``. Subclasses of ``DTypeWrapper`` that model data types with
byte order semantics thus have ``endianness`` as an instance variable, and this value can be set when creating an instance of the wrapper.


(attribute) ``_zarr_v3_name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``_zarr_v3_name`` attribute encodes the canonical name for a data type for Zarr V3. For many data types these names
are defined in the `Zarr V3 specification <https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#data-types>`_. For nearly all of the
data types defined in Zarr V3, this name can be used to uniquely specify a data type. The one exception is the ``r*`` data type,
which is parametrized by a number of bits, and so may take the form ``r8``, ``r16``, and so on.
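Because ``r*`` names carry a parameter, matching them takes a pattern rather than a single string comparison. The function below is a hypothetical sketch, not part of Zarr-Python; it assumes the rule from the Zarr V3 specification that the number of bits is a positive multiple of 8.

```python
import re


def parse_raw_bits(name: str) -> int:
    """Hypothetical helper: return the bit width encoded in an ``r*`` name."""
    match = re.fullmatch(r"r(\d+)", name)
    if match is None:
        raise ValueError(f"{name!r} is not an r* data type name")
    bits = int(match.group(1))
    if bits == 0 or bits % 8 != 0:
        raise ValueError(f"bit width must be a positive multiple of 8, got {bits}")
    return bits


widths = [parse_raw_bits(name) for name in ("r8", "r16", "r64")]
```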

(class method) ``from_dtype(cls, dtype) -> Self``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This method defines a procedure for safely converting a native dtype instance into an instance of ``DTypeWrapper``. It should perform
validation of its input to ensure that the native dtype is an instance of the ``dtype_cls`` class attribute, for example. For some
data types, additional checks are needed -- in Numpy, "structured" data types and "void" data types use the same class, with different properties.
A ``DTypeWrapper`` that wraps Numpy structured data types must do additional checks to ensure that the input ``dtype`` is actually a structured data type.
If input validation succeeds, this method will call ``_from_dtype_unsafe``.

The ``_from_dtype_unsafe`` method defines the procedure for converting a native data type instance, like ``np.dtype('uint8')``,
into a wrapper class instance. The ``unsafe`` suffix on the method name denotes that this method should not
perform any input validation. Input validation should be done by the routine that calls this method.

For many data types, creating the wrapper class takes no arguments and so this method can just return ``cls()``.
But for data types with runtime attributes like endianness or length (for fixed-size strings), ``_from_dtype_unsafe``
ensures that those attributes of ``dtype`` are mapped onto the correct parameters in the ``DTypeWrapper`` class constructor.

(method) ``to_dtype(self) -> dtype``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This method produces a native data type consistent with the properties of the ``DTypeWrapper``. Together
with ``from_dtype``, this method allows round-trip conversion of a native data type into a wrapper class and then out again.

That is, for some ``DTypeWrapper`` class ``FooWrapper`` that wraps a native data type called ``foo``, ``FooWrapper.from_dtype(instance_of_foo).to_dtype() == instance_of_foo`` should be true.

(method) ``to_dict(self) -> dict``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This method generates a JSON-serializable representation of the wrapped data type which can be stored in