Commit b7a231e

update docs
1 parent b22f324 commit b7a231e

2 files changed: 59 additions and 34 deletions


docs/user-guide/data_types.rst

Lines changed: 45 additions & 19 deletions
@@ -6,24 +6,24 @@ Zarr's data type model

 Every Zarr array has a "data type", which defines the meaning and physical layout of the
 array's elements. Zarr is heavily influenced by `NumPy <https://numpy.org/doc/stable/>`_, and
-Zarr arrays can use many of the same data types as numpy arrays::
+Zarr-Python supports creating arrays with Numpy data types::
 >>> import zarr
 >>> import numpy as np
 >>> zarr.create_array(store={}, shape=(10,), dtype=np.dtype('uint8'))
 >>> z
 <Array memory://126225407345920 shape=(10,) dtype=uint8>

-But Zarr data types and Numpy data types are also very different in one key respect:
-Zarr arrays are designed to be persisted to storage and later read, possibly by Zarr implementations in different programming languages.
-So in addition to defining a memory layout for array elements, each Zarr data type defines a procedure for
+But Zarr data types and Numpy data types are also very different:
+Unlike Numpy arrays, Zarr arrays are designed to be persisted to storage and read by Zarr implementations in different programming languages.
+To ensure that the data type can be interpreted correctly when reading an array, each Zarr data type defines a procedure for
 reading and writing that data type to Zarr array metadata, and also reading and writing **instances** of that data type to
-array metadata.
+array metadata, and these serialization procedures depend on the Zarr format.

 Data types in Zarr version 2
 -----------------------------

 Version 2 of the Zarr format defined its data types relative to `Numpy's data types <https://numpy.org/doc/2.1/reference/arrays.dtypes.html#data-type-objects-dtype>`_, and added a few non-Numpy data types as well.
-Thus the JSON identifer for a Numpy-compatible data type is just the Numpy ``str`` attribute of that dtype:
+Thus the JSON identifier for a Numpy-compatible data type is just the Numpy ``str`` attribute of that dtype:

 >>> import zarr
 >>> import numpy as np
@@ -113,16 +113,6 @@ data types, additional checks are needed -- in Numpy "structured" data types and
 A ``DTypeWrapper`` that wraps Numpy structured data types must do additional checks to ensure that the input ``dtype`` is actually a structured data type.
 If input validation succeeds, this method will call ``_from_dtype_unsafe``.

-(class method) ``_from_dtype_unsafe(cls, dtype) -> Self``
-^^^^^^^^^^
-This method defines the procedure for converting a native data type instance, like ``np.dtype('uint8')``,
-into a wrapper class instance. The ``unsafe`` prefix on the method name denotes that this method should not
-perform any input validation. Input validation should be done by the routine that calls this method.
-
-For many data types, creating the wrapper class takes no arguments and so this method can just return ``cls()``.
-But for data types with runtime attributes like endianness or length (for fixed-size strings), this ``_from_dtype_unsafe``
-ensures that those attributes of ``dtype`` are mapped on to the correct parameters in the ``DTypeWrapper`` class constructor.
-
 (method) ``to_dtype(self) -> dtype``
 ^^^^^^^
 This method produces a native data type consistent with the properties of the ``DTypeWrapper``. Together
@@ -137,20 +127,56 @@ Zarr metadata.

 (method) ``cast_value(self, value: object) -> scalar``
 ^^^^^
-Cast a python object to an instance of the wrapped data type. This is used for generating the default
+This method converts a Python object to an instance of the wrapped data type. It is used for generating the default
 value associated with this data type.


 (method) ``default_value(self) -> scalar``
 ^^^^
-Return the default value for the wrapped data type. Zarr-Python uses this method to generate a default fill value
+This method returns the default value for the wrapped data type. Zarr-Python uses this method to generate a default fill value
 for an array when a user has not requested one.

 Why is this a method and not a static attribute? Although some data types
 can have a static default value, parametrized data types like fixed-length strings or structured data types cannot. For these data types,
 a default value must be calculated based on the attributes of the wrapped data type.

-(method) ``check_dtype(cls, dtype)``
+(class method) ``check_dtype(cls, dtype) -> bool``
+^^^^^
+This class method checks if a native dtype is compatible with the ``DTypeWrapper`` class. It returns ``True``
+if ``dtype`` is compatible with the wrapper class, and ``False`` otherwise. For many data types, this check is as simple
+as checking that ``cls.dtype_cls`` matches ``type(dtype)``, i.e. checking that the data type class wrapped
+by the ``DTypeWrapper`` is the same as the class of ``dtype``. But there are some data types where this check alone is not sufficient,
+in which case this method is overridden so that additional properties of ``dtype`` can be inspected and compared with
+the expectations of ``cls``.
+
+(class method) ``from_dict(cls, data) -> Self``
+^^^^
+This class method creates a ``DTypeWrapper`` from an appropriately structured dictionary. The default
+implementation first checks that the dictionary has the correct structure, and then uses its data
+to instantiate the ``DTypeWrapper`` instance.
+
+(method) ``to_dict(self) -> dict[str, JSON]``
+^^^
+Returns a dictionary form of the wrapped data type. This is used prior to writing array metadata.

+(method) ``get_name(self, zarr_format: Literal[2, 3]) -> str``
+^^^^
+This method generates a name for the wrapped data type, depending on the Zarr format. If ``zarr_format`` is
+2 and the wrapped data type is a Numpy data type, then the Numpy string representation of that data type is returned.
+If ``zarr_format`` is 3, then the Zarr V3 name for the wrapped data type is returned. For most data types
+the Zarr V3 name will be stored as the ``_zarr_v3_name`` class attribute, but for parametric data types the
+name must be computed at runtime based on the parameters of the data type.
+
+
+(method) ``to_json_value(self, data: scalar, zarr_format: Literal[2, 3]) -> JSON``
+^^^
+This method converts a scalar instance of the data type into a JSON-serializable value.
+For some data types like bool and integers this conversion is simple -- just return a JSON boolean
+or number -- but other data types define a JSON serialization for scalars that is a bit more involved.
+And this JSON serialization depends on the Zarr format.
+
+(method) ``from_json_value(self, data: JSON, zarr_format: Literal[2, 3]) -> scalar``
+^^^
+Convert a JSON-serialized scalar to a native scalar. This inverts the operation of ``to_json_value``.


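Note on the scalar serialization methods documented above: the following is a minimal sketch of how ``get_name``, ``default_value``, ``to_json_value`` and ``from_json_value`` might fit together for a 64-bit float. ``Float64Sketch`` is a hypothetical stand-in written for illustration, not the ``DTypeWrapper`` class from this commit. It assumes that non-finite floats are written as the strings "NaN", "Infinity" and "-Infinity", and that the Zarr V2 name is the little-endian Numpy string "<f8".

```python
import json
import math
from typing import Literal, Union

JSONScalar = Union[bool, int, float, str]


class Float64Sketch:
    """Hypothetical wrapper for a 64-bit float; illustration only."""

    # Assumed Zarr V3 name for this data type.
    _zarr_v3_name = "float64"

    def get_name(self, zarr_format: Literal[2, 3]) -> str:
        # Assumption: "<f8" is the Numpy ``str`` of a little-endian float64.
        return "<f8" if zarr_format == 2 else self._zarr_v3_name

    def default_value(self) -> float:
        # A static default is enough for a non-parametric data type.
        return 0.0

    def to_json_value(self, data: float, zarr_format: Literal[2, 3]) -> JSONScalar:
        # zarr_format mirrors the documented signature; this sketch
        # happens to use the same encoding for both formats.
        if math.isnan(data):
            return "NaN"
        if math.isinf(data):
            return "Infinity" if data > 0 else "-Infinity"
        return float(data)

    def from_json_value(self, data: JSONScalar, zarr_format: Literal[2, 3]) -> float:
        # Invert to_json_value: string sentinels map back to non-finite floats.
        if isinstance(data, str):
            return float({"NaN": "nan", "Infinity": "inf", "-Infinity": "-inf"}[data])
        return float(data)


wrapper = Float64Sketch()
# The encoded value survives a trip through a strict JSON document.
encoded = json.dumps(wrapper.to_json_value(float("-inf"), zarr_format=3))
decoded = wrapper.from_json_value(json.loads(encoded), zarr_format=3)
```

The string sentinels matter because standard JSON has no literal for NaN or infinity, so returning the raw float would not produce portable metadata.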
src/zarr/core/dtype/wrapper.py

Lines changed: 14 additions & 15 deletions
@@ -65,7 +65,6 @@ def from_dtype(cls: type[Self], dtype: TDType) -> Self:
             f"Invalid dtype: {dtype}. Expected an instance of {cls.dtype_cls}."
         )

-
     @classmethod
     @abstractmethod
     def _from_dtype_unsafe(cls: type[Self], dtype: TDType) -> Self:
@@ -96,18 +95,6 @@ def to_dtype(self: Self) -> TDType:
         """
         raise NotImplementedError

-    @abstractmethod
-    def to_dict(self) -> dict[str, JSON]:
-        """
-        Convert the wrapped data type to a dictionary.
-
-        Returns
-        -------
-        dict[str, JSON]
-            The dictionary representation of the wrapped data type
-        """
-        raise NotImplementedError
-
     def cast_value(self: Self, value: object) -> TScalar:
         """
         Cast a value to an instance of the scalar type.
@@ -178,6 +165,18 @@ def check_dict(cls: type[Self], data: dict[str, JSON]) -> TypeGuard[dict[str, JS
         """
         return "name" in data and data["name"] == cls._zarr_v3_name

+    @abstractmethod
+    def to_dict(self) -> dict[str, JSON]:
+        """
+        Convert the wrapped data type to a dictionary.
+
+        Returns
+        -------
+        dict[str, JSON]
+            The dictionary representation of the wrapped data type
+        """
+        raise NotImplementedError
+
     @classmethod
     def from_dict(cls: type[Self], data: dict[str, JSON]) -> Self:
         """
@@ -194,11 +193,11 @@ def from_dict(cls: type[Self], data: dict[str, JSON]) -> Self:
             The wrapped data type.
         """
         if cls.check_dict(data):
-            return cls._from_json_unsafe(data)
+            return cls._from_dict_unsafe(data)
         raise DataTypeValidationError(f"Invalid JSON representation of data type {cls}.")

     @classmethod
-    def _from_json_unsafe(cls: type[Self], data: dict[str, JSON]) -> Self:
+    def _from_dict_unsafe(cls: type[Self], data: dict[str, JSON]) -> Self:
         """
         Wrap a JSON representation of a data type.

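Note on the renamed method: the validate-then-construct pattern in ``from_dict`` can be sketched in isolation as follows. ``BoolSketch`` and the local ``DataTypeValidationError`` are hypothetical stand-ins defined only for this example; the real base class has more methods and may behave differently. The sketch assumes a non-parametric data type, for which ``_from_dict_unsafe`` can simply return ``cls()``.

```python
from typing import Any, Dict


class DataTypeValidationError(ValueError):
    """Local stand-in for the exception raised in the diff above."""


class BoolSketch:
    """Hypothetical wrapper for a boolean data type; illustration only."""

    _zarr_v3_name = "bool"

    @classmethod
    def check_dict(cls, data: Dict[str, Any]) -> bool:
        # Same structural check as in the diff: the "name" key must match.
        return "name" in data and data["name"] == cls._zarr_v3_name

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "BoolSketch":
        # Validate first, then delegate to the unchecked constructor.
        if cls.check_dict(data):
            return cls._from_dict_unsafe(data)
        raise DataTypeValidationError(f"Invalid JSON representation of data type {cls}.")

    @classmethod
    def _from_dict_unsafe(cls, data: Dict[str, Any]) -> "BoolSketch":
        # No validation here; a non-parametric type needs no arguments.
        return cls()

    def to_dict(self) -> Dict[str, Any]:
        # Inverse of from_dict for this sketch.
        return {"name": self._zarr_v3_name}


# Round trip: dictionary -> wrapper -> dictionary.
round_tripped = BoolSketch.from_dict({"name": "bool"}).to_dict()
```

Keeping validation in ``from_dict`` and construction in ``_from_dict_unsafe`` lets subclasses override either step on its own.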