Commit 1457e40

Merge branch 'main' into codec-docstrings
2 parents 8a65204 + 702f7b3 commit 1457e40

17 files changed: +187 -76 lines changed

changes/3264.fix.rst

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+- Expand the range of types accepted by ``parse_data_type`` to include strings and Sequences.
+- Move the functionality of ``parse_data_type`` to a new function called ``parse_dtype``. This change
+  ensures that nomenclature is consistent across the codebase. ``parse_data_type`` remains, so this
+  change is not breaking.
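
The non-breaking rename described in this changelog entry follows a common pattern: the old name stays as a thin wrapper around the new one. The sketch below illustrates that pattern only; `parse_new`, `parse_old`, and `parse_old_deprecated` are hypothetical names, and the warning stage is an assumption about what a later deprecation could look like, not something this commit does.

```python
import warnings


def parse_new(spec: str) -> str:
    """Hypothetical stand-in for the new, preferred function."""
    return spec.strip().lower()


def parse_old(spec: str) -> str:
    """Old name kept as a thin wrapper so existing callers keep working."""
    return parse_new(spec)


def parse_old_deprecated(spec: str) -> str:
    """Possible later stage (an assumption, not in this commit): warn, then delegate."""
    warnings.warn(
        "parse_old is deprecated; use parse_new instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return parse_new(spec)
```

Because the wrapper only forwards its arguments, both names return identical results and callers can migrate at their own pace.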

changes/3273.doc.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Add a section on codecs to the migration guide.

changes/3280.fix.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Fix a regression introduced in 3.1.0 that prevented ``inf``, ``-inf``, and ``nan`` values
+from being stored in ``attributes``.
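
The fix in this commit flips `allow_nan=False` to `allow_nan=True` in the `json.dumps` calls that serialize metadata. The sketch below uses only the standard library to show what that flag changes; the `attrs` dictionary is made-up sample data, not taken from the Zarr test suite.

```python
import json
import math

# Made-up attributes containing non-finite floats.
attrs = {"fill": float("nan"), "upper": float("inf"), "lower": float("-inf")}

# With allow_nan=False, json.dumps refuses non-finite floats.
try:
    json.dumps(attrs, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc

# With allow_nan=True (the standard-library default), they are written as
# the JavaScript-style tokens NaN, Infinity and -Infinity.
encoded = json.dumps(attrs, allow_nan=True)

# json.loads accepts those tokens by default, so the values round-trip.
decoded = json.loads(encoded)
```

The tokens `NaN`, `Infinity`, and `-Infinity` are not part of strict JSON, so a reader written with a stricter parser may reject such a document.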

docs/user-guide/data_types.rst

Lines changed: 7 additions & 7 deletions
@@ -412,17 +412,17 @@ attempt data type resolution against *every* data type class, and if, for some r
 type matches multiple Zarr data types, we treat this as an error and raise an exception.
 
 If you have a NumPy data type and you want to get the corresponding ``ZDType`` instance, you can use
-the ``parse_data_type`` function, which will use the dynamic resolution described above. ``parse_data_type``
+the ``parse_dtype`` function, which will use the dynamic resolution described above. ``parse_dtype``
 handles a range of input types:
 
 - NumPy data types:
 
   .. code-block:: python
 
     >>> import numpy as np
-    >>> from zarr.dtype import parse_data_type
+    >>> from zarr.dtype import parse_dtype
     >>> my_dtype = np.dtype('>M8[10s]')
-    >>> parse_data_type(my_dtype, zarr_format=2)
+    >>> parse_dtype(my_dtype, zarr_format=2)
     DateTime64(endianness='big', scale_factor=10, unit='s')
 
 
@@ -431,7 +431,7 @@ handles a range of input types:
   .. code-block:: python
 
     >>> dtype_str = '>M8[10s]'
-    >>> parse_data_type(dtype_str, zarr_format=2)
+    >>> parse_dtype(dtype_str, zarr_format=2)
     DateTime64(endianness='big', scale_factor=10, unit='s')
 
 - ``ZDType`` instances:
@@ -440,7 +440,7 @@ handles a range of input types:
 
     >>> from zarr.dtype import DateTime64
     >>> zdt = DateTime64(endianness='big', scale_factor=10, unit='s')
-    >>> parse_data_type(zdt, zarr_format=2) # Use a ZDType (this is a no-op)
+    >>> parse_dtype(zdt, zarr_format=2) # Use a ZDType (this is a no-op)
     DateTime64(endianness='big', scale_factor=10, unit='s')
 
 - Python dictionaries (requires ``zarr_format=3``). These dictionaries must be consistent with the
@@ -449,7 +449,7 @@ handles a range of input types:
   .. code-block:: python
 
     >>> dt_dict = {"name": "numpy.datetime64", "configuration": {"unit": "s", "scale_factor": 10}}
-    >>> parse_data_type(dt_dict, zarr_format=3)
+    >>> parse_dtype(dt_dict, zarr_format=3)
     DateTime64(endianness='little', scale_factor=10, unit='s')
-    >>> parse_data_type(dt_dict, zarr_format=3).to_json(zarr_format=3)
+    >>> parse_dtype(dt_dict, zarr_format=3).to_json(zarr_format=3)
     {'name': 'numpy.datetime64', 'configuration': {'unit': 's', 'scale_factor': 10}}

docs/user-guide/v3_migration.rst

Lines changed: 13 additions & 1 deletion
@@ -58,7 +58,7 @@ the following actions in order:
 vendor the parts of the specific modules that you need.
 
 * ``zarr.attrs`` has gone, with no replacement
-* ``zarr.codecs`` has gone, use ``numcodecs`` instead
+* ``zarr.codecs`` has changed, see "Codecs" section below for more information
 * ``zarr.context`` has gone, with no replacement
 * ``zarr.core`` remains but should be considered private API
 * ``zarr.hierarchy`` has gone, with no replacement (use ``zarr.Group`` inplace of ``zarr.hierarchy.Group``)
@@ -178,6 +178,18 @@ If you are interested in developing a custom store that targets these backends,
 :ref:`developing custom stores <user-guide-custom-stores>` or open an
 `issue <https://github.com/zarr-developers/zarr-python/issues>`_ to discuss your use case.
 
+
+Codecs
+~~~~~~
+Codecs defined in ``numcodecs`` (and also imported into the ``zarr.codecs`` namespace in Zarr-Python 2)
+should still be used when creating Zarr format 2 arrays.
+
+Codecs for creating Zarr format 3 arrays are available in two locations:
+
+- `zarr.codecs` contains Zarr format 3 codecs that are defined in the `codecs section of the Zarr format 3 specification <https://zarr-specs.readthedocs.io/en/latest/v3/codecs/index.html>`_.
+- `numcodecs.zarr3` contains codecs from ``numcodecs`` that can be used to create Zarr format 3 arrays, but are not necessarily part of the Zarr format 3 specification.
+
+
 Dependencies
 ~~~~~~~~~~~~
 

src/zarr/core/array.py

Lines changed: 3 additions & 3 deletions
@@ -72,7 +72,7 @@
     VariableLengthUTF8,
     ZDType,
     ZDTypeLike,
-    parse_data_type,
+    parse_dtype,
 )
 from zarr.core.dtype.common import HasEndianness, HasItemSize, HasObjectCodec
 from zarr.core.indexing import (
@@ -617,7 +617,7 @@ async def _create(
         Deprecated in favor of :func:`zarr.api.asynchronous.create_array`.
         """
 
-        dtype_parsed = parse_data_type(dtype, zarr_format=zarr_format)
+        dtype_parsed = parse_dtype(dtype, zarr_format=zarr_format)
         store_path = await make_store_path(store)
 
         shape = parse_shapelike(shape)
@@ -4238,7 +4238,7 @@ async def init_array(
 
     from zarr.codecs.sharding import ShardingCodec, ShardingCodecIndexLocation
 
-    zdtype = parse_data_type(dtype, zarr_format=zarr_format)
+    zdtype = parse_dtype(dtype, zarr_format=zarr_format)
     shape_parsed = parse_shapelike(shape)
     chunk_key_encoding_parsed = _parse_chunk_key_encoding(
         chunk_key_encoding, zarr_format=zarr_format

src/zarr/core/dtype/__init__.py

Lines changed: 59 additions & 11 deletions
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+from collections.abc import Sequence
 from typing import TYPE_CHECKING, Final, TypeAlias
 
 from zarr.core.dtype.common import (
@@ -94,6 +95,7 @@
     "ZDType",
     "data_type_registry",
     "parse_data_type",
+    "parse_dtype",
 ]
 
 data_type_registry = DataTypeRegistry()
@@ -188,22 +190,26 @@ def parse_data_type(
     zarr_format: ZarrFormat,
 ) -> ZDType[TBaseDType, TBaseScalar]:
     """
-    Interpret the input as a ZDType instance.
+    Interpret the input as a ZDType.
+
+    This function wraps ``parse_dtype``. The only difference is the function name. This function may
+    be deprecated in a future version of Zarr Python in favor of ``parse_dtype``.
 
     Parameters
     ----------
     dtype_spec : ZDTypeLike
-        The input to be interpreted as a ZDType instance. This could be a native data type
-        (e.g., a NumPy data type), a Python object that can be converted into a native data type,
-        a ZDType instance (in which case the input is returned unchanged), or a JSON object
-        representation of a data type.
+        The input to be interpreted as a ZDType. This could be a ZDType, which will be returned
+        directly, or a JSON representation of a ZDType, or a native dtype, or a python object that
+        can be converted into a native dtype.
     zarr_format : ZarrFormat
-        The zarr format version.
+        The Zarr format version. This parameter is required because this function will attempt to
+        parse the JSON representation of a data type, and the JSON representation of data types
+        varies between Zarr 2 and Zarr 3.
 
     Returns
     -------
     ZDType[TBaseDType, TBaseScalar]
-        The ZDType instance corresponding to the input.
+        The ZDType corresponding to the input.
 
     Examples
     --------
@@ -216,15 +222,57 @@ def parse_data_type(
     >>> parse_data_type({"name": "numpy.datetime64", "configuration": {"unit": "s", "scale_factor": 10}}, zarr_format=3)
     DateTime64(endianness='little', scale_factor=10, unit='s')
     """
+    return parse_dtype(dtype_spec, zarr_format=zarr_format)
+
+
+def parse_dtype(
+    dtype_spec: ZDTypeLike,
+    *,
+    zarr_format: ZarrFormat,
+) -> ZDType[TBaseDType, TBaseScalar]:
+    """
+    Convert the input as a ZDType.
+
+    Parameters
+    ----------
+    dtype_spec : ZDTypeLike
+        The input to be converted to a ZDType. This could be a ZDType, which will be returned
+        directly, or a JSON representation of a ZDType, or a numpy dtype, or a python object that
+        can be converted into a native dtype.
+    zarr_format : ZarrFormat
+        The Zarr format version. This parameter is required because this function will attempt to
+        parse the JSON representation of a data type, and the JSON representation of data types
+        varies between Zarr 2 and Zarr 3.
+
+    Returns
+    -------
+    ZDType[TBaseDType, TBaseScalar]
+        The ZDType corresponding to the input.
+
+    Examples
+    --------
+    >>> from zarr.dtype import parse_dtype
+    >>> import numpy as np
+    >>> parse_dtype("int32", zarr_format=2)
+    Int32(endianness='little')
+    >>> parse_dtype(np.dtype('S10'), zarr_format=2)
+    NullTerminatedBytes(length=10)
+    >>> parse_dtype({"name": "numpy.datetime64", "configuration": {"unit": "s", "scale_factor": 10}}, zarr_format=3)
+    DateTime64(endianness='little', scale_factor=10, unit='s')
+    """
     if isinstance(dtype_spec, ZDType):
         return dtype_spec
-    # dict and zarr_format 3 means that we have a JSON object representation of the dtype
-    if zarr_format == 3 and isinstance(dtype_spec, Mapping):
-        return get_data_type_from_json(dtype_spec, zarr_format=3)
+    # First attempt to interpret the input as JSON
+    if isinstance(dtype_spec, Mapping | str | Sequence):
+        try:
+            return get_data_type_from_json(dtype_spec, zarr_format=zarr_format)  # type: ignore[arg-type]
+        except ValueError:
+            # no data type matched this JSON-like input
+            pass
     if dtype_spec in VLEN_UTF8_ALIAS:
         # If the dtype request is one of the aliases for variable-length UTF-8 strings,
         # return that dtype.
         return VariableLengthUTF8()  # type: ignore[return-value]
     # otherwise, we have either a numpy dtype string, or a zarr v3 dtype string, and in either case
-    # we can create a numpy dtype from it, and do the dtype inference from that
+    # we can create a native dtype from it, and do the dtype inference from that
     return get_data_type_from_native_dtype(dtype_spec)  # type: ignore[arg-type]
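
The new `parse_dtype` body tries the JSON interpretation first and falls back to native dtype inference when no JSON form matches. The toy below sketches that control flow in isolation, using only the standard library. Every name in it (`JSON_NAMES`, `NATIVE_NAMES`, `from_json`, `from_native`, `parse`) is a hypothetical stand-in, and the string results are placeholders for real `ZDType` objects.

```python
from collections.abc import Mapping, Sequence

# Hypothetical stand-in registries; the real code consults a data type registry.
JSON_NAMES = {"int32": "Int32", "float64": "Float64"}
NATIVE_NAMES = {"<i4": "Int32", "<f8": "Float64"}


def from_json(spec):
    """Toy JSON resolver: raises ValueError when nothing matches."""
    name = spec.get("name") if isinstance(spec, Mapping) else spec
    if isinstance(name, str) and name in JSON_NAMES:
        return JSON_NAMES[name]
    raise ValueError(f"no data type matched {spec!r}")


def from_native(spec):
    """Toy native resolver, used as the fallback path."""
    if isinstance(spec, str) and spec in NATIVE_NAMES:
        return NATIVE_NAMES[spec]
    raise ValueError(f"cannot interpret {spec!r} as a native dtype")


def parse(spec):
    # First attempt to interpret the input as JSON.
    if isinstance(spec, (Mapping, str, Sequence)):
        try:
            return from_json(spec)
        except ValueError:
            pass  # no JSON match; fall through to the native path
    return from_native(spec)
```

In this sketch `parse("int32")` and `parse({"name": "float64"})` resolve on the JSON path, while `parse("<i4")` fails there and is resolved by the fallback. Swallowing only `ValueError` keeps other failures visible.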

src/zarr/core/group.py

Lines changed: 3 additions & 3 deletions
@@ -336,7 +336,7 @@ def to_buffer_dict(self, prototype: BufferPrototype) -> dict[str, Buffer]:
         if self.zarr_format == 3:
             return {
                 ZARR_JSON: prototype.buffer.from_bytes(
-                    json.dumps(self.to_dict(), indent=json_indent, allow_nan=False).encode()
+                    json.dumps(self.to_dict(), indent=json_indent, allow_nan=True).encode()
                 )
             }
         else:
@@ -345,7 +345,7 @@ def to_buffer_dict(self, prototype: BufferPrototype) -> dict[str, Buffer]:
                     json.dumps({"zarr_format": self.zarr_format}, indent=json_indent).encode()
                 ),
                 ZATTRS_JSON: prototype.buffer.from_bytes(
-                    json.dumps(self.attributes, indent=json_indent, allow_nan=False).encode()
+                    json.dumps(self.attributes, indent=json_indent, allow_nan=True).encode()
                 ),
             }
             if self.consolidated_metadata:
@@ -373,7 +373,7 @@ def to_buffer_dict(self, prototype: BufferPrototype) -> dict[str, Buffer]:
 
                 items[ZMETADATA_V2_JSON] = prototype.buffer.from_bytes(
                     json.dumps(
-                        {"metadata": d, "zarr_consolidated_format": 1}, allow_nan=False
+                        {"metadata": d, "zarr_consolidated_format": 1}, allow_nan=True
                     ).encode()
                 )
 

src/zarr/core/metadata/v2.py

Lines changed: 2 additions & 2 deletions
@@ -132,10 +132,10 @@ def to_buffer_dict(self, prototype: BufferPrototype) -> dict[str, Buffer]:
         json_indent = config.get("json_indent")
         return {
             ZARRAY_JSON: prototype.buffer.from_bytes(
-                json.dumps(zarray_dict, indent=json_indent, allow_nan=False).encode()
+                json.dumps(zarray_dict, indent=json_indent, allow_nan=True).encode()
             ),
             ZATTRS_JSON: prototype.buffer.from_bytes(
-                json.dumps(zattrs_dict, indent=json_indent, allow_nan=False).encode()
+                json.dumps(zattrs_dict, indent=json_indent, allow_nan=True).encode()
             ),
         }
 

src/zarr/core/metadata/v3.py

Lines changed: 1 addition & 1 deletion
@@ -288,7 +288,7 @@ def to_buffer_dict(self, prototype: BufferPrototype) -> dict[str, Buffer]:
         d = self.to_dict()
         return {
             ZARR_JSON: prototype.buffer.from_bytes(
-                json.dumps(d, allow_nan=False, indent=json_indent).encode()
+                json.dumps(d, allow_nan=True, indent=json_indent).encode()
             )
         }
 
