Merged
Commits
53 commits
c1d1550
add int2 example, and expand dtype docs
d-v-b Jun 19, 2025
6e4a938
specify zarr with a direct local file reference for the dtype example
d-v-b Jun 19, 2025
8d18eed
add comment on pep-723 metadata
d-v-b Jun 19, 2025
bfb2088
ignore future warning in docs
d-v-b Jun 19, 2025
60a1e30
Merge branch 'main' into docs/dtype-docs
d-v-b Jun 19, 2025
5b2a601
Merge branch 'main' into docs/dtype-docs
d-v-b Jun 20, 2025
893540f
re-export vlen-bytes
d-v-b Jun 22, 2025
15ebfa6
make examples stand-alone and testable via script dependency modifica…
d-v-b Jun 22, 2025
383acfc
docstrings
d-v-b Jun 23, 2025
46e80ec
oMerge branch 'docs/dtype-docs' of github.com:d-v-b/zarr-python into …
d-v-b Jun 23, 2025
2e96eca
changelog
d-v-b Jun 23, 2025
b9a510a
docstring style
d-v-b Jun 24, 2025
e34d18e
add docstrings and polish interfaces
d-v-b Jun 24, 2025
c4031dc
fixup
d-v-b Jun 24, 2025
ae268b9
prose
d-v-b Jun 24, 2025
eec3ec3
gamble on a new pytest version fixing windows CI failure
d-v-b Jun 24, 2025
f942508
gamble on a new pytest version fixing windows CI failure
d-v-b Jun 24, 2025
9d3dc48
revert change to pytest dep
d-v-b Jun 24, 2025
532ae1e
skip example tests on windows
d-v-b Jun 24, 2025
620749b
unexclude api from exclude_patterns
d-v-b Jun 24, 2025
45aab29
harmonize docstrings
d-v-b Jun 24, 2025
bf15d71
numpy -> np
d-v-b Jun 24, 2025
27cccdd
restructure list of dtypes
d-v-b Jun 24, 2025
a68751a
code block
d-v-b Jun 24, 2025
f3c44db
prose
d-v-b Jun 24, 2025
3045e9a
revert ectopic change
d-v-b Jun 24, 2025
84b572e
remove trailing underscore from np.void
d-v-b Jun 24, 2025
f35d4c1
remove methods section, correct attributes
d-v-b Jun 24, 2025
669afd3
resolve docs build error my re-ordering plugins. great stuff, sphinx
d-v-b Jun 24, 2025
9af24ed
numpy -> np
d-v-b Jun 24, 2025
3453af2
fix doctests
d-v-b Jun 24, 2025
efb767f
add pytest to docs env, because this resolves a warning about a missi…
d-v-b Jun 24, 2025
6e6d337
put return types in double backticks
d-v-b Jun 24, 2025
cee23aa
escape piped return types
d-v-b Jun 24, 2025
3172095
fix internal link
d-v-b Jun 24, 2025
f6d67b3
Merge branch 'main' into docs/dtype-docs
d-v-b Jun 24, 2025
0c20603
Merge branch 'main' into docs/dtype-docs
d-v-b Jun 24, 2025
432d975
Merge branch 'main' of github.com:zarr-developers/zarr-python into do…
d-v-b Jun 25, 2025
b4a05ba
Merge branch 'docs/dtype-docs' of github.com:d-v-b/zarr-python into d…
d-v-b Jun 25, 2025
bd7e9fc
Merge branch 'main' into docs/dtype-docs
d-v-b Jun 25, 2025
a785e35
Update examples/custom_dtype.py
d-v-b Jun 27, 2025
bec9512
Merge branch 'main' of github.com:zarr-developers/zarr-python into do…
d-v-b Jul 1, 2025
d408b9d
make datatype configuration typeddict readonly
d-v-b Jul 1, 2025
b4a114d
document namedconfig
d-v-b Jul 1, 2025
0639696
document and export typeddicts, move dtype docs to an advanced section
d-v-b Jul 2, 2025
f608c12
remove added features from list of missing features
d-v-b Jul 2, 2025
80bd097
fix accidental copy + paste breakage
d-v-b Jul 2, 2025
c44f1a2
use anonymous rst links
d-v-b Jul 2, 2025
2540eab
Merge branch 'main' into docs/dtype-docs
d-v-b Jul 3, 2025
368145f
normalize typerror when check_scalar fails, and add tests for it
d-v-b Jul 3, 2025
87c71fa
prose
d-v-b Jul 3, 2025
3cfaa0d
Merge branch 'main' of github.com:zarr-developers/zarr-python into do…
d-v-b Jul 3, 2025
48000fc
improve coverage the hard way and the easy way
d-v-b Jul 3, 2025
2 changes: 2 additions & 0 deletions changes/3157.doc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add a self-contained example of data type extension to the ``examples`` directory, and expand
the documentation for data types.
4 changes: 2 additions & 2 deletions docs/conf.py
Expand Up @@ -38,14 +38,14 @@
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.viewcode",
"sphinx.ext.intersphinx",
'autoapi.extension',
"numpydoc",
"sphinx_issues",
"sphinx_copybutton",
"sphinx_design",
'sphinx_reredirects',
"sphinx.ext.viewcode",
]

issues_github_path = "zarr-developers/zarr-python"
Expand All @@ -56,7 +56,7 @@
autoapi_member_order = "groupwise"
autoapi_root = "api"
autoapi_keep_files = True
autoapi_options = [ 'members', 'undoc-members', 'show-inheritance', 'show-module-summary', 'imported-members', ]
autoapi_options = [ 'members', 'undoc-members', 'show-inheritance', 'show-module-summary', 'imported-members', 'inherited-members']

def skip_submodules(
app: sphinx.application.Sphinx,
Expand Down
26 changes: 2 additions & 24 deletions docs/user-guide/arrays.rst
Expand Up @@ -23,7 +23,8 @@ The code above creates a 2-dimensional array of 32-bit integers with 10000 rows
and 10000 columns, divided into chunks where each chunk has 1000 rows and 1000
columns (and so there will be 100 chunks in total). The data is written to a
:class:`zarr.storage.MemoryStore` (e.g. an in-memory dict). See
:ref:`user-guide-persist` for details on storing arrays in other stores.
:ref:`user-guide-persist` for details on storing arrays in other stores, and see
:ref:`user-guide-data-types` for an in-depth look at the data types supported by Zarr.

For a complete list of array creation routines see the :mod:`zarr`
module documentation.
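
The chunk count quoted in the paragraph above (100 chunks for a 10000 x 10000 array split into 1000 x 1000 chunks) can be checked with a few lines of arithmetic. This is a minimal sketch that does not use Zarr; the helper name `chunk_grid_shape` is hypothetical and only illustrates the calculation.

```python
import math


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension; a partial edge chunk counts as one chunk."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


grid = chunk_grid_shape((10000, 10000), (1000, 1000))
n_chunks = math.prod(grid)
print(grid, n_chunks)  # (10, 10) 100
```

Rounding up matters only when the array shape is not an exact multiple of the chunk shape, for example a length-10 axis with chunks of 3 needs 4 chunks.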
Expand Down Expand Up @@ -629,29 +630,6 @@ Missing features in 3.0

The following features have not been ported to 3.0 yet.

.. _user-guide-objects:

Object arrays
~~~~~~~~~~~~~

See the Zarr-Python 2 documentation on `Object arrays <https://zarr.readthedocs.io/en/support-v2/tutorial.html#object-arrays>`_ for more details.

.. _user-guide-strings:

Fixed-length string arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~

See the Zarr-Python 2 documentation on `Fixed-length string arrays <https://zarr.readthedocs.io/en/support-v2/tutorial.html#string-arrays>`_ for more details.

.. _user-guide-datetime:

Datetime and Timedelta arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See the Zarr-Python 2 documentation on `Datetime and Timedelta <https://zarr.readthedocs.io/en/support-v2/tutorial.html#datetimes-and-timedeltas>`_ for more details.

.. _user-guide-copy:

Copying and migrating data
~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
372 changes: 306 additions & 66 deletions docs/user-guide/data_types.rst

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/user-guide/index.rst
Expand Up @@ -8,7 +8,6 @@ User guide

installation
arrays
data_types
groups
attributes
storage
Expand All @@ -21,6 +20,7 @@ Advanced Topics
.. toctree::
:maxdepth: 1

data_types
performance
consolidated_metadata
extending
Expand Down
245 changes: 245 additions & 0 deletions examples/custom_dtype.py
@@ -0,0 +1,245 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr @ git+https://github.com/zarr-developers/zarr-python.git@main",
# "ml_dtypes==0.5.1",
# "pytest==8.4.1"
# ]
# ///
#

"""
Demonstrate how to extend Zarr Python by defining a new data type
"""

import json
import sys
from pathlib import Path
from typing import ClassVar, Literal, Self, TypeGuard, overload

import ml_dtypes # necessary to add extra dtypes to NumPy
import numpy as np
import pytest

import zarr
from zarr.core.common import JSON, ZarrFormat
from zarr.core.dtype import ZDType, data_type_registry
from zarr.core.dtype.common import (
DataTypeValidationError,
DTypeConfig_V2,
DTypeJSON,
check_dtype_spec_v2,
)

# This is the int2 array data type
int2_dtype_cls = type(np.dtype("int2"))

# This is the int2 scalar type
int2_scalar_cls = ml_dtypes.int2


class Int2(ZDType[int2_dtype_cls, int2_scalar_cls]):
    """
    This class provides a Zarr compatibility layer around the int2 data type (the ``dtype`` of a
    NumPy array of type int2) and the int2 scalar type (the type of the scalar values inside an
    int2 array).
    """

    # This field is used as the key for the data type in the internal data type registry, and also
    # as the identifier for the data type when serializing the data type to disk for zarr v3
    _zarr_v3_name: ClassVar[Literal["int2"]] = "int2"
    # this field will be used internally
    _zarr_v2_name: ClassVar[Literal["int2"]] = "int2"

    # we bind a class variable to the native data type class so we can create instances of it
    dtype_cls = int2_dtype_cls

    @classmethod
    def from_native_dtype(cls, dtype: np.dtype) -> Self:
        """Create an instance of this ZDType from a native dtype."""
        if cls._check_native_dtype(dtype):
            return cls()
        raise DataTypeValidationError(
            f"Invalid data type: {dtype}. Expected an instance of {cls.dtype_cls}"
        )

    def to_native_dtype(self: Self) -> int2_dtype_cls:
        """Create an int2 dtype instance from this ZDType"""
        return self.dtype_cls()

    @classmethod
    def _check_json_v2(cls, data: DTypeJSON) -> TypeGuard[DTypeConfig_V2[Literal["int2"], None]]:
        """
        Type check for Zarr v2-flavored JSON.

        This will check that the input is a dict like this:

        .. code-block:: json

            {
                "name": "int2",
                "object_codec_id": null
            }

        Note that this representation differs from what the ``dtype`` field looks like in zarr v2 metadata.
        Specifically, whatever goes into the ``dtype`` field in metadata is assigned to the ``name`` field here.

        See the Zarr docs for more information about the JSON encoding for data types.
        """
        return (
            check_dtype_spec_v2(data) and data["name"] == "int2" and data["object_codec_id"] is None
        )

    @classmethod
    def _check_json_v3(cls, data: DTypeJSON) -> TypeGuard[Literal["int2"]]:
        """
        Type check for Zarr V3-flavored JSON.

        Checks that the input is the string "int2".
        """
        return data == cls._zarr_v3_name

    @classmethod
    def _from_json_v2(cls, data: DTypeJSON) -> Self:
        """
        Create an instance of this ZDType from Zarr V2-flavored JSON.

        This first does a type check on the input, and if that passes we create an instance of the ZDType.
        """
        if cls._check_json_v2(data):
            return cls()
        msg = f"Invalid JSON representation of {cls.__name__}. Got {data!r}, expected a dict with the name {cls._zarr_v2_name!r}"
        raise DataTypeValidationError(msg)

    @classmethod
    def _from_json_v3(cls: type[Self], data: DTypeJSON) -> Self:
        """
        Create an instance of this ZDType from Zarr V3-flavored JSON.

        This first does a type check on the input, and if that passes we create an instance of the ZDType.
        """
        if cls._check_json_v3(data):
            return cls()
        msg = f"Invalid JSON representation of {cls.__name__}. Got {data!r}, expected the string {cls._zarr_v3_name!r}"
        raise DataTypeValidationError(msg)

    @overload  # type: ignore[override]
    def to_json(self, zarr_format: Literal[2]) -> DTypeConfig_V2[Literal["int2"], None]: ...

    @overload
    def to_json(self, zarr_format: Literal[3]) -> Literal["int2"]: ...

    def to_json(
        self, zarr_format: ZarrFormat
    ) -> DTypeConfig_V2[Literal["int2"], None] | Literal["int2"]:
        """
        Serialize this ZDType to v2- or v3-flavored JSON

        If the zarr_format is 2, then return a dict like this:

        .. code-block:: json

            {
                "name": "int2",
                "object_codec_id": null
            }

        If the zarr_format is 3, then return the string "int2"

        """
        if zarr_format == 2:
            return {"name": "int2", "object_codec_id": None}
        elif zarr_format == 3:
            return self._zarr_v3_name
        raise ValueError(f"zarr_format must be 2 or 3, got {zarr_format}")  # pragma: no cover

    def _check_scalar(self, data: object) -> TypeGuard[int | ml_dtypes.int2]:
        """
        Check if a python object is a valid int2-compatible scalar

        The strictness of this type check is an implementation degree of freedom.
        You could be strict here, and only accept int2 values, or be open and accept any integer
        or any object and rely on exceptions from the int2 constructor that will be called in
        cast_scalar.
        """
        return isinstance(data, (int, int2_scalar_cls))

    def cast_scalar(self, data: object) -> ml_dtypes.int2:
        """
        Attempt to cast a python object to an int2.

        We first perform a type check to ensure that the input type is appropriate, and if that
        passes we call the int2 scalar constructor.
        """
        if self._check_scalar(data):
            return ml_dtypes.int2(data)
        msg = (
            f"Cannot convert object {data!r} with type {type(data)} to a scalar compatible with the "
            f"data type {self}."
        )
        raise TypeError(msg)

    def default_scalar(self) -> ml_dtypes.int2:
        """
        Get the default scalar value. This will be used when automatically selecting a fill value.
        """
        return ml_dtypes.int2(0)

    def to_json_scalar(self, data: object, *, zarr_format: ZarrFormat) -> int:
        """
        Convert a python object to a JSON representation of an int2 scalar.
        This is necessary for taking user input for the ``fill_value`` attribute in array metadata.

        In this implementation, we optimistically convert the input to an int,
        and then check that it lies in the acceptable range for this data type.
        """
        # We could add a type check here, but we don't need to for this example
        val: int = int(data)  # type: ignore[call-overload]
        if val not in (-2, -1, 0, 1):
            raise ValueError("Invalid value. Expected -2, -1, 0, or 1.")
        return val

    def from_json_scalar(self, data: JSON, *, zarr_format: ZarrFormat) -> ml_dtypes.int2:
        """
        Read a JSON-serializable value as an int2 scalar.

        We first perform a type check to ensure that the JSON value is well-formed, then call the
        int2 scalar constructor.

        The base definition of this method requires that it take a zarr_format parameter because
        other data types serialize scalars differently in zarr v2 and v3, but we don't use this here.

        """
        if self._check_scalar(data):
            return ml_dtypes.int2(data)
        raise TypeError(f"Invalid type: {data}. Expected an int.")


# after defining dtype class, it must be registered with the data type registry so zarr can use it
data_type_registry.register(Int2._zarr_v3_name, Int2)


# this parametrized function will create arrays in zarr v2 and v3 using our new data type
@pytest.mark.parametrize("zarr_format", [2, 3])
def test_custom_dtype(tmp_path: Path, zarr_format: Literal[2, 3]) -> None:
    # create array and write values
    z_w = zarr.create_array(
        store=tmp_path, shape=(4,), dtype="int2", zarr_format=zarr_format, compressors=None
    )
    z_w[:] = [-1, -2, 0, 1]

    # open the array
    z_r = zarr.open_array(tmp_path, mode="r")

    print(z_r.info_complete())

    # look at the array metadata
    if zarr_format == 2:
        meta_file = tmp_path / ".zarray"
    else:
        meta_file = tmp_path / "zarr.json"
    print(json.dumps(json.loads(meta_file.read_text()), indent=2))


if __name__ == "__main__":
    # Run the example with printed output, and a dummy pytest configuration file specified.
    # Without the dummy configuration file, at test time pytest will attempt to use the
    # configuration file in the project root, which will error because Zarr is using some
    # plugins that are not installed in this example.
    sys.exit(pytest.main(["-s", __file__, f"-c {__file__}"]))
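
The serialization rules in the example above can be tried without installing Zarr or `ml_dtypes`. The following is a rough, standalone sketch that mirrors only the logic of `to_json` and `to_json_scalar`. The function names `int2_dtype_to_json` and `int2_to_json_scalar` and the constants `INT2_MIN` and `INT2_MAX` are hypothetical and are not part of the Zarr API.

```python
import json

# Assumption for illustration: a signed two-bit integer holds 2**2 = 4 values, -2 through 1.
INT2_MIN, INT2_MAX = -2, 1


def int2_to_json_scalar(value: object) -> int:
    """Convert a value to a JSON-ready int, rejecting anything outside the int2 range."""
    val = int(value)  # optimistic conversion, as in the example's to_json_scalar
    if not INT2_MIN <= val <= INT2_MAX:
        raise ValueError(f"Invalid value {val}. Expected -2, -1, 0, or 1.")
    return val


def int2_dtype_to_json(zarr_format: int):
    """Return the data type's JSON form: a dict for Zarr v2, a bare string for Zarr v3."""
    if zarr_format == 2:
        return {"name": "int2", "object_codec_id": None}
    if zarr_format == 3:
        return "int2"
    raise ValueError(f"zarr_format must be 2 or 3, got {zarr_format}")


print(json.dumps(int2_dtype_to_json(2)))  # {"name": "int2", "object_codec_id": null}
print(json.dumps(int2_dtype_to_json(3)))  # "int2"
print([int2_to_json_scalar(v) for v in (-2, -1, 0, 1)])  # [-2, -1, 0, 1]
```

Note that Python's `None` becomes `null` once the v2 dict is written out as JSON, and that a fill value such as `2` is rejected with a `ValueError` before it can reach the array metadata.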
6 changes: 5 additions & 1 deletion pyproject.toml
Expand Up @@ -80,6 +80,9 @@ test = [
"mypy",
"hypothesis",
"pytest-xdist",
"packaging",
"tomlkit",
"uv"
]
remote_tests = [
'zarr[remote]',
Expand All @@ -106,7 +109,8 @@ docs = [
'numcodecs[msgpack]',
'rich',
's3fs>=2023.10.0',
'astroid<4'
'astroid<4',
'pytest'
]


Expand Down
17 changes: 15 additions & 2 deletions src/zarr/core/common.py
Expand Up @@ -19,6 +19,8 @@
overload,
)

from typing_extensions import ReadOnly

from zarr.core.config import config as zarr_config

if TYPE_CHECKING:
Expand Down Expand Up @@ -48,8 +50,19 @@


class NamedConfig(TypedDict, Generic[TName, TConfig]):
name: TName
configuration: TConfig
"""
A typed dictionary representing an object with a name and configuration, where the configuration
is a mapping of string keys to values, e.g. another typed dictionary or a JSON object.

This class is generic with two type parameters: the type of the name (``TName``) and the type of
the configuration (``TConfig``).
"""

name: ReadOnly[TName]
"""The name of the object."""

configuration: ReadOnly[TConfig]
"""The configuration of the object."""


def product(tup: ChunkCoords) -> int:
Expand Down
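
To illustrate the shape that the `NamedConfig` docstring describes (a name plus a configuration mapping), here is a small sketch using only the standard library. `NamedConfigSketch` and `is_named_config` are hypothetical names, and the sketch is deliberately non-generic and omits `ReadOnly`, which is a hint for static type checkers and has no runtime effect. The `"example"` name and `"level"` key are made-up values.

```python
from collections.abc import Mapping
from typing import Any, TypedDict


class NamedConfigSketch(TypedDict):
    """Hypothetical, non-generic stand-in for the NamedConfig typed dictionary."""

    name: str
    configuration: Mapping[str, Any]


def is_named_config(data: object) -> bool:
    """Runtime check: a dict with exactly a string name and a mapping configuration."""
    return (
        isinstance(data, dict)
        and set(data) == {"name", "configuration"}
        and isinstance(data["name"], str)
        and isinstance(data["configuration"], Mapping)
    )


codec: NamedConfigSketch = {"name": "example", "configuration": {"level": 1}}
print(is_named_config(codec))  # True
print(is_named_config({"name": "example"}))  # False
```

A typed dictionary is an ordinary `dict` at runtime, so a check like `is_named_config` is still needed wherever the data comes from untrusted JSON rather than from type-checked code.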