1 change: 1 addition & 0 deletions changes/3534.feature.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Adds support for `RectilinearChunkGrid`, enabling arrays with variable chunk sizes along each dimension in Zarr v3. Users can now specify irregular chunking patterns using nested sequences: `chunks=[[10, 20, 30], [25, 25, 25, 25]]` creates an array with 3 chunks of sizes 10, 20, and 30 along the first dimension, and 4 chunks of size 25 along the second dimension. This feature is useful for data with non-uniform structure or when aligning chunks with existing data partitions. Note that `RectilinearChunkGrid` is only supported in Zarr format 3 and cannot be used with sharding or when creating arrays from existing data via `from_array()`.
118 changes: 118 additions & 0 deletions docs/user-guide/arrays.md
@@ -566,6 +566,124 @@ In this example a shard shape of (1000, 1000) and a chunk shape of (100, 100) is
This means that `10*10` chunks are stored in each shard, and there are `10*10` shards in total.
Without the `shards` argument, there would be 10,000 chunks stored as individual files.

## Variable Chunking (Zarr v3)

In addition to regular chunking where all chunks have the same size, Zarr v3 supports
**variable chunking** (also called rectilinear chunking), where chunks can have different
sizes along each dimension. This is useful when your data has non-uniform structure or
when you need to align chunks with existing data partitions.

Comment on lines +574 to +575
Contributor

Suggested change
when you need to align chunks with existing data partitions.
when you need to align chunks with existing data partitions.
The specification for this chunking scheme can be found [here](https://github.com/zarr-developers/zarr-extensions/tree/main/chunk-grids/rectilinear/).

This link doesn't resolve yet but it will when the spec is merged.

### Basic usage

To create an array with variable chunking, provide a nested sequence to the `chunks`
parameter instead of a regular tuple:

```python exec="true" session="arrays" source="above" result="ansi"
# Create an array with variable chunk sizes
z = zarr.create_array(
    store='data/example-21.zarr',
    shape=(60, 100),
    chunks=[[10, 20, 30], [25, 25, 25, 25]],  # Variable chunks
    dtype='float32',
    zarr_format=3
)
print(z)
print(f"Chunk grid type: {type(z.metadata.chunk_grid).__name__}")
```

In this example, the first dimension is divided into 3 chunks with sizes 10, 20, and 30
(totaling 60), and the second dimension is divided into 4 chunks of size 25 (totaling 100).
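
The arithmetic behind this layout can be sketched in plain Python. This is only an illustration: `chunk_edges` is a hypothetical helper, not part of the zarr API, and the check that the sizes sum exactly to the array extent reflects this example rather than a documented validation rule.

```python
from itertools import accumulate


def chunk_edges(sizes):
    """Return (start, stop) index pairs for consecutive chunks of the given sizes."""
    stops = list(accumulate(sizes))
    starts = [0, *stops[:-1]]
    return list(zip(starts, stops))


shape = (60, 100)
chunks = [[10, 20, 30], [25, 25, 25, 25]]

for extent, sizes in zip(shape, chunks):
    edges = chunk_edges(sizes)
    # In this example the last chunk ends exactly at the array extent.
    assert edges[-1][1] == extent
    print(edges)
```

Each pair is the half-open index range that one chunk covers along its dimension, so the first dimension is split at indices 10 and 30.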

### Reading and writing

Arrays with variable chunking support the same read/write operations as regular arrays:

```python exec="true" session="arrays" source="above" result="ansi"
# Write data
data = np.arange(60 * 100, dtype='float32').reshape(60, 100)
z[:] = data

# Read data back
result = z[:]
print(f"Data matches: {np.all(result == data)}")
print(f"Slice [10:30, 50:75]: {z[10:30, 50:75].shape}")
```
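
To see why the slice `[10:30, 50:75]` is a convenient one for this array, the sketch below works out which chunks a slice overlaps along a single dimension. It is a plain-Python illustration under the chunk sizes used above; `chunks_touched` is a hypothetical helper, and how Zarr resolves slices internally is not shown here.

```python
from bisect import bisect_right
from itertools import accumulate


def chunks_touched(sizes, start, stop):
    """Indices of the chunks along one dimension that overlap [start, stop)."""
    edges = [0, *accumulate(sizes)]  # e.g. [0, 10, 30, 60]
    if not 0 <= start < stop <= edges[-1]:
        raise ValueError("slice must be non-empty and within bounds")
    first = bisect_right(edges, start) - 1
    last = bisect_right(edges, stop - 1) - 1
    return list(range(first, last + 1))


rows = chunks_touched([10, 20, 30], 10, 30)
cols = chunks_touched([25, 25, 25, 25], 50, 75)
print(rows, cols)  # [1] [2]
```

Both results contain a single index, so this particular slice lines up with exactly one chunk. A slice such as `[5:35]` along the first dimension would overlap all three chunks.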

### Accessing chunk information

With variable chunking, the standard `.chunks` property is not available since chunks
have different sizes. Instead, access chunk information through the chunk grid:
Comment on lines +614 to +615
Contributor

I think it would be better if `.chunks` just had a different type (a tuple of tuples of ints).


```python exec="true" session="arrays" source="above" result="ansi"
from zarr.core.chunk_grids import RectilinearChunkGrid

# Access the chunk grid
chunk_grid = z.metadata.chunk_grid
print(f"Chunk grid type: {type(chunk_grid).__name__}")

# Get chunk shapes for each dimension
if isinstance(chunk_grid, RectilinearChunkGrid):
    print(f"Dimension 0 chunk sizes: {chunk_grid.chunk_shapes[0]}")
    print(f"Dimension 1 chunk sizes: {chunk_grid.chunk_shapes[1]}")
    print(f"Total number of chunks: {chunk_grid.get_nchunks((60, 100))}")
```
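
For a rectilinear grid, the number of chunks is the product of the per-dimension chunk counts, and each chunk's shape is one combination of per-dimension sizes. The sketch below shows that arithmetic with the standard library only. The `chunk_shapes` tuple is copied by hand from the example above rather than read from a real array.

```python
from itertools import product
from math import prod

# Per-dimension chunk sizes, as in the example array above.
chunk_shapes = ((10, 20, 30), (25, 25, 25, 25))

# 3 chunks along dimension 0 times 4 chunks along dimension 1.
n_chunks = prod(len(sizes) for sizes in chunk_shapes)

# Shape of every individual chunk, in row-major grid order.
all_shapes = list(product(*chunk_shapes))

print(n_chunks)
print(all_shapes[0], all_shapes[-1])
```

Here `n_chunks` is 12, the first chunk has shape `(10, 25)` and the last has shape `(30, 25)`.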

### Use cases

Variable chunking is particularly useful for:

1. **Irregular time series**: When your data has non-uniform time intervals, you can
create chunks that align with your sampling periods.

2. **Aligning with partitions**: When you need to match chunk boundaries with existing
data partitions or structural boundaries in your data.

3. **Optimizing access patterns**: When certain regions of your array are accessed more
frequently, you can use smaller chunks there for finer-grained access.

### Example: Time series with irregular intervals

```python exec="true" session="arrays" source="above" result="ansi"
# Daily measurements for one year, chunked by month
# Each chunk corresponds to one month (varying from 28-31 days)
z_timeseries = zarr.create_array(
    store='data/example-22.zarr',
    shape=(365, 100),  # 365 days, 100 measurements per day
    chunks=[[31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], [100]],  # Days per month
    dtype='float64',
    zarr_format=3
)
print(f"Created array with shape {z_timeseries.shape}")
print(f"Chunk shapes: {z_timeseries.metadata.chunk_grid.chunk_shapes}")
print(f"Number of chunks: {len(z_timeseries.metadata.chunk_grid.chunk_shapes[0])} months")
```
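
Hard-coding the month lengths only works for non-leap years. As a possible alternative, the sizes can be derived from the standard library's `calendar` module. `month_lengths` is a hypothetical helper introduced for this sketch, not something provided by zarr.

```python
import calendar


def month_lengths(year):
    """Number of days in each month of the given year, January through December."""
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


days_2023 = month_lengths(2023)
days_2024 = month_lengths(2024)
print(days_2023, sum(days_2023))
print(days_2024, sum(days_2024))
```

The resulting list could be used as the first entry of `chunks`, with the first element of `shape` set to its sum (365, or 366 in a leap year).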

### Limitations

Variable chunking has some important limitations:

1. **Zarr v3 only**: This feature is only available when using `zarr_format=3`.
Attempting to use variable chunks with `zarr_format=2` will raise an error.

2. **Not compatible with sharding**: You cannot use variable chunking together with
the sharding feature. Arrays must use either variable chunking or sharding, but not both.

3. **Not compatible with `from_array()`**: Variable chunking cannot be used when creating
arrays from existing data using [`zarr.from_array`][]. This is because the function needs
to partition the input data, which requires regular chunk sizes.

4. **No `.chunks` property**: For arrays with variable chunking, accessing the `.chunks`
property will raise a `NotImplementedError`. Use `.metadata.chunk_grid.chunk_shapes`
instead.

```python exec="true" session="arrays" source="above" result="ansi"
# This will raise an error
try:
    _ = z.chunks
except NotImplementedError as e:
    print(f"Error: {e}")
```
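
Code that has to handle both regular and variable chunking could wrap this in a small fallback. The sketch below assumes only the behaviour described in this section: `.chunks` raises `NotImplementedError` for variable chunking, and `.metadata.chunk_grid.chunk_shapes` is available instead. `chunk_layout` and `FakeVariableArray` are hypothetical names used for illustration. The stand-in class lets the example run without zarr.

```python
from types import SimpleNamespace


def chunk_layout(arr):
    """Return arr.chunks, or per-dimension chunk sizes if .chunks is unavailable."""
    try:
        return arr.chunks
    except NotImplementedError:
        # Fallback described in the text above for variable chunking.
        return arr.metadata.chunk_grid.chunk_shapes


class FakeVariableArray:
    """Hypothetical stand-in mimicking the behaviour described above; not a zarr class."""

    metadata = SimpleNamespace(
        chunk_grid=SimpleNamespace(chunk_shapes=((10, 20, 30), (25, 25, 25, 25)))
    )

    @property
    def chunks(self):
        raise NotImplementedError("variable chunking has no single chunk shape")


regular = SimpleNamespace(chunks=(10, 10))  # stand-in for a regularly chunked array
print(chunk_layout(regular))
print(chunk_layout(FakeVariableArray()))
```

The two branches return differently shaped values: a flat tuple of ints for regular chunking and a tuple of per-dimension size tuples for variable chunking. Callers still need to distinguish the two cases.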

## Missing features in 3.0

The following features have not been ported to 3.0 yet.
4 changes: 3 additions & 1 deletion docs/user-guide/extending.md
@@ -85,4 +85,6 @@ classes by implementing the interface defined in [`zarr.abc.buffer.BufferPrototy

## Other extensions

- In the future, Zarr will support writing custom custom data types and chunk grids.
+ Zarr now includes built-in support for `RectilinearChunkGrid` (variable chunking), which allows arrays to have different chunk sizes along each dimension. See the [Variable Chunking](arrays.md#variable-chunking-zarr-v3) section in the Arrays guide for more information.
+
+ In the future, Zarr will support writing fully custom chunk grids and custom data types.
20 changes: 15 additions & 5 deletions src/zarr/api/synchronous.py
@@ -13,7 +13,7 @@
from zarr.errors import ZarrDeprecationWarning

if TYPE_CHECKING:
- from collections.abc import Iterable
+ from collections.abc import Iterable, Sequence

import numpy as np
import numpy.typing as npt
@@ -29,6 +29,7 @@
)
from zarr.core.array_spec import ArrayConfigLike
from zarr.core.buffer import NDArrayLike, NDArrayLikeOrScalar
+ from zarr.core.chunk_grids import ChunkGrid
from zarr.core.chunk_key_encodings import ChunkKeyEncoding, ChunkKeyEncodingLike
from zarr.core.common import (
JSON,
@@ -821,7 +822,7 @@ def create_array(
shape: ShapeLike | None = None,
dtype: ZDTypeLike | None = None,
data: np.ndarray[Any, np.dtype[Any]] | None = None,
- chunks: tuple[int, ...] | Literal["auto"] = "auto",
+ chunks: tuple[int, ...] | Sequence[Sequence[int]] | ChunkGrid | Literal["auto"] = "auto",
shards: ShardsLike | None = None,
filters: FiltersLike = "auto",
compressors: CompressorsLike = "auto",
@@ -857,9 +858,14 @@ def create_array(
data : np.ndarray, optional
Array-like data to use for initializing the array. If this parameter is provided, the
``shape`` and ``dtype`` parameters must be ``None``.
- chunks : tuple[int, ...] | Literal["auto"], default="auto"
- Chunk shape of the array.
- If chunks is "auto", a chunk shape is guessed based on the shape of the array and the dtype.
+ chunks : tuple[int, ...] | Sequence[Sequence[int]] | ChunkGrid | Literal["auto"], default="auto"
+ Chunk shape of the array. Several formats are supported:
+
+ - tuple of ints: Creates a RegularChunkGrid with uniform chunks, e.g., ``(10, 10)``
+ - nested sequence: Creates a RectilinearChunkGrid with variable-sized chunks (Zarr format 3 only),
+ e.g., ``[[10, 20, 30], [5, 5]]`` creates variable chunks along each dimension
+ - ChunkGrid instance: Uses the provided chunk grid directly (Zarr format 3 only)
+ - "auto": Automatically determines chunk shape based on array shape and dtype
shards : tuple[int, ...], optional
Shard shape of the array. The default value of ``None`` results in no sharding at all.
filters : Iterable[Codec] | Literal["auto"], optional
@@ -1033,6 +1039,10 @@ def from_array(
- tuple[int, ...]: A tuple of integers representing the chunk shape.

If not specified, defaults to "keep" if data is a zarr Array, otherwise "auto".

+ .. note::
+ Variable chunking (RectilinearChunkGrid) is not supported when creating arrays from
+ existing data. Use regular chunking (uniform chunk sizes) instead.
shards : tuple[int, ...], optional
Shard shape of the array.
Following values are supported: