Commit 527f7cf

docs
1 parent fc07af1 commit 527f7cf

5 files changed: +289 -1 lines changed

changes/3534.feature.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
Adds support for `RectilinearChunkGrid`, enabling arrays with variable chunk sizes along each dimension in Zarr v3. Users can now specify irregular chunking patterns using nested sequences: `chunks=[[10, 20, 30], [25, 25, 25, 25]]` creates an array with 3 chunks of sizes 10, 20, and 30 along the first dimension, and 4 chunks of size 25 along the second dimension. This feature is useful for data with non-uniform structure or when aligning chunks with existing data partitions. Note that `RectilinearChunkGrid` is only supported in Zarr format 3 and cannot be used with sharding or when creating arrays from existing data via `from_array()`.

docs/user-guide/arrays.md

Lines changed: 118 additions & 0 deletions
@@ -566,6 +566,124 @@ In this example a shard shape of (1000, 1000) and a chunk shape of (100, 100) is
This means that `10*10` chunks are stored in each shard, and there are `10*10` shards in total.
Without the `shards` argument, there would be 10,000 chunks stored as individual files.

## Variable Chunking (Zarr v3)

In addition to regular chunking where all chunks have the same size, Zarr v3 supports
**variable chunking** (also called rectilinear chunking), where chunks can have different
sizes along each dimension. This is useful when your data has non-uniform structure or
when you need to align chunks with existing data partitions.

### Basic usage

To create an array with variable chunking, provide a nested sequence to the `chunks`
parameter instead of a regular tuple:

```python exec="true" session="arrays" source="above" result="ansi"
# Create an array with variable chunk sizes
z = zarr.create_array(
    store='data/example-21.zarr',
    shape=(60, 100),
    chunks=[[10, 20, 30], [25, 25, 25, 25]],  # Variable chunks
    dtype='float32',
    zarr_format=3
)
print(z)
print(f"Chunk grid type: {type(z.metadata.chunk_grid).__name__}")
```

In this example, the first dimension is divided into 3 chunks with sizes 10, 20, and 30
(totaling 60), and the second dimension is divided into 4 chunks of size 25 (totaling 100).

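The split described above can be checked with a few lines of plain Python. The sketch below is independent of zarr: `chunk_edges` and `chunk_index` are hypothetical helper names used only for illustration, not part of the zarr API. It turns per-dimension chunk sizes into cumulative boundaries and looks up which chunk holds a given index.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_edges(sizes):
    """Return cumulative chunk boundaries, starting at 0, for one dimension."""
    return [0, *accumulate(sizes)]


def chunk_index(sizes, i):
    """Return the position of the chunk that contains element index i."""
    edges = chunk_edges(sizes)
    if not 0 <= i < edges[-1]:
        raise IndexError(f"index {i} is outside the range [0, {edges[-1]})")
    # The containing chunk starts at the last boundary that is <= i.
    return bisect_right(edges, i) - 1


dim0 = [10, 20, 30]
dim1 = [25, 25, 25, 25]
print(chunk_edges(dim0))      # [0, 10, 30, 60]
print(chunk_edges(dim1))      # [0, 25, 50, 75, 100]
print(chunk_index(dim0, 29))  # 1
```

The last boundary in each list equals the array length along that dimension (60 and 100 here), which is a quick way to confirm that a chunk specification covers the whole array.
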
### Reading and writing

Arrays with variable chunking support the same read/write operations as regular arrays:

```python exec="true" session="arrays" source="above" result="ansi"
# Write data
data = np.arange(60 * 100, dtype='float32').reshape(60, 100)
z[:] = data

# Read data back
result = z[:]
print(f"Data matches: {np.all(result == data)}")
print(f"Slice [10:30, 50:75]: {z[10:30, 50:75].shape}")
```

### Accessing chunk information

With variable chunking, the standard `.chunks` property is not available since chunks
have different sizes. Instead, access chunk information through the chunk grid:

```python exec="true" session="arrays" source="above" result="ansi"
from zarr.core.chunk_grids import RectilinearChunkGrid

# Access the chunk grid
chunk_grid = z.metadata.chunk_grid
print(f"Chunk grid type: {type(chunk_grid).__name__}")

# Get chunk shapes for each dimension
if isinstance(chunk_grid, RectilinearChunkGrid):
    print(f"Dimension 0 chunk sizes: {chunk_grid.chunk_shapes[0]}")
    print(f"Dimension 1 chunk sizes: {chunk_grid.chunk_shapes[1]}")
    print(f"Total number of chunks: {chunk_grid.get_nchunks((60, 100))}")
```
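
To see where the total chunk count comes from, the following sketch works on a plain tuple of per-dimension sizes, which is the form the tests in this commit assert for `chunk_shapes`. It does not call zarr, and the variable names are illustrative only.

```python
from itertools import product
from math import prod

# Per-dimension chunk sizes, in the same form as chunk_grid.chunk_shapes
chunk_shapes = ((10, 20, 30), (25, 25, 25, 25))

# One chunk exists for every combination of per-dimension positions.
n_chunks = prod(len(sizes) for sizes in chunk_shapes)
print(n_chunks)  # 12

# Shape of each individual chunk, keyed by its position in the grid.
shapes = {
    pos: tuple(sizes[i] for sizes, i in zip(chunk_shapes, pos))
    for pos in product(*(range(len(sizes)) for sizes in chunk_shapes))
}
print(shapes[(2, 0)])  # (30, 25)
```

The count of 12 (3 chunks along the first dimension times 4 along the second) matches the value the integration test expects from `get_nchunks((60, 100))`.
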

### Use cases

Variable chunking is particularly useful for:

1. **Irregular time series**: When your data has non-uniform time intervals, you can
   create chunks that align with your sampling periods.

2. **Aligning with partitions**: When you need to match chunk boundaries with existing
   data partitions or structural boundaries in your data.

3. **Optimizing access patterns**: When certain regions of your array are accessed more
   frequently, you can use smaller chunks there for finer-grained access.

### Example: Time series with irregular intervals

```python exec="true" session="arrays" source="above" result="ansi"
# Daily measurements for one year, chunked by month
# Each chunk corresponds to one month (varying from 28-31 days)
z_timeseries = zarr.create_array(
    store='data/example-22.zarr',
    shape=(365, 100),  # 365 days, 100 measurements per day
    chunks=[[31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], [100]],  # Days per month
    dtype='float64',
    zarr_format=3
)
print(f"Created array with shape {z_timeseries.shape}")
print(f"Chunk shapes: {z_timeseries.metadata.chunk_grid.chunk_shapes}")
print(f"Number of chunks: {len(z_timeseries.metadata.chunk_grid.chunk_shapes[0])} months")
```
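
The month lengths above are typed by hand for a non-leap year. As a minimal sketch, they could instead be derived with the standard-library `calendar` module; `month_lengths` is a hypothetical helper name introduced here for illustration.

```python
import calendar


def month_lengths(year):
    """Return the number of days in each month of the given year."""
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


days_2023 = month_lengths(2023)
print(days_2023)       # [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
print(sum(days_2023))  # 365

# A leap year needs a 366-row array and a 29-day February chunk.
print(sum(month_lengths(2024)))  # 366
```

The resulting list could then be passed as the first entry of `chunks`, for example `chunks=[month_lengths(2023), [100]]`, with the first dimension of `shape` set to the sum of the list.
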

### Limitations

Variable chunking has some important limitations:

1. **Zarr v3 only**: This feature is only available when using `zarr_format=3`.
   Attempting to use variable chunks with `zarr_format=2` will raise an error.

2. **Not compatible with sharding**: You cannot use variable chunking together with
   the sharding feature. Arrays must use either variable chunking or sharding, but not both.

3. **Not compatible with `from_array()`**: Variable chunking cannot be used when creating
   arrays from existing data using [`zarr.from_array`][]. This is because the function needs
   to partition the input data, which requires regular chunk sizes.

4. **No `.chunks` property**: For arrays with variable chunking, accessing the `.chunks`
   property will raise a `NotImplementedError`. Use `.metadata.chunk_grid.chunk_shapes`
   instead.

```python exec="true" session="arrays" source="above" result="ansi"
# This will raise an error
try:
    _ = z.chunks
except NotImplementedError as e:
    print(f"Error: {e}")
```
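
Code that has to handle both regular and variable chunking could wrap this check in a small helper. The sketch below is an assumption-laden illustration, not zarr code: `describe_chunks` is a hypothetical name, and `FakeRectilinearArray` is a stand-in object that only mimics the behaviour described in limitation 4 so the example runs without zarr installed.

```python
from types import SimpleNamespace


def describe_chunks(arr):
    """Return ("regular", chunk shape) or ("rectilinear", per-dimension sizes)."""
    try:
        return "regular", arr.chunks
    except NotImplementedError:
        # Fall back to the chunk grid, as recommended for variable chunking.
        return "rectilinear", arr.metadata.chunk_grid.chunk_shapes


class FakeRectilinearArray:
    """Stand-in that mimics an array with variable chunking (illustration only)."""

    metadata = SimpleNamespace(
        chunk_grid=SimpleNamespace(chunk_shapes=((10, 20, 30), (25, 25, 25, 25)))
    )

    @property
    def chunks(self):
        raise NotImplementedError("chunks is only defined for regular chunk grids")


# Stand-in for an array with regular chunking (illustration only)
regular = SimpleNamespace(chunks=(100, 100))

print(describe_chunks(regular))
print(describe_chunks(FakeRectilinearArray()))
```

With a real array, the same helper would be called as `describe_chunks(z)`, assuming `.chunks` raises `NotImplementedError` exactly as described above.
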

## Missing features in 3.0

The following features have not been ported to 3.0 yet.

docs/user-guide/extending.md

Lines changed: 3 additions & 1 deletion
@@ -85,4 +85,6 @@ classes by implementing the interface defined in [`zarr.abc.buffer.BufferPrototy
 
 ## Other extensions
 
-In the future, Zarr will support writing custom custom data types and chunk grids.
+Zarr now includes built-in support for `RectilinearChunkGrid` (variable chunking), which allows arrays to have different chunk sizes along each dimension. See the [Variable Chunking](arrays.md#variable-chunking-zarr-v3) section in the Arrays guide for more information.
+
+In the future, Zarr will support writing fully custom chunk grids and custom data types.

src/zarr/core/array.py

Lines changed: 4 additions & 0 deletions
@@ -4338,6 +4338,10 @@ async def from_array(
         - tuple[int, ...]: A tuple of integers representing the chunk shape.
 
         If not specified, defaults to "keep" if data is a zarr Array, otherwise "auto".
+
+        .. note::
+           Variable chunking (RectilinearChunkGrid) is not supported when creating arrays from
+           existing data. Use regular chunking (uniform chunk sizes) instead.
     shards : tuple[int, ...], optional
         Shard shape of the array.
         Following values are supported:
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
"""Integration tests for RectilinearChunkGrid with array creation."""

from typing import Literal

import numpy as np
import pytest

import zarr
from zarr.core.chunk_grids import RectilinearChunkGrid
from zarr.storage import MemoryStore


@pytest.mark.parametrize("zarr_format", [3])
async def test_create_array_with_nested_chunks(zarr_format: Literal[2, 3]) -> None:
    """
    Test creating an array with nested chunk specification (RectilinearChunkGrid).
    This is an end-to-end test for the feature.
    """
    store = MemoryStore()
    arr = await zarr.api.asynchronous.create_array(
        store=store,
        shape=(60, 100),
        chunks=[[10, 20, 30], [25, 25, 25, 25]],
        dtype="i4",
        zarr_format=zarr_format,
    )

    # Verify metadata has RectilinearChunkGrid
    assert isinstance(arr.metadata.chunk_grid, RectilinearChunkGrid)
    assert arr.metadata.chunk_grid.chunk_shapes == ((10, 20, 30), (25, 25, 25, 25))

    # Verify array is functional - can write and read data
    data = np.arange(60 * 100, dtype="i4").reshape(60, 100)
    await arr.setitem(slice(None), data)
    result = await arr.getitem(slice(None))
    np.testing.assert_array_equal(result, data)


async def test_create_array_nested_chunks_read_write() -> None:
    """
    Test that arrays with RectilinearChunkGrid support standard read/write operations.
    """
    store = MemoryStore()
    arr = await zarr.api.asynchronous.create_array(
        store=store,
        shape=(30, 40),
        chunks=[[10, 10, 10], [10, 10, 10, 10]],
        dtype="f4",
        zarr_format=3,
    )

    # Write data to different chunks
    arr_data = np.random.random((30, 40)).astype("f4")
    await arr.setitem(slice(None), arr_data)

    # Read full array
    result = await arr.getitem(slice(None))
    np.testing.assert_array_almost_equal(np.asarray(result), arr_data)

    # Read partial slices
    partial = await arr.getitem((slice(5, 25), slice(10, 30)))
    np.testing.assert_array_almost_equal(np.asarray(partial), arr_data[5:25, 10:30])


async def test_rectilinear_chunk_grid_roundtrip() -> None:
    """
    Test that RectilinearChunkGrid persists correctly through save/load.
    """
    store = MemoryStore()

    # Create array with nested chunks
    arr1 = await zarr.api.asynchronous.create_array(
        store=store,
        name="test_array",
        shape=(60, 80),
        chunks=[[10, 20, 30], [20, 20, 20, 20]],
        dtype="u1",
        zarr_format=3,
    )

    # Write some data
    data = np.arange(60 * 80, dtype="u1").reshape(60, 80)
    await arr1.setitem(slice(None), data)

    # Re-open the array
    arr2 = await zarr.api.asynchronous.open_array(store=store, path="test_array")

    # Verify chunk_grid is preserved
    assert isinstance(arr2.metadata.chunk_grid, RectilinearChunkGrid)
    assert arr2.metadata.chunk_grid.chunk_shapes == ((10, 20, 30), (20, 20, 20, 20))

    # Verify data is preserved
    result = await arr2.getitem(slice(None))
    np.testing.assert_array_equal(result, data)


async def test_from_array_rejects_nested_chunks() -> None:
    """
    Test that from_array rejects nested chunks (RectilinearChunkGrid) with has_data=True.
    """
    store = MemoryStore()
    data = np.arange(30 * 40, dtype="i4").reshape(30, 40)

    # Should raise error because RectilinearChunkGrid is not compatible with has_data=True
    with pytest.raises(
        ValueError,
        match="Cannot use RectilinearChunkGrid.*when creating array from data",
    ):
        await zarr.api.asynchronous.from_array(
            store=store,
            data=data,
            chunks=[[10, 10, 10], [10, 10, 10, 10]],  # type: ignore[arg-type]
            zarr_format=3,
        )


async def test_nested_chunks_with_different_sizes() -> None:
    """
    Test RectilinearChunkGrid with highly irregular chunk sizes.
    """
    store = MemoryStore()
    arr = await zarr.api.asynchronous.create_array(
        store=store,
        shape=(100, 100),
        chunks=[[5, 10, 15, 20, 50], [100]],  # Very irregular first dim, uniform second
        dtype="i2",
        zarr_format=3,
    )

    assert isinstance(arr.metadata.chunk_grid, RectilinearChunkGrid)
    assert arr.metadata.chunk_grid.chunk_shapes == ((5, 10, 15, 20, 50), (100,))

    # Verify writes work correctly
    data = np.arange(100 * 100, dtype="i2").reshape(100, 100)
    await arr.setitem(slice(None), data)
    result = await arr.getitem(slice(None))
    np.testing.assert_array_equal(result, data)


async def test_rectilinear_chunk_grid_nchunks_not_supported() -> None:
    """
    Test that nchunks property raises NotImplementedError for RectilinearChunkGrid.

    Note: The chunks property (and thus nchunks) is only defined for RegularChunkGrid.
    For RectilinearChunkGrid, use chunk_grid.get_nchunks() instead.
    """
    store = MemoryStore()
    arr = await zarr.api.asynchronous.create_array(
        store=store,
        shape=(60, 100),
        chunks=[[10, 20, 30], [25, 25, 25, 25]],
        dtype="u1",
        zarr_format=3,
    )

    # The chunks property is not defined for RectilinearChunkGrid
    with pytest.raises(
        NotImplementedError, match="only defined for arrays using.*RegularChunkGrid"
    ):
        _ = arr.nchunks

    # But we can get nchunks from the chunk_grid directly
    assert arr.metadata.chunk_grid.get_nchunks((60, 100)) == 12
