Feature: support rectilinear chunk grid extension #3534
Draft
jhamman
wants to merge
9
commits into
zarr-developers:main
from
jhamman:feature/rectilinear-chunk-grid
Commits
0fd633f
feature(chunkgrids): add rectillinear chunk grid metadata support
jhamman 31c831d
implementation (wip)
jhamman d40dd30
fixup
jhamman 5f84198
fixup types
jhamman b96a650
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
jhamman 6deb98e
more tests
jhamman fc07af1
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
jhamman 527f7cf
docs
jhamman cea5fc2
fixups
jhamman
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Adds support for `RectilinearChunkGrid`, enabling arrays with variable chunk sizes along each dimension in Zarr v3. Users can now specify irregular chunking patterns using nested sequences: `chunks=[[10, 20, 30], [25, 25, 25, 25]]` creates an array with 3 chunks of sizes 10, 20, and 30 along the first dimension, and 4 chunks of size 25 along the second dimension. This feature is useful for data with non-uniform structure or when aligning chunks with existing data partitions. Note that `RectilinearChunkGrid` is only supported in Zarr format 3 and cannot be used with sharding or when creating arrays from existing data via `from_array()`. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -566,6 +566,124 @@ In this example a shard shape of (1000, 1000) and a chunk shape of (100, 100) is | |
| This means that `10*10` chunks are stored in each shard, and there are `10*10` shards in total. | ||
| Without the `shards` argument, there would be 10,000 chunks stored as individual files. | ||
|
|
||
| ## Variable Chunking (Zarr v3) | ||
|
|
||
| In addition to regular chunking where all chunks have the same size, Zarr v3 supports | ||
| **variable chunking** (also called rectilinear chunking), where chunks can have different | ||
| sizes along each dimension. This is useful when your data has non-uniform structure or | ||
| when you need to align chunks with existing data partitions. | ||
|
|
||
| ### Basic usage | ||
|
|
||
| To create an array with variable chunking, provide a nested sequence to the `chunks` | ||
| parameter instead of a regular tuple: | ||
|
|
||
| ```python exec="true" session="arrays" source="above" result="ansi" | ||
| # Create an array with variable chunk sizes | ||
| z = zarr.create_array( | ||
|     store='data/example-21.zarr', | ||
|     shape=(60, 100), | ||
|     chunks=[[10, 20, 30], [25, 25, 25, 25]], # Variable chunks | ||
|     dtype='float32', | ||
|     zarr_format=3 | ||
| ) | ||
| print(z) | ||
| print(f"Chunk grid type: {type(z.metadata.chunk_grid).__name__}") | ||
| ``` | ||
|
|
||
| In this example, the first dimension is divided into 3 chunks with sizes 10, 20, and 30 | ||
| (totaling 60), and the second dimension is divided into 4 chunks of size 25 (totaling 100). | ||
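
In the example above, the sizes along each dimension add up to the array's extent. Assuming that is a requirement (the text here does not state it explicitly), a minimal stdlib-only sketch of the check and of the resulting chunk boundaries could look like this. `chunk_boundaries` is a hypothetical helper written for illustration; it is not part of the zarr API.

```python
from itertools import accumulate


def chunk_boundaries(sizes, extent):
    """Return (start, stop) index pairs for chunks of the given sizes.

    Hypothetical helper for illustration; not part of zarr.
    """
    if any(s <= 0 for s in sizes):
        raise ValueError("chunk sizes must be positive")
    if sum(sizes) != extent:
        raise ValueError(f"chunk sizes sum to {sum(sizes)}, expected {extent}")
    stops = list(accumulate(sizes))  # exclusive end offset of each chunk
    starts = [0] + stops[:-1]        # each chunk starts where the previous one ends
    return list(zip(starts, stops))


shape = (60, 100)
chunks = [[10, 20, 30], [25, 25, 25, 25]]
bounds = [chunk_boundaries(sizes, extent) for sizes, extent in zip(chunks, shape)]
print(bounds[0])  # [(0, 10), (10, 30), (30, 60)]
print(bounds[1])  # [(0, 25), (25, 50), (50, 75), (75, 100)]
```

Each dimension is handled independently, which is what makes the grid rectilinear: a chunk's extent is the product of one interval per dimension.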
|
|
||
| ### Reading and writing | ||
|
|
||
| Arrays with variable chunking support the same read/write operations as regular arrays: | ||
|
|
||
| ```python exec="true" session="arrays" source="above" result="ansi" | ||
| # Write data | ||
| data = np.arange(60 * 100, dtype='float32').reshape(60, 100) | ||
| z[:] = data | ||
|
|
||
| # Read data back | ||
| result = z[:] | ||
| print(f"Data matches: {np.all(result == data)}") | ||
| print(f"Slice [10:30, 50:75]: {z[10:30, 50:75].shape}") | ||
| ``` | ||
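
The slice `z[10:30, 50:75]` used above happens to line up with chunk boundaries, so it should touch exactly one chunk. A rough sketch of how the overlapping chunks along one dimension can be found with a binary search over cumulative offsets is shown here. `chunks_touched` is a hypothetical helper, and nothing is assumed about how zarr performs this lookup internally.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def chunks_touched(sizes, start, stop):
    """Return indices of chunks overlapping the half-open range [start, stop).

    Hypothetical helper for illustration; not part of zarr.
    """
    stops = list(accumulate(sizes))  # exclusive end offset of each chunk
    if not 0 <= start < stop <= stops[-1]:
        raise ValueError("range must be non-empty and within bounds")
    first = bisect_right(stops, start)  # first chunk whose end is > start
    last = bisect_left(stops, stop)     # first chunk whose end is >= stop
    return list(range(first, last + 1))


print(chunks_touched([10, 20, 30], 10, 30))      # [1]
print(chunks_touched([25, 25, 25, 25], 50, 75))  # [2]
print(chunks_touched([10, 20, 30], 5, 35))       # [0, 1, 2]
```

A slice that does not align with the boundaries, such as `5:35` along the first dimension, spans all three chunks.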
|
|
||
| ### Accessing chunk information | ||
|
|
||
| With variable chunking, the standard `.chunks` property is not available since chunks | ||
| have different sizes. Instead, access chunk information through the chunk grid: | ||
|
Comment on lines +614 to +615
I think it would be better if
||
|
|
||
| ```python exec="true" session="arrays" source="above" result="ansi" | ||
| from zarr.core.chunk_grids import RectilinearChunkGrid | ||
|
|
||
| # Access the chunk grid | ||
| chunk_grid = z.metadata.chunk_grid | ||
| print(f"Chunk grid type: {type(chunk_grid).__name__}") | ||
|
|
||
| # Get chunk shapes for each dimension | ||
| if isinstance(chunk_grid, RectilinearChunkGrid): | ||
|     print(f"Dimension 0 chunk sizes: {chunk_grid.chunk_shapes[0]}") | ||
|     print(f"Dimension 1 chunk sizes: {chunk_grid.chunk_shapes[1]}") | ||
| print(f"Total number of chunks: {chunk_grid.get_nchunks((60, 100))}") | ||
| ``` | ||
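
For intuition about what these values mean, the sketch below derives the total chunk count and the shape of a single chunk from per-dimension sizes. It assumes `chunk_shapes` is a sequence holding one sequence of sizes per dimension, as the example suggests. `chunk_shape_at` is a hypothetical helper, not a zarr function.

```python
from math import prod

# Per-dimension chunk sizes, matching the example array above.
chunk_shapes = ((10, 20, 30), (25, 25, 25, 25))

# Total chunk count is the product of the number of chunks along each dimension.
nchunks = prod(len(sizes) for sizes in chunk_shapes)
print(nchunks)  # 12


def chunk_shape_at(chunk_shapes, coords):
    """Shape of the chunk at grid coordinates `coords` (hypothetical helper)."""
    return tuple(sizes[i] for sizes, i in zip(chunk_shapes, coords))


print(chunk_shape_at(chunk_shapes, (2, 0)))  # (30, 25)
```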
|
|
||
| ### Use cases | ||
|
|
||
| Variable chunking is particularly useful for: | ||
|
|
||
| 1. **Irregular time series**: When your data has non-uniform time intervals, you can | ||
| create chunks that align with your sampling periods. | ||
|
|
||
| 2. **Aligning with partitions**: When you need to match chunk boundaries with existing | ||
| data partitions or structural boundaries in your data. | ||
|
|
||
| 3. **Optimizing access patterns**: When certain regions of your array are accessed more | ||
| frequently, you can use smaller chunks there for finer-grained access. | ||
|
|
||
| ### Example: Daily time series chunked by calendar month | ||
|
|
||
| ```python exec="true" session="arrays" source="above" result="ansi" | ||
| # Daily measurements for one year, chunked by month | ||
| # Each chunk corresponds to one month (varying from 28-31 days) | ||
| z_timeseries = zarr.create_array( | ||
|     store='data/example-22.zarr', | ||
|     shape=(365, 100), # 365 days, 100 measurements per day | ||
|     chunks=[[31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], [100]], # Days per month | ||
|     dtype='float64', | ||
|     zarr_format=3 | ||
| ) | ||
| print(f"Created array with shape {z_timeseries.shape}") | ||
| print(f"Chunk shapes: {z_timeseries.metadata.chunk_grid.chunk_shapes}") | ||
| print(f"Number of chunks: {len(z_timeseries.metadata.chunk_grid.chunk_shapes[0])} months") | ||
| ``` | ||
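
The month lengths above are hard-coded for a non-leap year. One possible way to derive them for any year is the standard library's `calendar.monthrange`. `days_per_month` is a hypothetical helper name used only for this sketch.

```python
import calendar


def days_per_month(year):
    """Number of days in each month of `year`, January through December."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


print(days_per_month(2023))       # [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
print(sum(days_per_month(2024)))  # 366
```

The resulting list could be passed as the first entry of `chunks`, with the first dimension of `shape` set to its sum.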
|
|
||
| ### Limitations | ||
|
|
||
| Variable chunking has some important limitations: | ||
|
|
||
| 1. **Zarr v3 only**: This feature is only available when using `zarr_format=3`. | ||
| Attempting to use variable chunks with `zarr_format=2` will raise an error. | ||
|
|
||
| 2. **Not compatible with sharding**: You cannot use variable chunking together with | ||
| the sharding feature. Arrays must use either variable chunking or sharding, but not both. | ||
|
|
||
| 3. **Not compatible with `from_array()`**: Variable chunking cannot be used when creating | ||
| arrays from existing data using [`zarr.from_array`][]. This is because the function needs | ||
| to partition the input data, which requires regular chunk sizes. | ||
|
|
||
| 4. **No `.chunks` property**: For arrays with variable chunking, accessing the `.chunks` | ||
| property will raise a `NotImplementedError`. Use `.metadata.chunk_grid.chunk_shapes` | ||
| instead. | ||
|
|
||
| ```python exec="true" session="arrays" source="above" result="ansi" | ||
| # This will raise an error | ||
| try: | ||
|     _ = z.chunks | ||
| except NotImplementedError as e: | ||
|     print(f"Error: {e}") | ||
| ``` | ||
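
Code that has to handle both regular and variable chunking could wrap this behavior in a small helper. The sketch below relies only on what is described above (`.chunks` raising `NotImplementedError`, and `.metadata.chunk_grid.chunk_shapes` being available). `chunk_layout` is a hypothetical helper, and `FakeVariableArray` is a stand-in object rather than a zarr class, so the example runs without zarr installed.

```python
from types import SimpleNamespace


def chunk_layout(arr):
    """Return ("regular", chunks) or ("variable", chunk_shapes) for `arr`.

    Hypothetical helper; relies only on the behavior described above.
    """
    try:
        return "regular", arr.chunks
    except NotImplementedError:
        return "variable", arr.metadata.chunk_grid.chunk_shapes


class FakeVariableArray:
    """Stand-in that mimics the documented behavior; not a zarr class."""

    metadata = SimpleNamespace(
        chunk_grid=SimpleNamespace(chunk_shapes=((10, 20, 30), (25, 25, 25, 25)))
    )

    @property
    def chunks(self):
        raise NotImplementedError("variable chunking has no single chunk shape")


regular = SimpleNamespace(chunks=(100, 100))  # stand-in for a regularly chunked array
print(chunk_layout(regular))              # ('regular', (100, 100))
print(chunk_layout(FakeVariableArray()))  # ('variable', ((10, 20, 30), (25, 25, 25, 25)))
```

If the behavior of `.chunks` changes during review of this draft, the helper would need to change with it.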
|
|
||
| ## Missing features in 3.0 | ||
|
|
||
| The following features have not been ported to 3.0 yet. | ||
|
|
||
This link doesn't resolve yet but it will when the spec is merged.