Rectilinear (variable-length) chunk grid #25
Draft: d-v-b wants to merge 4 commits into zarr-developers:main from d-v-b:feat/rectilinear-chunk-grid
@@ -0,0 +1,56 @@
# Rectilinear chunk grid

## Metadata

| field | type | required |
| - | - | - |
| `"name"` | Literal `"rectilinear"` | yes |
| `"configuration"` | [Configuration](#configuration) | yes |

### Configuration

| field | type | required | notes |
| - | - | - | - |
| `"kind"` | Literal `"inline"` | yes | |
| `"chunk_shapes"` | array[[Chunk edge lengths](#chunk-edge-lengths)] | | |

#### Chunk edge lengths

The edge lengths of the chunks along an array axis `A` are represented by an array that can contain two types of elements:

- an integer that explicitly denotes an edge length
- an array that denotes a [run-length encoded](#run-length-encoding) sequence of integers, each of which denotes an edge length

The sum of the edge lengths MUST match the length of the array along the axis `A`.

#### Run-length encoding

This specification defines a JSON representation for run-length encoded sequences.

A run-length encoded sequence of `N` repetitions of some value `T` is denoted by the length-2 JSON array `[T, N]`.

For example, the sequence `[1, 1, 1, 1, 1]` becomes `[1, 5]` after applying this run-length encoding.
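
The encoding can be illustrated with a small Python sketch. It is not part of the specification; the function names `expand_edge_lengths` and `run_length_encode` are hypothetical, and the encoder assumes that every run, including a run of length 1, is written as a `[T, N]` pair.

```python
from typing import Sequence, Union

# One element of a per-axis entry: a plain edge length, or a [value, count] run.
Element = Union[int, Sequence[int]]


def expand_edge_lengths(encoded: Sequence[Element]) -> list[int]:
    """Expand a mixed list of integers and [value, count] runs into plain edge lengths."""
    expanded: list[int] = []
    for element in encoded:
        if isinstance(element, int):
            expanded.append(element)
        else:
            value, count = element  # fails unless the run has exactly two items
            expanded.extend([value] * count)
    return expanded


def run_length_encode(lengths: Sequence[int]) -> list[list[int]]:
    """Encode edge lengths as [value, count] runs (assumption: every run is a pair)."""
    runs: list[list[int]] = []
    for value in lengths:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return runs
```

With these definitions, `run_length_encode([1, 1, 1, 1, 1])` yields the single run `[1, 5]` wrapped in a list, and `expand_edge_lengths` reverses it.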

## Examples

This example shows five different ways of specifying the chunk edge lengths along an axis, one for each axis of an array with shape `(6, 6, 6, 6, 6)`.

```javascript
{
  ...
  "shape": [6, 6, 6, 6, 6],
  "chunk_grid": {
    "name": "rectilinear",
    "configuration": {
      "kind": "inline",
      "chunk_shapes": [
        [[2, 3]],        // expands to [2, 2, 2]
        [[1, 6]],        // expands to [1, 1, 1, 1, 1, 1]
        [1, [2, 1], 3],  // expands to [1, 2, 3]
        [[1, 3], 3],     // expands to [1, 1, 1, 3]
        [6]              // expands to [6]
      ]
    }
  }
}
```
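
As a rough check of the example, the following Python sketch expands each axis entry and applies the rule that the edge lengths must sum to the axis length. The helper names `expand` and `check_chunk_shapes` are hypothetical, and the requirement of exactly one `chunk_shapes` entry per array axis is an assumption read off the example rather than something the text states.

```python
def expand(entry):
    """Expand integers and [value, count] runs into a flat list of edge lengths."""
    out = []
    for element in entry:
        if isinstance(element, int):
            out.append(element)
        else:
            value, count = element
            out.extend([value] * count)
    return out


def check_chunk_shapes(shape, chunk_shapes):
    """Return the expanded edge lengths per axis; raise if an axis breaks the sum rule."""
    # Assumption: one chunk_shapes entry per array axis.
    if len(shape) != len(chunk_shapes):
        raise ValueError("expected one chunk_shapes entry per array axis")
    expanded = [expand(entry) for entry in chunk_shapes]
    for axis, (length, edges) in enumerate(zip(shape, expanded)):
        if sum(edges) != length:
            raise ValueError(
                f"axis {axis}: edge lengths sum to {sum(edges)}, expected {length}"
            )
    return expanded


shape = [6, 6, 6, 6, 6]
chunk_shapes = [[[2, 3]], [[1, 6]], [1, [2, 1], 3], [[1, 3], 3], [6]]
grid = check_chunk_shapes(shape, chunk_shapes)
```

Every axis in the example expands to edge lengths that sum to 6, so the check passes without raising.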
A previous proposal for rectilinear chunking also allowed a plain integer here, to indicate uniform chunking along the dimension. That makes this strictly a generalization of `regular` chunking and seems like a good idea to include.

While the run-length encoding makes regular chunking along a dimension still efficient to specify, a plain integer indicating uniform chunking has the advantage of allowing the dimension to be resized without also specifying the new chunk sizes.
Potentially you could allow the last item in the `chunk_shapes` entry for a given dimension to be `[n, -1]` to mean that all remaining chunks are size `n`. That would allow the dimension to be resized, since it would indicate the chunk size for new chunks.

That would reduce the need for also allowing a plain integer to specify uniform chunking, but it is still probably good to allow a plain integer.
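
To make the `[n, -1]` idea concrete, here is a minimal Python sketch of how such an entry might be expanded. It is only an illustration of the suggestion: `expand_with_fill` is a hypothetical name, and letting the final chunk overhang the array bound (rather than trimming it) is an assumption.

```python
def expand_with_fill(entry, axis_length):
    """Expand one axis entry; a trailing [n, -1] run repeats n until the axis is covered."""
    edges = []
    for index, element in enumerate(entry):
        if isinstance(element, int):
            edges.append(element)
            continue
        value, count = element
        if count == -1:
            if index != len(entry) - 1:
                raise ValueError("[n, -1] is only allowed as the last item")
            if value <= 0:
                raise ValueError("the fill chunk size must be positive")
            remaining = axis_length - sum(edges)
            # Ceiling division: the last chunk may overhang the array bound (assumption).
            count = max(0, -(-remaining // value))
        edges.extend([value] * count)
    return edges


before = expand_with_fill([4, [3, -1]], 10)  # axis of length 10
after = expand_with_fill([4, [3, -1]], 16)   # same metadata after resizing to 16
```

The same entry covers the axis both before and after the resize, which is the property the suggestion is after.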
We removed the plain integer form because it was ambiguous: for a shape `(10,)`, does `{"chunk_shapes": [3]}` expand to `[3, 3, 3, 1]` or `[3, 3, 3, 3]`? I think we can restore the single integer form if we can resolve this ambiguity.

For drop-in compatibility with the regular chunk grid, we do need to support some way of expressing chunks that overhang the grid of the array. But we might consider supporting this compatibility via something more explicit than the single-integer form.

And for resizing, I agree that it's convenient if resizing doesn't require re-defining the chunk grid, but for many resize operations with rectilinear chunks the appended data will have different chunking, and so any resize API for rectilinear chunks will need to support declaring new chunk shapes. So perhaps a convenient method for resizing with an automatic default chunk shape could be offered by an application, even if the `chunk_grid` is modified in any case.

A related question is whether we support declaring chunk shapes that substantially overflow the shape of the array, e.g. for a shape `(10,)`, `[3, 3, 3, 3, 3]`, which has a length of 15. Without a clear use case, I'm inclined against supporting this here. Curious to hear what you think.
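
The two readings of the single-integer form can be written out as a short Python sketch. Both function names are hypothetical and neither reading is presented here as the specified behavior.

```python
def expand_trimmed(chunk_size, axis_length):
    """Uniform chunks, with the final chunk trimmed so the edges sum to axis_length."""
    full, remainder = divmod(axis_length, chunk_size)
    return [chunk_size] * full + ([remainder] if remainder else [])


def expand_overhanging(chunk_size, axis_length):
    """Uniform chunks where the final chunk may extend past the array bound."""
    count = -(-axis_length // chunk_size)  # ceiling division
    return [chunk_size] * count


trimmed = expand_trimmed(3, 10)          # edges sum to exactly 10
overhanging = expand_overhanging(3, 10)  # edges sum to 12, overhanging by 2
```

For shape `(10,)` and chunk size 3, the first reading gives `[3, 3, 3, 1]` and the second gives `[3, 3, 3, 3]`; the two agree whenever the chunk size divides the axis length.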
Some feedback from folks at the Zarr summit was that we could allow declaring a sequence of chunks that exceeds the shape of the array, but disallow a sequence that would underfill the array. That seems reasonable to me.
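
A validator following that rule might look like the sketch below. `check_coverage` is a hypothetical name, and returning the size of the overhang is just one possible design.

```python
def check_coverage(edges, axis_length):
    """Accept edges that cover or exceed the axis; reject edges that underfill it.

    Returns the number of elements by which the chunks overhang the axis.
    """
    total = sum(edges)
    if total < axis_length:
        raise ValueError(
            f"chunks cover {total} elements but the axis has length {axis_length}"
        )
    return total - axis_length


overhang = check_coverage([3, 3, 3, 3, 3], 10)  # 15 >= 10, so this is accepted
```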
I don't see where the 1 would come from. I imagined it would expand to [3, 3, 3, 3]. The point is that it would work the same as the regular grid.
Yes, definitely should support overhanging the array bounds, for compatibility with regular grid, to achieve any necessary chunk size constraints, and to create room for resizing without specifying new chunk sizes.
Agreed.
The 1 would come from trimming the final chunk to match the array's shape exactly. I will bring back the single-integer form, and explain this behavior, which will remove the ambiguity.