56 changes: 56 additions & 0 deletions chunk-grids/rectilinear/README.md
@@ -0,0 +1,56 @@
# Rectilinear chunk grid

## Metadata

| field | type | required |
| - | - | - |
| `"name"` | Literal `"rectilinear"` | yes |
| `"configuration"` | [Configuration](#configuration) | yes |

### Configuration

| field | type | required | notes |
| - | - | - | - |
| `"kind"` | Literal `"inline"` | yes | |
| `"chunk_shapes"` | array[[Chunk edge lengths](#chunk-edge-lengths)] | yes | one entry per array axis |

#### Chunk edge lengths

The edge lengths of the chunks along an array axis `A` are represented by an array that can contain two types of elements:
Contributor

A previous proposal for rectilinear chunking also allowed a plain integer here, to indicate uniform chunking along the dimension. That makes this strictly a generalization of regular chunking and seems like a good idea to include.

While the run-length encoding makes regular chunking along a dimension still efficient to specify, a plain integer indicating uniform chunking has the advantage of allowing the dimension to be resized without also specifying the new chunk sizes.

Contributor

Potentially you could allow the last item in the chunk_shapes entry for a given dimension to be [n, -1] to mean that all remaining chunks are size n --- that would allow the dimension to be resized since it would indicate the chunk size for new chunks.

That would reduce the need for also allowing a plain integer to specify uniform chunking, but it is still probably good to allow a plain integer.

Contributor Author

we removed the plain integer form because it was ambiguous -- for a shape (10,), does {"chunk_shapes": [3]} expand to [3, 3, 3, 1] or [3, 3, 3, 3]? I think we can restore the single integer form if we can resolve this ambiguity.

For drop-in compatibility with the regular chunk grid, we do need to support some way of expressing chunks that overhang the grid of the array. But we might consider supporting this compatibility via something more explicit than the single-integer form.

And for resizing, I agree that it's convenient if resizing doesn't require re-defining the chunk grid, but for many resize operations with rectilinear chunks the appended data will have different chunking, and so any resize API for rectilinear chunks will need to support declaring new chunk shapes. So perhaps a convenient method for resizing with an automatic default chunk shape could be offered by an application, even if the chunk_grid is modified in any case.

A related question is whether we support declaring chunk shapes that substantially overflow the shape of the array, e.g. for a shape (10,), [3, 3, 3, 3, 3] which has a length of 15. Without a clear use case, I'm inclined against supporting this here. Curious to hear what you think.

Contributor Author

> Without a clear use case, I'm inclined against supporting this here. Curious to hear what you think.

Some feedback from folks at the Zarr summit was that we could allow declaring a sequence of chunks that exceeds the shape of the array, but disallow a sequence that would underfill the array. That seems reasonable to me.

Contributor

> we removed the plain integer form because it was ambiguous -- for a shape (10,), does {"chunk_shapes": [3]} expand to [3, 3, 3, 1] or [3, 3, 3, 3]? I think we can restore the single integer form if we can resolve this ambiguity.

I don't see where the 1 would come from. I imagined it would expand to [3, 3, 3, 3]. The point is that it would work the same as the regular grid.

> For drop-in compatibility with the regular chunk grid, we do need to support some way of expressing chunks that overhang the grid of the array. But we might consider supporting this compatibility via something more explicit than the single-integer form.
>
> And for resizing, I agree that it's convenient if resizing doesn't require re-defining the chunk grid, but for many resize operations with rectilinear chunks the appended data will have different chunking, and so any resize API for rectilinear chunks will need to support declaring new chunk shapes. So perhaps a convenient method for resizing with an automatic default chunk shape could be offered by an application, even if the chunk_grid is modified in any case.
>
> A related question is whether we support declaring chunk shapes that substantially overflow the shape of the array, e.g. for a shape (10,), [3, 3, 3, 3, 3] which has a length of 15. Without a clear use case, I'm inclined against supporting this here. Curious to hear what you think.

Yes, definitely should support overhanging the array bounds, for compatibility with regular grid, to achieve any necessary chunk size constraints, and to create room for resizing without specifying new chunk sizes.

Contributor

> > Without a clear use case, I'm inclined against supporting this here. Curious to hear what you think.
>
> Some feedback from folks at the Zarr summit was that we could allow declaring a sequence of chunks that exceeds the shape of the array, but disallow a sequence that would underfill the array. That seems reasonable to me.

Agreed.

Contributor Author

> I don't see where the 1 would come from. I imagined it would expand to [3, 3, 3, 3]. The point is that it would work the same as the regular grid.

The 1 would come from trimming the final chunk to match the array's shape exactly. I will bring back the single-integer form, and explain this behavior, which will remove the ambiguity.
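
The two readings of a single integer debated in this thread can be made concrete with a small sketch. It is not part of the PR: the function names `uniform_overhang` and `uniform_trimmed` are hypothetical, and the code only illustrates the two candidate expansions for a shape `(10,)` with edge length 3.

```python
def uniform_overhang(axis_length, edge):
    """Regular-grid reading: every chunk has the full edge length,
    so the last chunk may extend past the end of the array."""
    count = (axis_length + edge - 1) // edge  # ceiling division
    return [edge] * count


def uniform_trimmed(axis_length, edge):
    """Trimmed reading: the final chunk is shortened so that the
    edge lengths sum exactly to the axis length."""
    full, remainder = divmod(axis_length, edge)
    return [edge] * full + ([remainder] if remainder else [])


print(uniform_overhang(10, 3))  # [3, 3, 3, 3]
print(uniform_trimmed(10, 3))   # [3, 3, 3, 1]
```

The two readings agree whenever the edge length divides the axis length evenly, so the ambiguity only matters for the final chunk.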

- an integer that explicitly denotes an edge length
Member

extra denotes

- an array that denotes a [run-length encoded](#run-length-encoding) sequence of integers, each of which denotes an edge length

The sum of the edge lengths MUST match the length of the array along the axis `A`.
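
As an illustration of the rule above, the following sketch expands one axis entry of `"chunk_shapes"` and checks the sum constraint. The helper names `expand_edge_lengths` and `check_axis` are hypothetical and not defined by this specification; the sketch assumes the entry has already been parsed from JSON into Python lists and integers.

```python
def expand_edge_lengths(spec):
    """Expand one axis entry of "chunk_shapes" into a flat list of edge lengths."""
    edges = []
    for element in spec:
        if isinstance(element, int):
            # A bare integer is a single explicit edge length.
            edges.append(element)
        else:
            # A length-2 array [T, N] stands for N repetitions of T.
            value, count = element
            edges.extend([value] * count)
    return edges


def check_axis(spec, axis_length):
    """Return the expanded edges, raising if their sum differs from the axis length."""
    edges = expand_edge_lengths(spec)
    if sum(edges) != axis_length:
        raise ValueError(f"edge lengths sum to {sum(edges)}, expected {axis_length}")
    return edges


print(check_axis([1, [2, 1], 3], 6))  # [1, 2, 3]
```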

#### Run-length encoding

This specification defines a JSON representation for run-length encoded sequences.

A run-length encoded sequence of `N` repetitions of some value `T` is denoted by the length-2 JSON array `[T, N]`.

For example, the sequence `[1, 1, 1, 1, 1]` becomes `[1, 5]` after applying this run-length encoding.
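
A minimal sketch of this encoding in both directions, assuming a sequence with several runs is written as a list of `[T, N]` pairs. The names `rle_encode` and `rle_decode` are illustrative and not part of this specification.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal values into a [value, count] pair."""
    return [[value, len(list(run))] for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand [value, count] pairs back into the flat sequence."""
    return [value for value, count in pairs for _ in range(count)]


print(rle_encode([1, 1, 1, 1, 1]))  # [[1, 5]]
print(rle_decode([[1, 5]]))         # [1, 1, 1, 1, 1]
```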

## Examples

This example shows a rectilinear chunk grid for an array with shape `(6, 6, 6, 6, 6)`. Each of the 5 axes specifies its chunk edge lengths in a different way.

```javascript
{
  ...
  "shape": [6, 6, 6, 6, 6],
  "chunk_grid": {
    "name": "rectilinear",
    "configuration": {
      "kind": "inline",
      "chunk_shapes": [
        [[2, 3]], // expands to [2, 2, 2]
        [[1, 6]], // expands to [1, 1, 1, 1, 1, 1]
        [1, [2, 1], 3], // expands to [1, 2, 3]
        [[1, 3], 3], // expands to [1, 1, 1, 3]
        [6] // expands to [6]
        // Review comment (Member) on the line above: trailing comma
      ]
    }
  }
}

```
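
The expansions noted in the example can be checked with a short sketch. The `expand` helper is hypothetical and only mirrors the rules described under "Chunk edge lengths"; the `shape` and `chunk_shapes` values are copied from the example.

```python
shape = [6, 6, 6, 6, 6]
chunk_shapes = [
    [[2, 3]],
    [[1, 6]],
    [1, [2, 1], 3],
    [[1, 3], 3],
    [6],
]


def expand(spec):
    """Flatten one axis entry: integers stay as-is, [T, N] becomes N copies of T."""
    edges = []
    for element in spec:
        edges.extend([element] if isinstance(element, int) else [element[0]] * element[1])
    return edges


expanded = [expand(spec) for spec in chunk_shapes]

# Every axis must sum exactly to the corresponding array length.
for axis, (edges, length) in enumerate(zip(expanded, shape)):
    assert sum(edges) == length, f"axis {axis}: {edges} does not sum to {length}"

# Number of chunks along each axis.
print([len(edges) for edges in expanded])  # [3, 6, 3, 4, 1]
```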