forked from christophenoel/geozarr-spec
Open
Summary
The current specification requires a strict 1:1 mapping between Zarr chunks and tile matrix tiles (Section 9.7.4). This may prevent optimal chunking strategies for different data types and storage backends.
Current Problem
- Section 9.7.4 mandates: "Chunks MUST match the tileWidth and tileHeight declared in the TileMatrix"
- This requirement prevents tuning chunk shapes for different access patterns
- Tile-sized chunks may not be optimal for cloud storage or for specific analysis workflows
- The requirement does not account for data-dependent optimal chunk sizes
Proposed Solution
- Relax strict tile alignment requirements while maintaining compatibility
- Allow flexible chunking that optimizes for the specific use case
- Provide guidance on when stronger alignment is beneficial vs. when flexibility is preferred
- Document chunk alignment strategies for different scenarios
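
As one way to picture the alignment scenarios above, the sketch below classifies how a chunk size relates to a tile size along a single dimension. The function name `classify_alignment` and the four category labels are hypothetical and chosen only for illustration; they do not come from the GeoZarr specification or the EOPF code.

```python
def classify_alignment(chunk_size: int, tile_size: int) -> str:
    """Classify how a chunk size relates to a tile size along one dimension.

    The category names are hypothetical labels for this sketch only.
    """
    if chunk_size <= 0 or tile_size <= 0:
        raise ValueError("chunk_size and tile_size must be positive")
    if chunk_size == tile_size:
        # The strict 1:1 case currently required by Section 9.7.4.
        return "exact"
    if chunk_size % tile_size == 0:
        # One chunk covers a whole number of tiles.
        return "chunk-spans-tiles"
    if tile_size % chunk_size == 0:
        # One tile is covered by a whole number of chunks.
        return "tile-spans-chunks"
    # Chunk and tile boundaries do not line up regularly.
    return "unaligned"


for chunk, tile in [(256, 256), (512, 256), (128, 256), (244, 256)]:
    print(chunk, tile, classify_alignment(chunk, tile))
```

A relaxed rule could, for example, permit the two "spans" cases in addition to the exact one; whether "unaligned" layouts should also be allowed is the question this issue raises.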
Implementation Evidence
The EOPF implementation includes chunk alignment logic:

    def calculate_aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
        """Calculate a chunk size that divides evenly into the dimension size."""
        if target_chunk_size >= dimension_size:
            return dimension_size
        # Find the largest divisor that is <= target_chunk_size
        for chunk_size in range(target_chunk_size, 0, -1):
            if dimension_size % chunk_size == 0:
                return chunk_size
        return 1

This approach:
- Prevents chunk overlap issues with Dask
- Optimizes for data dimensions rather than arbitrary tile sizes
- Maintains compatibility while improving performance
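
A short worked example may make the trade-off concrete. The sketch below repeats the function above so that it runs on its own and compares a fixed 256-pixel tile size with the divisor-aligned chunk size for a dimension of 10980 pixels. Both numbers are assumed values for illustration, and `describe_chunking` is a hypothetical helper, not part of the EOPF code.

```python
def calculate_aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
    """Calculate a chunk size that divides evenly into the dimension size."""
    if target_chunk_size >= dimension_size:
        return dimension_size
    # Find the largest divisor that is <= target_chunk_size
    for chunk_size in range(target_chunk_size, 0, -1):
        if dimension_size % chunk_size == 0:
            return chunk_size
    return 1


def describe_chunking(dimension_size: int, chunk_size: int) -> tuple[int, int]:
    """Hypothetical helper: return (number of chunks, size of the last chunk)."""
    n_chunks = -(-dimension_size // chunk_size)  # ceiling division on integers
    last_chunk = dimension_size - (n_chunks - 1) * chunk_size
    return n_chunks, last_chunk


dimension_size = 10980  # assumed raster width in pixels
tile_size = 256         # assumed tileWidth from a TileMatrix

# Chunks equal to the tile size leave a partial chunk at the edge.
print(describe_chunking(dimension_size, tile_size))  # (43, 228)

# The divisor-aligned size splits the dimension into equal chunks.
aligned = calculate_aligned_chunk_size(dimension_size, tile_size)
print(aligned, describe_chunking(dimension_size, aligned))  # 244 (45, 244)
```

One caveat of the divisor search: a dimension with few divisors can yield a very small chunk. For a prime dimension such as 13 with a target of 5, the function returns 1.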
Specification Sections to Update
- Section 9.7.4 (Chunk Layout Alignment)
- Add performance considerations section
- Include guidance on chunk size optimization
cc @vincentsarago, @maxrjones, @d-v-b, @briannapagan