
Chunking requirements are too restrictive for optimal performance #82

@emmanuelmathot

Summary

The current specification requires a strict 1:1 mapping between Zarr chunks and tile matrix tiles (Section 9.7.4), which may prevent optimal chunking strategies for different data types and storage backends.

Current Problem

  • Section 9.7.4 mandates "Chunks MUST match the tileWidth and tileHeight declared in the TileMatrix"
  • This prevents optimization for different access patterns
  • May not be optimal for cloud storage or specific analysis workflows
  • Doesn't account for data-dependent optimal chunk sizes
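As a minimal sketch of what the current rule checks, the helper below (a hypothetical name, `chunks_match_tile_matrix`, not part of the spec or of EOPF) compares the spatial chunk dimensions against `tileWidth` and `tileHeight`. It assumes a (..., y, x) dimension order, such as (band, y, x).

```python
from typing import Sequence


def chunks_match_tile_matrix(
    chunk_shape: Sequence[int], tile_width: int, tile_height: int
) -> bool:
    """Return True if the spatial chunk dims equal (tileHeight, tileWidth).

    Assumption: the last two chunk dimensions are (y, x).
    """
    if len(chunk_shape) < 2:
        raise ValueError("chunk_shape needs at least two (y, x) dimensions")
    chunk_height, chunk_width = chunk_shape[-2], chunk_shape[-1]
    # Section 9.7.4 as currently written: exact equality, nothing else passes.
    return chunk_height == tile_height and chunk_width == tile_width


# A 256x256 TileMatrix only accepts 256x256 spatial chunks.
print(chunks_match_tile_matrix((1, 256, 256), tile_width=256, tile_height=256))  # True
print(chunks_match_tile_matrix((1, 512, 512), tile_width=256, tile_height=256))  # False
```

Under this rule, larger chunks that would suit cloud object storage (fewer, bigger requests) are rejected even when they cover whole tiles.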

Proposed Solution

  1. Relax strict tile alignment requirements while maintaining compatibility
  2. Allow flexible chunking that optimizes for the specific use case
  3. Provide guidance on when stronger alignment is beneficial vs. when flexibility is preferred
  4. Document chunk alignment strategies for different scenarios
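One possible relaxation, sketched here only as an illustration and not as wording proposed in this issue, is to require that each spatial chunk dimension either divides the tile dimension or is an integer multiple of it. Every tile then maps to whole chunks (or every chunk to whole tiles), and exact equality becomes a special case. The helper name `chunks_compatible_with_tile_matrix` is hypothetical, and the sketch assumes a (..., y, x) dimension order and a tile grid whose origin coincides with the array origin.

```python
from typing import Sequence


def _nests(chunk: int, tile: int) -> bool:
    # True if whole chunks tile exactly into one tile, or whole tiles into one chunk.
    return chunk > 0 and tile > 0 and (tile % chunk == 0 or chunk % tile == 0)


def chunks_compatible_with_tile_matrix(
    chunk_shape: Sequence[int], tile_width: int, tile_height: int
) -> bool:
    """Relaxed check: chunk and tile sizes nest along both spatial axes."""
    if len(chunk_shape) < 2:
        raise ValueError("chunk_shape needs at least two (y, x) dimensions")
    chunk_height, chunk_width = chunk_shape[-2], chunk_shape[-1]
    return _nests(chunk_height, tile_height) and _nests(chunk_width, tile_width)


print(chunks_compatible_with_tile_matrix((1, 512, 512), 256, 256))  # True: 1 chunk = 2x2 tiles
print(chunks_compatible_with_tile_matrix((1, 128, 128), 256, 256))  # True: 1 tile = 2x2 chunks
print(chunks_compatible_with_tile_matrix((1, 366, 366), 512, 512))  # False: sizes do not nest
```

A tile server can still serve such data by reading or splitting whole chunks, which is the kind of "stronger alignment vs. flexibility" trade-off the guidance in point 3 would need to describe.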

Implementation Evidence

The EOPF implementation includes chunk alignment logic:

def calculate_aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
    """Calculate a chunk size that divides evenly into the dimension size."""
    if target_chunk_size >= dimension_size:
        return dimension_size
    
    # Find the largest divisor that is <= target_chunk_size
    for chunk_size in range(target_chunk_size, 0, -1):
        if dimension_size % chunk_size == 0:
            return chunk_size
    return 1

This approach:

  • Prevents chunk overlap issues with Dask
  • Optimizes for data dimensions rather than arbitrary tile sizes
  • Maintains compatibility while improving performance
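For example, for a 10980-pixel axis (the width of a Sentinel-2 10 m band) and a 512-pixel target, `calculate_aligned_chunk_size(10980, 512)` returns 366, because 366 × 30 = 10980 and no divisor of 10980 lies between 366 and 512. The sketch below uses a hypothetical helper, `chunk_lengths`, to list the per-axis chunk lengths that a regular chunk size produces (the last chunk being partial when the size does not divide the axis), and compares the two choices.

```python
def chunk_lengths(dimension_size: int, chunk_size: int) -> list[int]:
    """Lengths of consecutive chunks along one axis; the last may be partial."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full, remainder = divmod(dimension_size, chunk_size)
    return [chunk_size] * full + ([remainder] if remainder else [])


tile_sized = chunk_lengths(10980, 512)     # matches a hypothetical 512-px tile
divisor_sized = chunk_lengths(10980, 366)  # result of calculate_aligned_chunk_size

print(len(tile_sized), tile_sized[-1])         # 22 228
print(len(divisor_sized), set(divisor_sized))  # 30 {366}
```

The divisor-based choice yields 30 equal chunks with no partial edge chunk, but 366-pixel chunks can no longer satisfy a 1:1 mapping to 512-pixel tiles, which is exactly the conflict with Section 9.7.4 described above.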

Specification Sections to Update

  • Section 9.7.4 (Chunk Layout Alignment)
  • Add performance considerations section
  • Include guidance on chunk size optimization

cc @vincentsarago, @maxrjones, @d-v-b, @briannapagan
