Making explicit chunk cache handling policies and enabling a custom autochunker with a fallback to h5py's built-in autochunker #741

Open
mkuehbach wants to merge 15 commits into master from modifiable_hfive_chunking

@mkuehbach (Collaborator) commented Feb 12, 2026

Motivation:

  • Allow customized chunking for compressed storage at the dataset level, so that the chunk layout can be tailored when slicing is known a priori to be biased towards a specific direction, as in EM, APM, and other high-volume techniques.
  • Make explicit that chunking behaviour can also be tuned internally through chunk cache (buffer) settings, whose effect differs between hardware such as sequential and truly parallel file systems. These settings are currently configured with the default values that were previously hidden in the internals of the h5py library.
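
The two points above can be illustrated with a minimal sketch. It is not the code from this PR: `sketch_chunks`, `write_with_policy`, `slice_axis`, `target_bytes`, and `DEFAULT_CHUNK_CACHE` are hypothetical names, and the heuristic is an assumption chosen for illustration. Only the return convention (a chunk tuple, or `True` to defer to h5py's built-in autochunker) mirrors the `tuple[int, ...] | bool` signature visible in the diff. The cache values are the defaults listed in the h5py documentation and should be checked against the installed version.

```python
from math import prod

# Chunk cache settings accepted by h5py.File as keyword arguments.
# Values are the defaults documented by h5py (assumption: verify for your version).
DEFAULT_CHUNK_CACHE = {
    "rdcc_nbytes": 1024**2,  # cache size in bytes per dataset
    "rdcc_w0": 0.75,         # eviction policy weight between 0 and 1
    "rdcc_nslots": 521,      # number of hash table slots
}


def sketch_chunks(shape, itemsize, slice_axis=None, target_bytes=1024**2):
    """Hypothetical chunker biased towards reading single slices along slice_axis.

    Returns a chunk tuple, or True to let h5py's built-in autochunker decide.
    """
    # Fall back when no usage bias is known or the input is not usable.
    if slice_axis is None or len(shape) < 2:
        return True
    if not 0 <= slice_axis < len(shape) or min(shape) < 1:
        return True
    # Start with one full slice per chunk.
    chunks = list(shape)
    chunks[slice_axis] = 1
    # Halve the widest extent until the chunk fits the byte budget.
    while prod(chunks) * itemsize > target_bytes:
        widest = max(range(len(chunks)), key=lambda i: chunks[i])
        if chunks[widest] == 1:
            break
        chunks[widest] = (chunks[widest] + 1) // 2
    return tuple(chunks)


def write_with_policy(path, name, data, slice_axis=None):
    """Write a NumPy array with explicit cache settings and the sketched chunking."""
    import h5py  # imported lazily so sketch_chunks works without h5py installed

    chunks = sketch_chunks(data.shape, data.dtype.itemsize, slice_axis)
    with h5py.File(path, "w", **DEFAULT_CHUNK_CACHE) as h5w:
        # chunks=True asks h5py to guess a chunk shape itself.
        h5w.create_dataset(name, data=data, chunks=chunks, compression="gzip")
```

For an image stack of shape `(1000, 512, 512)` with 4-byte values and `slice_axis=0`, this sketch returns `(1, 512, 512)`, so each image is read from exactly one chunk. Without a `slice_axis` it returns `True` and h5py chooses the layout.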

…nly those modifications pertaining to customized chunking, and adding the custom autochunker code snippet from pynxtools-em and pynxtools-apm here. Next step: copy over the documentation from the refactoring_compression feature branch and extend it with details on the additional chunking customization that this feature branch adds
@mkuehbach changed the title from "Making explicit chunk cache handling policies and enabling a custom autochunker with a fall back to h5py original autochunker" to "Making explicit chunk cache handling policies and enabling a custom autochunker with a fallback to h5py's built-in autochunker" Feb 12, 2026
@RubelMozumder (Collaborator) left a comment

There are no tests for this. Several tests would help catch errors before they cause unwanted breakage in production.

@mkuehbach (Collaborator, Author) commented

There are no tests for this. Several tests would help catch errors before they cause unwanted breakage in production.

I could add a test_chunk.py that probes the return values of the custom autochunker for a few representative inputs.
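
A test of this kind might look like the following sketch. It is not the `test_chunk.py` that was later added to the PR: `toy_chunker` is a hypothetical stand-in for the custom autochunker, and `assert_valid_chunks` encodes an assumed contract (either `True`, or one positive integer per dimension that does not exceed the dataset extent, as HDF5 requires for fixed-size datasets).

```python
def assert_valid_chunks(shape, chunks):
    """Fail unless chunks is True or a valid chunk tuple for a fixed-size shape."""
    if chunks is True:
        return  # deferring to h5py's built-in autochunker is always acceptable
    assert isinstance(chunks, tuple)
    assert len(chunks) == len(shape)
    for extent, size in zip(chunks, shape):
        assert isinstance(extent, int)
        assert 1 <= extent <= size


def toy_chunker(shape):
    """Hypothetical stand-in: one chunk per slice along the first axis."""
    if len(shape) < 2:
        return True  # no obvious slicing direction, let h5py decide
    return (1,) + tuple(shape[1:])


def test_toy_chunker():
    # Probe a few representative shapes: 1D, small 2D, image stack.
    for shape in [(100,), (3, 5), (16, 64, 64)]:
        assert_valid_chunks(shape, toy_chunker(shape))
```

Run under pytest, `test_toy_chunker` is collected automatically. Checking the contract rather than exact chunk values keeps such tests stable if the heuristic is tuned later.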

@mkuehbach (Collaborator, Author) commented

There are no tests for this. Several tests would help catch errors before they cause unwanted breakage in production.

I could add a test_chunk.py that probes the return values of the custom autochunker for a few representative inputs.

Tests added.

@lukaspie (Collaborator) left a comment

Mostly good, just one additional comment. Thanks!

@@ -0,0 +1,57 @@
#

These three new files should be combined into one (e.g. chunk.py) since they all address the same functionality.

Alternatively, as discussed in #739, they should go to helpers.py.

) -> tuple[int, ...] | bool:
"""Define an explicit tuple[int] how to chunk data with shape

Parameter:

This is a very large docstring. Maybe the docs could link to this file to explain why and how the heuristic differs here.

4 participants