fix(py): fix large pyramidal TIFF conversion performance and crashes #342
+737
−81
Summary
Fixes conversion of large pyramidal TIFF files (e.g., JPEG2000 OME-TIFF) that previously caused excessive memory usage, task graph explosion, and hour-long processing times.
Resolves #310
Problem
Converting a 3GB JPEG2000 OME-TIFF (`602a12_z_stack.qupath.j2k.ome.tif`) with 512×512 tiles was:
- Crashing with `AttributeError: 'Group' object has no attribute 'ndim'`: pyramidal TIFFs return a `zarr.Group`, not an `Array`
Changes
1. Handle zarr Groups from pyramidal TIFFs (`to_ngff_image.py`)
- Added `_extract_array_from_group()` to extract the full-resolution array from a `zarr.Group`, using `multiscales` metadata when available or falling back to the largest array
- `to_ngff_image()` now transparently handles `zarr.Group` inputs
2. Reduce dask task explosion (`to_multiscales.py`)
- Re-enabled the `task_count()` guard that was commented out
- Moved `_find_optimal_chunk_size()` to module level for reuse
- Added `_cache_2d_strips()` and `_cache_1d_segments()` for strip-based caching of 2D/1D large images (previously a TODO)
3. Reuse existing pyramid levels (`cli.py`)
- Added `_multiscales_from_tifffile_pyramid()`: when a TIFF already contains multiple resolution levels, builds `Multiscales` directly from them instead of regenerating via `to_multiscales()`
- `_next_scale_metadata()`
- Factored `_apply_cli_metadata_overrides()` out of `_ngff_image_to_multiscales()` for reuse by both code paths
- `to_multiscales()` path
Performance impact
For the 3GB test TIFF (shape: 3×57128×153122×3, 4 pyramid levels, 512×512 JPEG2000 tiles):
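As a rough sense of scale (an estimate derived from the quoted shape, not a measurement from this PR), the number of 512×512 tiles in the full-resolution level can be worked out directly. The sketch assumes the leading axis is the z-stack and the trailing axis holds the RGB samples. It also assumes each stored tile costs at least one dask task when chunks are not aligned or merged. Neither assumption is stated in the description.

```python
from math import ceil

# Shape quoted above; axis meaning (z, y, x, samples) is an assumption.
shape = (3, 57128, 153122, 3)
tile = (512, 512)  # tile height and width in pixels

# Partial tiles at the image edge still count as whole tiles.
tiles_y = ceil(shape[1] / tile[0])
tiles_x = ceil(shape[2] / tile[1])
tiles_per_plane = tiles_y * tiles_x
total_tiles = shape[0] * tiles_per_plane

print(tiles_y, tiles_x, tiles_per_plane, total_tiles)
# 112 300 33600 100800
```

On those assumptions, a graph with one or more tasks per tile starts at roughly 100,000 tasks for the base level alone, before any downsampling steps are added.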
Testing
- `test_large_image_chunking.py`, covering input-aligned chunks, channel preservation, 2D strip caching, and 1D segment caching
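The Group handling described under Changes (prefer `multiscales` metadata, otherwise fall back to the largest array) can be illustrated with a minimal sketch. This is not the PR's `_extract_array_from_group()`. The function name `extract_full_resolution` and the `FakeArray`/`FakeGroup` stand-ins are hypothetical, and they model only the small part of the `zarr` interface the sketch touches (`attrs`, item access, `arrays()`, `shape`), so the example runs without `zarr` installed.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class FakeArray:
    """Hypothetical stand-in for zarr.Array: only `shape` is modeled."""
    shape: tuple


@dataclass
class FakeGroup:
    """Hypothetical stand-in for zarr.Group: attrs, item access, arrays()."""
    members: dict
    attrs: dict = field(default_factory=dict)

    def __getitem__(self, path):
        return self.members[path]

    def arrays(self):
        # Yields (name, array) pairs, like zarr.Group.arrays().
        return iter(self.members.items())


def extract_full_resolution(group):
    """Return the full-resolution array from a group of pyramid levels."""
    multiscales = group.attrs.get("multiscales")
    if multiscales:
        datasets = multiscales[0].get("datasets", [])
        if datasets:
            # Assumes the first listed dataset is the highest resolution,
            # which is the OME-NGFF ordering convention.
            return group[datasets[0]["path"]]
    # No usable metadata: fall back to the array with the most elements.
    candidates = [arr for _, arr in group.arrays()]
    if not candidates:
        raise ValueError("group contains no arrays")
    return max(candidates, key=lambda arr: prod(arr.shape))


# Levels deliberately listed smallest-first to exercise the fallback.
levels = {"1": FakeArray((3, 2048, 2048)), "0": FakeArray((3, 4096, 4096))}
with_meta = FakeGroup(
    levels, {"multiscales": [{"datasets": [{"path": "0"}, {"path": "1"}]}]}
)
without_meta = FakeGroup(levels)

print(extract_full_resolution(with_meta).shape)
print(extract_full_resolution(without_meta).shape)
```

Both calls select the 4096×4096 level: the first through the metadata path, the second through the size-based fallback.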