[Breaking] Add multiscale data support#55
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull request overview
This PR introduces breaking changes to add multiscale data support to ndevio, enabling napari to display image pyramids with multiple resolution levels. The core change transforms how layer data is structured: instead of returning single arrays, the API now returns lists of arrays where each element represents a different resolution level (from highest to lowest).
Changes:
- Introduced `reference_xarray` property to hold the metadata-reference xarray while separating it from the multiscale `layer_data`
- Refactored `layer_data` property to return a list of arrays (one per resolution level) for napari multiscale support
- Updated all tests to handle the new list-based data format
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/ndevio/nimage.py | Core implementation: added reference_xarray property, refactored layer_data to return lists, added _build_layer_data method to construct multiscale arrays |
| src/ndevio/utils/_layer_utils.py | Updated type annotations in build_layer_tuple to accept both single arrays and lists of arrays |
| src/ndevio/widgets/_scene_widget.py | Updated cache clearing to invalidate both _reference_xarray and _layer_data when switching scenes |
| tests/test_nimage.py | Updated tests to use reference_xarray for metadata checks and verify layer_data returns lists |
| tests/test_napari_reader.py | Updated reader tests to handle list-based layer data format |
| tests/test_sampledata/test_sampledata.py | Updated validation to expect list-based data format for sample data |
```python
        else:
            xr_data = self.xarray_data

        indexer = {d: 0 for d in xr_data.dims if d not in keep_dims}
```
The indexer assumes that dimensions not in keep_dims are singleton dimensions. If a lower resolution level has a non-singleton dimension that the reference level doesn't have, selecting index 0 will silently drop data. Consider adding validation to ensure dimensions being indexed with 0 are actually singleton (size == 1), or add a comment clarifying this assumption about the data structure.
Suggested change:

```diff
-        indexer = {d: 0 for d in xr_data.dims if d not in keep_dims}
+        # Any dimension present in this level but not in the reference
+        # xarray is assumed to be singleton and is indexed at 0 to keep
+        # all levels with consistent ndim. Validate this assumption so
+        # that non-singleton extra dimensions don't get silently dropped.
+        non_ref_dims = [d for d in xr_data.dims if d not in keep_dims]
+        for dim in non_ref_dims:
+            size = xr_data.sizes.get(dim)
+            if size is not None and size != 1:
+                raise ValueError(
+                    f"Non-singleton dimension {dim!r} (size {size}) is "
+                    f"present in resolution level {level} but not in "
+                    f"reference dims {tuple(ref.dims)}; indexing with 0 "
+                    "would silently drop data."
+                )
+        indexer = {d: 0 for d in non_ref_dims}
```
```diff
     # Access layer_data property to trigger loading
-    data = img.layer_data
+    data = img.reference_xarray
     # layer_data will be squeezed
```
This comment is now slightly misleading. The test is accessing reference_xarray (not layer_data), so the comment should be updated to say "reference_xarray will be squeezed" for clarity. While reference_xarray is indeed squeezed, the comment as written suggests the test is about layer_data.
Suggested change:

```diff
-    # layer_data will be squeezed
+    # reference_xarray will be squeezed
```
```diff
@@ -50,7 +50,7 @@ def test_nImage_remote_zarr():
     assert img.path == remote_zarr
     assert img._is_remote
     # original shape is (1, 2, 1, 512, 512) but layer_data is squeezed
```
This comment mentions "layer_data is squeezed" but the test is accessing reference_xarray. Update the comment to say "reference_xarray is squeezed" for accuracy.
Suggested change:

```diff
-    # original shape is (1, 2, 1, 512, 512) but layer_data is squeezed
+    # original shape is (1, 2, 1, 512, 512) but reference_xarray is squeezed
```
```python
    # Data should be a list of arrays (multiscale-ready)
    assert isinstance(data, list)
    assert len(data) == 1  # single resolution level
    # Shape should NOT include channel dimension
    assert data[0].shape == (60, 66, 85)  # ZYX only
```
While the test verifies single-resolution images return a list with one element, there's no test coverage for actual multiscale images with multiple resolution levels. Consider adding a test that loads a multiscale image (e.g., a zarr with multiple pyramid levels) to verify that the list contains multiple arrays with decreasing resolutions and that dimensions are handled correctly across levels.
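A test for the multi-level case could check the invariants napari expects from a pyramid without depending on a specific fixture. The sketch below uses plain numpy arrays standing in for `img.layer_data`; `check_multiscale` is a hypothetical helper, not part of ndevio:

```python
import numpy as np

def check_multiscale(data):
    """Check the invariants napari expects from multiscale layer data:
    a list of arrays with matching ndim and strictly decreasing size."""
    assert isinstance(data, list) and len(data) >= 1
    ndim = data[0].ndim
    for hi, lo in zip(data[:-1], data[1:]):
        assert lo.ndim == ndim, "every level must keep the same ndim"
        assert all(a <= b for a, b in zip(lo.shape, hi.shape)), (
            "each level must be no larger than the one before it"
        )
        assert lo.size < hi.size, "levels must strictly decrease in size"

# Simulated three-level CYX pyramid standing in for img.layer_data
pyramid = [
    np.zeros((4, 512, 512)),
    np.zeros((4, 256, 256)),
    np.zeros((4, 128, 128)),
]
check_multiscale(pyramid)
```

A real test would build `pyramid` from a multiscale zarr fixture instead of `np.zeros`, but the assertions would be the same.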
```python
        return self._reference_xarray

    @property
    def layer_data(self) -> list:
```
The return type annotation should be more specific. Consider changing list to list[ArrayLike] to match the usage in build_layer_tuple and provide better type safety. This is consistent with the type annotation pattern used elsewhere in the codebase (e.g., build_layer_tuple parameter at line 116 in _layer_utils.py).
```diff
     _is_remote: bool
-    _layer_data: xr.DataArray | None
+    _reference_xarray: xr.DataArray | None
+    _layer_data: list | None
```
The type annotation should be more specific. Consider changing list | None to list[ArrayLike] | None to match the return type of _build_layer_data and provide better type safety.
```python
        self._layer_data = self._build_layer_data()
        return self._layer_data

    def _build_layer_data(self) -> list:
```
The return type annotation should be more specific. Consider changing list to list[ArrayLike] to match the docstring and provide better type safety.
```python
        for level in levels:
            self.set_resolution_level(level)
            if (
                multiscale
                or self._is_remote
                or not determine_in_memory(self.path)
            ):
                xr_data = self.xarray_dask_data
            else:
                xr_data = self.xarray_data

            indexer = {d: 0 for d in xr_data.dims if d not in keep_dims}
            arrays.append(xr_data.isel(indexer).data)

        self.set_resolution_level(current_res)
        return arrays
```
The resolution level restoration at line 264 won't execute if an exception occurs within the loop. Consider using a try-finally block to ensure the resolution level is always restored, even if an error occurs during data loading or indexing operations.
Suggested change:

```diff
-        for level in levels:
-            self.set_resolution_level(level)
-            if (
-                multiscale
-                or self._is_remote
-                or not determine_in_memory(self.path)
-            ):
-                xr_data = self.xarray_dask_data
-            else:
-                xr_data = self.xarray_data
-            indexer = {d: 0 for d in xr_data.dims if d not in keep_dims}
-            arrays.append(xr_data.isel(indexer).data)
-        self.set_resolution_level(current_res)
-        return arrays
+        try:
+            for level in levels:
+                self.set_resolution_level(level)
+                if (
+                    multiscale
+                    or self._is_remote
+                    or not determine_in_memory(self.path)
+                ):
+                    xr_data = self.xarray_dask_data
+                else:
+                    xr_data = self.xarray_data
+                indexer = {d: 0 for d in xr_data.dims if d not in keep_dims}
+                arrays.append(xr_data.isel(indexer).data)
+            return arrays
+        finally:
+            self.set_resolution_level(current_res)
```
```diff
         if self._reference_xarray is None:
             # Ensure we're at the highest-res level for metadata consistency
             current_res = self.current_resolution_level
             self.set_resolution_level(0)
             if self._is_remote or not determine_in_memory(self.path):
-                self._layer_data = self.xarray_dask_data.squeeze()
+                self._reference_xarray = self.xarray_dask_data.squeeze()
             else:
-                self._layer_data = self.xarray_data.squeeze()
+                self._reference_xarray = self.xarray_data.squeeze()
             self.set_resolution_level(current_res)
         return self._reference_xarray
```
The resolution level restoration at line 211 won't execute if an exception occurs during data loading. Consider using a try-finally block to ensure the resolution level is always restored to its original value, even if xarray_dask_data or xarray_data access fails.
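The restore-on-exception pattern the comment asks for can be illustrated in isolation. `FakeImage` below is a hypothetical stand-in for nImage's resolution-level bookkeeping, not real ndevio code:

```python
class FakeImage:
    """Minimal stand-in for nImage's resolution-level state, used to show
    why the restore must live in a finally block."""

    def __init__(self):
        self.current_resolution_level = 3

    def set_resolution_level(self, level):
        self.current_resolution_level = level

    def load_reference(self, fail=False):
        current = self.current_resolution_level
        self.set_resolution_level(0)  # jump to highest-res level
        try:
            if fail:
                # Simulates xarray_dask_data / xarray_data access failing
                raise OSError("simulated remote read failure")
            return "reference"
        finally:
            # Runs on both return and raise, so the level is always restored
            self.set_resolution_level(current)

img = FakeImage()
assert img.load_reference() == "reference"
assert img.current_resolution_level == 3

try:
    img.load_reference(fail=True)
except OSError:
    pass
assert img.current_resolution_level == 3  # restored despite the error
```

Without the `finally`, the failing path would leave the object pinned at level 0, which is exactly the bug the review flags in both `_build_layer_data` and `reference_xarray`.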
Description
In looking more closely at #50 and zarr in general, I realized that enabling remote reading is only giving the highest level of data, YIKES! I've never really had to work with multiscale data before, so this was a big oversight. Fortunately, napari handles multiscale nicely, where if a list is passed it will determine whether it is a single element (not multiscale) or decreasing elements (multiscale). This means I can restructure to always pass a list to napari for the data.
This also made me realize a weakness of the previous design, where `nImage.layer_data` was really the highest-resolution xarray. This was also used to determine other properties like scale and such (good), but meant that we would have to take `nImage.layer_data.data` for the actual data. Instead, this introduces a breaking API change where that xarray now lives at `nImage.reference_xarray`, and `nImage.layer_data` is a `list[ArrayLike]` so that it's a drop-in for napari.
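The list convention the description relies on can be sketched in a few lines. `infer_multiscale` is a hypothetical helper mirroring how napari interprets the data, not napari's actual implementation:

```python
import numpy as np

def infer_multiscale(arrays: list) -> bool:
    """A single-element list is treated as a plain image; more than one
    element is treated as a multiscale pyramid (highest to lowest res)."""
    if not isinstance(arrays, list):
        raise TypeError("layer_data is expected to always be a list")
    return len(arrays) > 1

single = [np.zeros((60, 66, 85))]                      # one level: not multiscale
pyramid = [np.zeros((512, 512)), np.zeros((256, 256))]  # two levels: multiscale
assert infer_multiscale(single) is False
assert infer_multiscale(pyramid) is True
```

Always returning a list means the reader code never needs to branch on single-scale vs multiscale before handing data to napari.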