

@measty measty commented Aug 8, 2025

This PR makes the WSIPatchDataset class picklable by delaying the creation of the reader object until the first call to __getitem__. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements.

  • Delays reader object instantiation to the first __getitem__ call instead of during initialization
  • Extracts reader creation logic into a separate _get_reader method
  • Stores image path and mode as instance variables for lazy initialization
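The lazy-reader pattern described above can be sketched in plain Python. This is a minimal illustration, not the toolbox code. `LazyPatchDataset`, its byte-range `inputs`, and the open file handle standing in for a slide reader are all hypothetical. `reader` and `_get_reader` mirror names mentioned in the PR, but the attribute names `img_path`/`mode` and the `_get_reader` signature are assumptions.

```python
import pickle
import tempfile
from pathlib import Path


class LazyPatchDataset:
    """Hypothetical stand-in for WSIPatchDataset with a lazily created reader.

    The "reader" is an open file handle which, like many real slide readers,
    cannot be pickled.
    """

    def __init__(self, img_path, mode="wsi"):
        # Store only plain, picklable data at construction time.
        self.img_path = Path(img_path)
        self.mode = mode  # kept to mirror the PR; unused in this sketch
        self.reader = None  # created on the first __getitem__ call
        self.inputs = [(0, 4), (4, 8)]  # hypothetical (start, stop) byte ranges

    def _get_reader(self):
        # Hypothetical reader factory; a real dataset would open the slide here.
        return open(self.img_path, "rb")

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        if self.reader is None:
            # Runs once per process, so each worker ends up with its own reader.
            self.reader = self._get_reader()
        start, stop = self.inputs[idx]
        self.reader.seek(start)
        return self.reader.read(stop - start)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "slide.bin"
    path.write_bytes(b"abcdefgh")
    ds = LazyPatchDataset(path)
    clone = pickle.loads(pickle.dumps(ds))  # works: no reader exists yet
    patch = clone[1]  # the clone opens its own reader lazily
    clone.reader.close()  # release the handle before the directory is removed

print(patch)  # b'efgh'
```

Because the reader only appears on first access, the object pickles cleanly as long as the process doing the pickling has not yet read an item; each worker then opens its own reader on its first `__getitem__` call.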

Speedup for the WSI prediction cell of the `patch_prediction` example notebook:
2 min 48 s with 0 loader workers -> 1 min 13 s with 4 workers.
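Converting the quoted timings to seconds makes the gain concrete. This is only arithmetic on the two figures above; it does not predict how run time changes with other worker counts.

```python
# Wall-clock times quoted for the WSI prediction cell, in seconds.
baseline_s = 2 * 60 + 48  # 0 loader workers -> 168 s
parallel_s = 1 * 60 + 13  # 4 loader workers -> 73 s

speedup = baseline_s / parallel_s
time_saved_pct = 100 * (1 - parallel_s / baseline_s)

print(f"{speedup:.2f}x faster, {time_saved_pct:.0f}% less wall-clock time")
# 2.30x faster, 57% less wall-clock time
```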

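Note: this PR has no effect on Linux, where multi-worker loading already works. Loader workers are separate processes (not threads), and on Linux they are started with `fork` by default, which copies the dataset into each worker instead of pickling it. Windows only supports `spawn`, which must pickle the dataset to send it to each worker.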
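To see why opening the reader in `__init__` broke spawned workers, the sketch below checks whether an object survives `pickle.dumps`, which is what the `spawn` start method requires. `EagerDataset` and `can_pickle` are hypothetical, with an open file standing in for the slide reader. It also prints the interpreter's default start method; that default is platform- and Python-version-dependent (Windows only offers `spawn`), so treat the printed value as informational.

```python
import multiprocessing
import pickle
import tempfile
from pathlib import Path


class EagerDataset:
    """Hypothetical dataset that opens its reader in __init__ (the pattern the PR removes)."""

    def __init__(self, path):
        self.reader = open(path, "rb")  # open file handles cannot be pickled


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, as spawn-started workers require."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


# Supported start methods on this interpreter; the first entry is the default.
print(multiprocessing.get_all_start_methods())

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "slide.bin"
    path.write_bytes(b"abcdefgh")
    eager = EagerDataset(path)
    eager_ok = can_pickle(eager)
    eager.reader.close()  # close before the temporary directory is removed

print(eager_ok)  # False
```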

@Jiaqi-Lv Jiaqi-Lv requested a review from Copilot August 8, 2025 10:09
@shaneahmed shaneahmed changed the title make patchdataset picklable ⚡ Make WSIPatchDataset Picklable to Support Multithreading Aug 8, 2025

Copilot AI left a comment



```diff
 # may decouple into misc ?
 # the scaling factor will scale base level to requested read resolution/units
-wsi_shape = self.reader.slide_dimensions(resolution=resolution, units=units)
+wsi_shape = reader.slide_dimensions(resolution=resolution, units=units)
```

Copilot AI Aug 8, 2025


Creating a temporary reader object during initialization defeats the purpose of lazy initialization. This reader is only used to get slide dimensions but will be discarded, causing unnecessary overhead. Consider caching the slide dimensions or refactoring to avoid creating the reader twice.
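One way to address this comment, sketched under assumptions: keep the reader opened for metadata in the main process, cache `wsi_shape` as a plain tuple, and exclude the reader from the pickled state via `__getstate__`, so workers still create their own reader lazily. This is not what the PR does (the diff above uses a temporary local `reader`). `FakeReader` and `PatchDataset` are hypothetical, and `FakeReader.slide_dimensions` only imitates the call shape seen in the diff.

```python
import pickle


class FakeReader:
    """Hypothetical slide reader that, like many real ones, cannot be pickled."""

    opened = 0  # counts reader creations in this process

    def __init__(self, path):
        FakeReader.opened += 1
        self.path = path

    def slide_dimensions(self, resolution, units):
        # Hypothetical fixed size; a real reader derives this from slide metadata.
        return (4096, 2048)

    def __reduce__(self):
        raise TypeError("reader objects cannot be pickled")


class PatchDataset:
    """Hypothetical dataset: keeps its metadata reader but never pickles it."""

    def __init__(self, img_path, resolution=0, units="level"):
        self.img_path = img_path
        # Open one reader, cache the dimensions as a plain tuple, and keep the
        # reader for reuse in this process instead of discarding it.
        self.reader = FakeReader(img_path)
        self.wsi_shape = self.reader.slide_dimensions(resolution=resolution, units=units)

    def __getstate__(self):
        # A spawned worker receives everything except the unpicklable reader.
        state = self.__dict__.copy()
        state["reader"] = None
        return state

    def _get_reader(self):
        # Lazily (re)create the reader; a no-op if this process already has one.
        if self.reader is None:
            self.reader = FakeReader(self.img_path)
        return self.reader


ds = PatchDataset("slide.svs")  # one reader opened for metadata
ds._get_reader()  # reused: no second open in this process
worker_copy = pickle.loads(pickle.dumps(ds))  # what a spawned worker would get
worker_copy._get_reader()  # the worker opens its own reader
print(ds.wsi_shape, worker_copy.wsi_shape, FakeReader.opened)  # (4096, 2048) (4096, 2048) 2
```

Each spawned worker still needs its own reader, so one open per process remains; what the sketch avoids is opening, then discarding, an extra reader in the main process.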

```python
"""Get an item from the dataset."""
coords = self.inputs[idx]
# Read image patch from the whole-slide image
if self.reader is None:
```

Copilot AI Aug 8, 2025


The lazy initialization of self.reader is not thread-safe. Multiple threads could simultaneously check if self.reader is None and create multiple reader instances, potentially causing race conditions in a multi-threaded environment.
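A common fix for this check-then-create race is double-checked locking, sketched below with hypothetical names (`ThreadSafeLazyDataset`, `readers_created`, and `object()` as a stand-in reader). A `threading.Lock` cannot be pickled, so the lock is dropped in `__getstate__` and recreated in `__setstate__`. Loader workers are separate processes, each with its own dataset copy, so this race only arises if a single instance is shared by several threads in the same process.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadSafeLazyDataset:
    """Hypothetical dataset whose reader is created at most once per process."""

    def __init__(self, img_path):
        self.img_path = img_path
        self.reader = None
        self.readers_created = 0
        self._lock = threading.Lock()

    def _get_reader(self):
        # Double-checked locking: the fast path skips the lock once a reader exists.
        if self.reader is None:
            with self._lock:
                # Re-check inside the lock: another thread may have won the race.
                if self.reader is None:
                    self.readers_created += 1
                    self.reader = object()  # stand-in for a real slide reader
        return self.reader

    def __getitem__(self, idx):
        self._get_reader()  # a real dataset would read the patch at idx here
        return idx

    def __getstate__(self):
        # Locks and open readers cannot be pickled; send neither to workers.
        state = self.__dict__.copy()
        state["reader"] = None
        del state["_lock"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # each receiving process gets a fresh lock


ds = ThreadSafeLazyDataset("slide.svs")
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(ds.__getitem__, range(100)))

clone = pickle.loads(pickle.dumps(ds))  # succeeds despite the lock attribute
clone[0]
print(ds.readers_created, clone.readers_created)  # 1 2
```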


codecov bot commented Aug 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.70%. Comparing base (c1eb36c) to head (fb3d63c).
⚠️ Report is 12 commits behind head on develop.

```diff
@@           Coverage Diff            @@
##           develop     #947   +/-   ##
========================================
  Coverage    99.70%   99.70%
========================================
  Files           71       71
  Lines         9133     9141    +8
  Branches      1188     1190    +2
========================================
+ Hits          9106     9114    +8
  Misses          23       23
  Partials         4        4
```
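The report's percentages can be reproduced from the table, assuming Codecov's definition of coverage as hits / (hits + misses + partials); that sum matches the Lines row for both commits. The 8 added lines are all hits, so coverage stays at 99.70% to two decimals.

```python
# Line counts copied from the coverage diff above.
counts = {
    "develop": {"hits": 9106, "misses": 23, "partials": 4},
    "#947": {"hits": 9114, "misses": 23, "partials": 4},
}

totals, coverage = {}, {}
for name, c in counts.items():
    totals[name] = c["hits"] + c["misses"] + c["partials"]  # should equal "Lines"
    coverage[name] = 100 * c["hits"] / totals[name]  # assumed Codecov ratio
    print(f"{name}: {totals[name]} lines, {coverage[name]:.2f}% covered")
```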


@shaneahmed shaneahmed added this to the Release v1.7.0 milestone Aug 8, 2025
@shaneahmed shaneahmed added the enhancement New feature or request label Aug 8, 2025
@measty measty changed the title ⚡ Make WSIPatchDataset Picklable to Support Multithreading ⚡ Make WSIPatchDataset Picklable to Support Windows Multithreading Aug 8, 2025
@shaneahmed shaneahmed changed the title ⚡ Make WSIPatchDataset Picklable to Support Windows Multithreading ⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading Aug 8, 2025

@shaneahmed shaneahmed left a comment


Thanks @measty, this looks good.

@shaneahmed shaneahmed merged commit d5c1995 into develop Aug 8, 2025
15 checks passed
@shaneahmed shaneahmed deleted the make-patchdataset-picklable branch August 8, 2025 16:33
shaneahmed added a commit that referenced this pull request Aug 11, 2025
* ⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading (#947)


* 🔀 Merge branch develop into dev-engine-abc

* 🐛 Fix reader_info read

---------

Co-authored-by: Mark Eastwood <[email protected]>
