-
Notifications
You must be signed in to change notification settings - Fork 102
⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading #947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR makes the WSIPatchDataset class picklable by delaying the creation of the reader object until the first call to __getitem__. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements.
- Delays reader object instantiation to the first
__getitem__call instead of during initialization - Extracts reader creation logic into a separate
_get_readermethod - Stores image path and mode as instance variables for lazy initialization
| # may decouple into misc ? | ||
| # the scaling factor will scale base level to requested read resolution/units | ||
| wsi_shape = self.reader.slide_dimensions(resolution=resolution, units=units) | ||
| wsi_shape = reader.slide_dimensions(resolution=resolution, units=units) |
Copilot
AI
Aug 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a temporary reader object during initialization defeats the purpose of lazy initialization. This reader is only used to get slide dimensions but will be discarded, causing unnecessary overhead. Consider caching the slide dimensions or refactoring to avoid creating the reader twice.
| """Get an item from the dataset.""" | ||
| coords = self.inputs[idx] | ||
| # Read image patch from the whole-slide image | ||
| if self.reader is None: |
Copilot
AI
Aug 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lazy initialization of self.reader is not thread-safe. Multiple threads could simultaneously check if self.reader is None and create multiple reader instances, potentially causing race conditions in a multi-threaded environment.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #947 +/- ##
========================================
Coverage 99.70% 99.70%
========================================
Files 71 71
Lines 9133 9141 +8
Branches 1188 1190 +2
========================================
+ Hits 9106 9114 +8
Misses 23 23
Partials 4 4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
shaneahmed
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @measty This looks good.
* ⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading (#947) This PR makes the WSIPatchDataset class picklable by delaying the creation of the reader object until the first call to `__getitem__`. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements. - Delays reader object instantiation to the first `__getitem__` call instead of during initialization - Extracts reader creation logic into a separate `_get_reader` method - Stores image path and mode as instance variables for lazy initialization Speedup for the WSI prediction cell of the patch_prediction example notebook: 2min 48 sec with 0 loader workers -> 1min 13 sec with 4 workers. Note: this PR doesn't have any effect for Linux as the multi-threading already works fine there because Linux multithreading doesn't require things to be pickleable * 🔀 Merge branch develop into dev-engine-abc * 🐛 Fix reader_info read --------- Co-authored-by: Mark Eastwood <[email protected]>
This PR makes the WSIPatchDataset class picklable by delaying the creation of the reader object until the first call to
__getitem__. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements.__getitem__call instead of during initialization_get_readermethodSpeedup for the WSI prediction cell of the patch_prediction example notebook:
2min 48 sec with 0 loader workers -> 1min 13 sec with 4 workers.
Note: this PR doesn't have any effect for Linux as the multi-threading already works fine there because Linux multithreading doesn't require things to be pickleable