
Conversation


@shaneahmed shaneahmed commented Mar 31, 2023

  • Improve Engines performance and implementation
  • Redesigns PatchPredictor engine using the new EngineABC base class.
  • WSIs are now processed through the same code path as patches, using a WSI-based dataloader.
  • The intermediate output is saved as zarr for the WSIs to resolve memory issues.
  • The output of model architectures should now be a dictionary.
  • The output can be specified as AnnotationStore for visualisation using TIAViz.
  • Fix mypy Type Checks for cli/common.py
  • Add PatchPredictor Engine based on EngineABC
  • Add return_probabilities option to Params
  • Removes merge_predictions option in PatchPredictor engine.
  • Defines post_process_cache_mode, which allows running the algorithm on WSIs
  • Add infer_wsi for WSI inference
  • Removes save_wsi_output as this is not required after post processing.
  • Removes merge_predictions and fixes docstring in EngineABCRunParams
  • compile_model is now moved to EngineABC init
  • Fixes bug with _calculate_scale_factor
  • Fixes a bug in class_dict definition.
  • _get_zarr_array is now a public function get_zarr_array in misc
  • patch_predictions_as_annotations runs the loop on patch_coords instead of class_probs
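The bullets above state that model architectures now return a dictionary rather than a bare array. A minimal sketch of what that contract might look like is shown below; the function and key names (`toy_infer_batch`, `"probabilities"`, `"predictions"`) are illustrative assumptions, not the actual TIAToolbox API.

```python
# Hypothetical sketch (names assumed, not tiatoolbox API): a forward pass
# that returns a dictionary of named outputs, as the redesigned engines expect.
import numpy as np


def toy_infer_batch(batch: np.ndarray, num_classes: int = 3) -> dict:
    """Toy forward pass: deterministic logits from a fixed linear map."""
    # collapse each patch to a scalar feature, then project to class scores
    feats = batch.reshape(batch.shape[0], -1).mean(axis=1, keepdims=True)
    logits = feats * np.arange(1, num_classes + 1)  # shape (B, num_classes)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    # engines can then consume named outputs instead of positional arrays
    return {"probabilities": probs, "predictions": probs.argmax(axis=1)}


batch = np.ones((4, 8, 8, 3), dtype=np.float32)
out = toy_infer_batch(batch)
assert set(out) == {"probabilities", "predictions"}
```

A dictionary output lets downstream code (e.g. Zarr writers or AnnotationStore converters) select outputs by name instead of relying on positional conventions.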

@shaneahmed shaneahmed self-assigned this Mar 31, 2023
@shaneahmed shaneahmed added the enhancement New feature or request label Mar 31, 2023

codecov bot commented Mar 31, 2023

Codecov Report

❌ Patch coverage is 89.67001% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.72%. Comparing base (adc18c9) to head (b542c9a).

Files with missing lines                      Patch %   Lines
tiatoolbox/models/dataset/dataset_abc.py       73.97%   38 Missing ⚠️
tiatoolbox/models/engine/io_config.py          56.75%   32 Missing ⚠️
tiatoolbox/cli/nucleus_instance_segment.py     66.66%    1 Missing ⚠️
tiatoolbox/utils/misc.py                       97.77%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #578      +/-   ##
===========================================
- Coverage    99.27%   94.72%   -4.56%     
===========================================
  Files           71       73       +2     
  Lines         9162     9235      +73     
  Branches      1195     1208      +13     
===========================================
- Hits          9096     8748     -348     
- Misses          40      452     +412     
- Partials        26       35       +9     

☔ View full report in Codecov by Sentry.

- Refactor engines_abc.py
@shaneahmed shaneahmed changed the title ⚡ Improve Engines Performance and Implementation ⚡ Improve Engine Performance and Implementation Apr 28, 2023
pre-commit-ci bot and others added 30 commits April 10, 2025 10:33
# Conflicts:
#	tiatoolbox/utils/misc.py
# Conflicts:
#	tests/models/test_feature_extractor.py
#	tiatoolbox/models/models_abc.py
# Conflicts:
#	tiatoolbox/cli/common.py
#	tiatoolbox/cli/nucleus_instance_segment.py
#	tiatoolbox/cli/patch_predictor.py
#	tiatoolbox/models/engine/semantic_segmentor.py
* ⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading (#947)

This PR makes the WSIPatchDataset class picklable by delaying the creation of the reader object until the first call to `__getitem__`. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements.

- Delays reader object instantiation to the first `__getitem__` call instead of during initialization
- Extracts reader creation logic into a separate `_get_reader` method
- Stores image path and mode as instance variables for lazy initialization

Speedup for the WSI prediction cell of the patch_prediction example notebook: 
2min 48 sec with 0 loader workers -> 1min 13 sec with 4 workers.

Note: this PR has no effect on Linux, where loader workers are started via fork and therefore do not require the dataset to be picklable.
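The lazy-reader pattern described above can be sketched as follows; the class and attribute names here are illustrative stand-ins, not the actual `WSIPatchDataset` implementation.

```python
# Minimal sketch of the lazy-reader pattern (names are illustrative, not the
# real WSIPatchDataset API): the unpicklable reader handle is created on
# first __getitem__ call, so the dataset itself stays picklable and can be
# shipped to loader worker processes on Windows.
import pickle


class LazyReaderDataset:
    """Defers creation of an unpicklable reader until first access."""

    def __init__(self, img_path: str, mode: str = "wsi") -> None:
        # store only picklable state; no reader handle yet
        self.img_path = img_path
        self.mode = mode
        self._reader = None

    def _get_reader(self):
        # stand-in for opening a slide handle (unpicklable in real life)
        return {"path": self.img_path, "mode": self.mode}

    def __getitem__(self, idx: int):
        if self._reader is None:  # first access: open the reader lazily
            self._reader = self._get_reader()
        return (self._reader["path"], idx)


ds = LazyReaderDataset("slide.svs")
clone = pickle.loads(pickle.dumps(ds))  # safe: reader not yet created
assert clone[0] == ("slide.svs", 0)
```

The key design choice is that `__init__` stores only plain data (path and mode), so pickling before the first access never touches an open file handle.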

* 🔀 Merge branch develop into dev-engine-abc

* 🐛 Fix reader_info read

---------

Co-authored-by: Mark Eastwood <[email protected]>
# Conflicts:
#	tiatoolbox/models/dataset/classification.py
# Conflicts:
#	tests/models/test_patch_predictor.py
# Conflicts:
#	tests/models/test_feature_extractor.py
#	tests/models/test_multi_task_segmentor.py
#	tests/models/test_nucleus_instance_segmentor.py
#	tests/models/test_patch_predictor.py
#	tests/models/test_semantic_segmentation.py
#	tiatoolbox/models/architecture/__init__.py
## Summary of Changes

### Major Additions
- **Dask Integration:**  
  - Added `dask` as a dependency and integrated Dask arrays and lazy computation throughout the engine and patch predictor code.
  - Added Dask-based merging, chunking, and memory-aware processing for large images and WSIs.

- **Zarr Output Support:**  
  - Added support for saving model predictions and intermediate results directly to Zarr format.
  - New CLI options and internal logic for Zarr output, including memory thresholding and chunked writes.

- **SemanticSegmentor Engine:**  
  - Added a new `SemanticSegmentor` engine with Dask/Zarr support and new test coverage (`test_semantic_segmentor.py`).
  - Added CLI entrypoint for `semantic_segmentor` and removed the old `semantic_segment` CLI.

- **Enhanced CLI and Config:**  
  - Added CLI options for memory threshold, unified worker options, and improved mask handling.
  - Updated YAML configs and sample data for new models and test images.

- **Utilities and Validation:**  
  - Added utility functions for minimal dtype casting, patch/stride validation, and improved error handling (e.g., `DimensionMismatchError`).
  - Improved annotation store conversion for Dask arrays and Zarr-backed outputs.

- **Changes to `kwargs`:**
  - Added `memory-threshold`.
  - Unified `num-loader-workers` and `num-postproc-workers` into `num-workers`.
  - Removed `cache_mode`, as cache mode is now handled automatically.
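The memory-threshold idea above (keep small results in RAM, spill large ones to chunked on-disk storage, removing the need for an explicit `cache_mode`) can be sketched in plain Python; the function name, threshold semantics, and byte-chunk representation are all assumptions for illustration, not the actual engine logic, which uses Dask arrays and Zarr stores.

```python
# Hedged sketch of memory-aware output handling (names and logic assumed,
# not the real engine implementation): results under the threshold stay in
# memory; larger results are flushed chunk by chunk, the way a Zarr-backed
# store would be written.
import os
import tempfile


def process_chunks(chunks: list[bytes], memory_threshold_bytes: int):
    """Return in-memory bytes if small enough, else a path to a spill file."""
    total = sum(len(c) for c in chunks)
    if total <= memory_threshold_bytes:
        return b"".join(chunks)  # small output: keep in RAM
    # large output: write chunk-wise so peak memory stays ~ one chunk
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for c in chunks:
            f.write(c)
    return path
```

Because the decision is made from the estimated output size, callers no longer choose a cache mode themselves, matching the removal of `cache_mode` noted above.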

---

### Major Removals/Refactors
- **Removed Old CLI and Redundant Code:**  
  - Deleted the old `semantic_segment.py` CLI and replaced it with `semantic_segmentor.py`.
  - Removed legacy cache mode and patch prediction Zarr store tests.

- **Refactored Model and Dataset APIs:**  
  - Unified and simplified model inference APIs to always return arrays (not dicts) for batch outputs.
  - Refactored dataset classes to enforce patch shape validation and remove legacy “mode” logic.

- **Test Cleanup:**  
  - Removed or updated tests that relied on old APIs or cache mode.
  - Refactored test assertions for new output types and Dask array handling.

- **API Consistency:**  
  - Standardized function and argument names across engines, CLI, and utility modules.
  - Updated docstrings and type hints for clarity and consistency.
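The patch-shape validation mentioned above (and the `DimensionMismatchError` from the utilities section) might look roughly like the sketch below; the exception name is taken from this summary, but the function signature and message format are hypothetical.

```python
# Illustrative sketch of patch-shape validation. DimensionMismatchError is
# named in the summary above; the validation function itself is hypothetical.
class DimensionMismatchError(ValueError):
    """Raised when a patch does not match the expected (H, W, C) shape."""


def validate_patch_shape(patch_shape, expected_shape) -> None:
    """Fail fast if a patch's dimensions disagree with the dataset config."""
    if tuple(patch_shape) != tuple(expected_shape):
        raise DimensionMismatchError(
            f"expected patch of shape {tuple(expected_shape)}, "
            f"got {tuple(patch_shape)}"
        )
```

Enforcing this at dataset construction time surfaces misconfigured patch or stride sizes immediately, instead of as an opaque shape error deep inside model inference.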

---

### Notable File Changes
- **New:**  
  - `tiatoolbox/cli/semantic_segmentor.py`
  - `tests/engines/test_semantic_segmentor.py`

- **Removed:**  
  - `tiatoolbox/cli/semantic_segment.py`
  - Old cache mode and patch Zarr store tests

- **Heavily Modified:**  
  - `engine_abc.py`, `patch_predictor.py`, `semantic_segmentor.py`
  - CLI modules and test suites
  - Dataset and utility modules for Dask/Zarr compatibility

---

### Impact

- Enables scalable, parallel, and memory-efficient inference and output saving for large images.
- Simplifies downstream analysis by supporting Zarr as a native output format.
- Lays the groundwork for further Dask-based optimizations in TIAToolbox.


---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>