🐛 Fix Multi-GPU Support with torch.compile #923
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```diff
@@ Coverage Diff @@
##           develop     #923   +/- ##
========================================
  Coverage    99.69%   99.69%
========================================
  Files           71       71
  Lines         8939     8947     +8
  Branches      1170     1170
========================================
+ Hits          8912     8920     +8
  Misses          23       23
  Partials         4        4
```

☔ View full report in Codecov by Sentry.
Pull Request Overview
This PR adapts multi-GPU support to work with torch.compile by introducing a single-process DDP fallback and updating cleanup, while retaining a DataParallel fallback for non-compile scenarios.
- Switches from `DataParallel` to a single-process `DistributedDataParallel` (DDP) when multiple GPUs are available and compilation is enabled
- Initializes and later destroys the DDP process group in the engine
- Adds a local (skipped-on-CI) test for multi-GPU feature extraction
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tiatoolbox/models/models_abc.py` | Added DDP initialization & wrapping logic in `model_to` and retained the `DataParallel` fallback |
| `tiatoolbox/models/engine/semantic_segmentor.py` | Imports and destroys the DDP process group after inference when compile-compatible |
| `tests/models/test_feature_extractor.py` | Introduced a skipped local test for multi-GPU feature extraction |
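The single-process DDP fallback described above can be sketched as follows. This is a minimal, hypothetical sketch, not the PR's actual code: the helper names `wrap_model_ddp` and `cleanup_ddp` are illustrative, and it uses the CPU `gloo` backend so it runs without GPUs, whereas the PR targets CUDA devices.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_model_ddp(model: torch.nn.Module) -> torch.nn.Module:
    """Wrap a model in single-process DDP, initializing the group if needed.

    Hypothetical sketch of the approach; uses the CPU "gloo" backend so it
    is runnable without GPUs (the PR itself targets CUDA).
    """
    if not dist.is_initialized():
        # Single-process group: rank 0 of a world of size 1.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group(backend="gloo", rank=0, world_size=1)
    return DDP(model)


def cleanup_ddp() -> None:
    """Destroy the process group only if one was actually initialized."""
    if dist.is_initialized():
        dist.destroy_process_group()
```

Guarding both the initialization and the teardown on `dist.is_initialized()` makes the wrap/cleanup pair idempotent, which is the same concern the review comment below raises about `destroy_process_group()`.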
Comments suppressed due to low confidence (1)
`tests/models/test_feature_extractor.py:128`
- This test uses `shutil` and `Path` but there are no corresponding imports. Add `import shutil` and `from pathlib import Path` at the top of the file.

```python
shutil.rmtree(save_dir, ignore_errors=True)
```
```python
        and torch.cuda.device_count() > 1
        and is_torch_compile_compatible()
    ):  # pragma: no cover
        dist.destroy_process_group()
```
Destroying the process group without verifying initialization may error if no group exists. Add `if dist.is_initialized():` before calling `destroy_process_group()`.
```python
if dist.is_initialized():
    dist.destroy_process_group()
```
Change multi-GPU mode from `DataParallel` to `DistributedDataParallel` to work with torch.compile. However, this effectively limits inference to a single GPU when torch.compile is enabled. Changing this to use multiple GPUs while remaining compatible with torch.compile is not trivial; we will release a future fix that fully corrects this with the new engine.
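The resulting wrapper selection can be summarized as plain decision code. This is a hypothetical mirror of the branching the PR describes, with torch stripped out so only the logic is visible; `multi_gpu_strategy` is an illustrative name, not a tiatoolbox function.

```python
def multi_gpu_strategy(device_count: int, compile_enabled: bool) -> str:
    """Return which wrapper the engine would pick, per the PR description.

    Hypothetical mirror of the branching logic; not a tiatoolbox API.
    """
    if device_count > 1:
        if compile_enabled:
            # torch.compile path: single-process DDP, which for now
            # effectively runs the compiled model on one GPU.
            return "DistributedDataParallel"
        # Non-compile path keeps the existing DataParallel fallback.
        return "DataParallel"
    return "single-device"
```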