[XPU][NPU][ROCM] enable cpu_offloading flag for non_cuda #1488
hsliuustc0106 merged 12 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d8ed71998f
gcanlin
left a comment
Thanks, I think it will also work on NPU. Let me test it quickly :)
```python
if not current_omni_platform.supports_cpu_offload() or current_omni_platform.get_device_count() < 1:
    logger.warning(
        "Current device: %s does not support CPU offloading. Skipping offloading.",
        current_omni_platform.get_device_name(),
    )
    return None
```
I tested cpu offload on NPU and it works now. Could you please directly remove this check? This feature should be supported on every platform :)
@gcanlin @hsliuustc0106 @xuechendi is this regular cpu offloading or layerwise cpu offloading?
This PR enables the entry point for cpu offloading, which includes layerwise offloading.
Layerwise cpu offloading itself is enabled by #1492.
So, why do we need two PRs?
Not really needed. I just opened #1492 to enable cpu offload more broadly. @xuechendi Would you mind if I directly remove this if condition in my PR, which also enables layerwise offload for other platforms? Feel free to fold my PR into this one.
Got it, I have rebased #1492 onto this PR.
Meanwhile, I prefer to keep the option for future HW support, so I moved the check to the base class with a default of true.
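A minimal sketch of that base-class default, assuming the `OmniPlatform` name used elsewhere in this thread (the opt-out subclass is purely illustrative, not code from the PR):

```python
# Illustrative sketch only: OmniPlatform follows this thread's review
# comments; the override example is hypothetical.
class OmniPlatform:
    """Base platform interface with CPU offload enabled by default."""

    @classmethod
    def supports_cpu_offload(cls) -> bool:
        # Default true: platforms that cannot offload override this.
        return True


class HypotheticalNoOffloadPlatform(OmniPlatform):
    """Example of a future platform opting out of CPU offloading."""

    @classmethod
    def supports_cpu_offload(cls) -> bool:
        return False
```

With the default in the base class, new platforms get CPU offloading automatically and only need an override to opt out.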
PR Review: [XPU][NPU][ROCM] enable cpu_offloading flag for non_cuda (#1488)

**Summary**

This PR enables CPU offloading capability for non-CUDA platforms (XPU, NPU, ROCm) by introducing a new platform capability method, `supports_cpu_offload()`.

**Issues identified**

1. **Missing base interface declaration (Critical).** The new capability method is not declared in the base platform interface. Recommendation: add to `interface.py`:

```python
@classmethod
def supports_cpu_offload(cls) -> bool:
    """Check if the platform supports CPU offloading for models."""
    raise NotImplementedError
```

2. **NPU platform returns False without justification.** Recommendation: either enable NPU support or document why it is disabled.

3. **Test coverage doesn't include new platforms.** The existing test still skips them:

```python
@pytest.mark.skipif(
    current_omni_platform.is_npu() or current_omni_platform.is_rocm(),
    reason="Hardware not supported",
)
```

This contradicts the PR's goal of enabling these platforms. Recommendation: update the test to cover the newly enabled platforms.

4. **Missing documentation.** The PR checklist mentions updating documentation, but no docs were changed. Users need to know which platforms now support CPU offloading. Recommendation: update relevant documentation to reflect the new platform support.

5. **Incomplete test plan.** The PR description shows manual testing output for XPU, but complete test results for the other platforms are not provided.

**Verdict**

The core idea is sound and the implementation approach is correct, but the must-fix items above need to be addressed before merging. The PR is on the right track but incomplete in its current state.
hsliuustc0106
left a comment
Adding inline comments for specific issues
hsliuustc0106
left a comment
Inline code review comments
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
hsliuustc0106
left a comment
Additional inline comments
vllm_omni/platforms/rocm/platform.py
Outdated
```python
def get_free_memory(cls, device: torch.device | None = None) -> int:
    free, _ = torch.cuda.mem_get_info(device)
    return free
```
ROCm now supports CPU offloading. The test at tests/e2e/offline_inference/test_diffusion_cpu_offload.py:51 should be updated to remove ROCm from the skipif condition.
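As a hedged sketch (not the actual test file), the skip condition could key off the new capability method instead of hard-coded platform checks; `current_omni_platform` is stubbed below so the snippet is self-contained:

```python
import pytest

# Hypothetical stand-in for current_omni_platform so this sketch runs
# standalone; the real object comes from the vllm_omni platform layer.
class _StubPlatform:
    @classmethod
    def supports_cpu_offload(cls) -> bool:
        return True

current_omni_platform = _StubPlatform()

# Skip based on the platform capability rather than listing platforms,
# so newly enabled platforms pick up the test automatically.
@pytest.mark.skipif(
    not current_omni_platform.supports_cpu_offload(),
    reason="Platform does not support CPU offloading",
)
def test_diffusion_cpu_offload():
    assert current_omni_platform.supports_cpu_offload()
```

This way, adding a platform never requires editing the test's skip list.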
hsliuustc0106
left a comment
Review Summary
This PR introduces a good abstraction for CPU offloading support across platforms, but needs several fixes before merging:
Must fix:
- Add `supports_cpu_offload()` to the base `OmniPlatform` interface
- Clarify NPU support status (enable it or document why it's disabled)
- Update test coverage to match new platform support
Should fix:
- Update documentation for new platform support
- Provide complete test results
See inline comments and the detailed review comment for specifics.
@xuechendi can you remove the test skip? I have run the test locally and it is passing.
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
1df223c to 1d50daf
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
@tjtanaa, done, the test skip is removed.
I realized that if we want to enable UT for non-CUDA devices, this PR is not enough, because we also need to generalize GpuMemoryMonitor. I initiated a new PR for this purpose: #1526. @gcanlin @hsliuustc0106 @tjtanaa, I didn't put them in the same PR because they are related but not the same topic. Please help to check.
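For illustration only, generalizing a memory monitor could mean injecting the platform's free-memory query instead of calling `torch.cuda.mem_get_info` directly; the class and method names below are hypothetical, not the actual #1526 code:

```python
from typing import Callable

class MemoryMonitor:
    """Platform-agnostic memory monitor sketch: the free-memory query is
    injected rather than hard-coded to torch.cuda.mem_get_info."""

    def __init__(self, get_free_memory: Callable[[], int]) -> None:
        self._get_free_memory = get_free_memory

    def free_bytes(self) -> int:
        # Delegate to whatever backend query the platform supplied.
        return self._get_free_memory()

    def has_headroom(self, required_bytes: int) -> bool:
        # True when the device reports at least `required_bytes` free.
        return self._get_free_memory() >= required_bytes

# Usage with a stubbed query standing in for a platform backend (8 GiB free):
monitor = MemoryMonitor(lambda: 8 * 1024**3)
print(monitor.has_headroom(4 * 1024**3))  # → True
```

Each platform would then plug in its own query (CUDA, XPU, NPU, ROCm) without the monitor knowing about any of them.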
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
@hsliuustc0106, please review again, thanks. The PR is rebased.
LGTM
Purpose
Currently, cpu_offloading is only available on CUDA. However, CPU offloading is also a very useful feature on devices with limited memory capacity, such as the Intel Arc B60.
This PR provides a new way to decide whether a given device should expose cpu-offloading capability.
Verified on XPU
After this PR, I can confirm cpu_offloading is effective on XPU.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)