[XPU] Add xpu torch.compile support #22609
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review

This pull request adds `torch.compile` support for the XPU platform. The changes correctly bypass CUDA graph capture and the functionalization passes, which are not yet supported on XPU. However, there is a critical issue: the XPU platform is configured to use `CUDAPiecewiseBackend`, which contains CUDA-specific API calls for graph capture. This will lead to runtime errors on XPU. I've provided a comment with a suggested fix.
```python
@classmethod
def get_piecewise_backend_cls(cls) -> str:
    return "vllm.compilation.cuda_piecewise_backend.CUDAPiecewiseBackend"  # noqa
```
Using `CUDAPiecewiseBackend` for XPU is problematic, as it contains CUDA-specific code (e.g., `torch.cuda.CUDAGraph`) that will fail on XPU platforms.

The PR description mentions that XPU does not support graph mode yet, which suggests that graph capture should be disabled. However, `compilation_config.use_cudagraph` is enabled by default for the V1 engine and is not disabled for the XPU platform. This will cause `CUDAPiecewiseBackend` to attempt CUDA graph capture, leading to a runtime error.

To fix this, you should disable CUDA graph capture for the XPU platform within `torch.compile`. A possible fix is to add `vllm_config.compilation_config.use_cudagraph = False` to the `XPUPlatform.check_and_update_config` method. Alternatively, you could create a new `XPUPiecewiseBackend` that does not contain CUDA-specific graph-capture logic and use it here.
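As a standalone sketch of the suggested fix: the class and field names below mirror vLLM's config objects, but this is a hypothetical illustration, not the actual vLLM code.

```python
# Hypothetical sketch of the suggested fix. CompilationConfig, VllmConfig,
# and XPUPlatform here are stand-ins for vLLM's real classes.
from dataclasses import dataclass, field


@dataclass
class CompilationConfig:
    # The V1 engine enables CUDA graph capture by default.
    use_cudagraph: bool = True


@dataclass
class VllmConfig:
    compilation_config: CompilationConfig = field(default_factory=CompilationConfig)


class XPUPlatform:
    @classmethod
    def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
        # XPU does not support graph capture yet, so force it off before
        # the piecewise backend ever reaches torch.cuda.CUDAGraph.
        vllm_config.compilation_config.use_cudagraph = False


cfg = VllmConfig()
XPUPlatform.check_and_update_config(cfg)
print(cfg.compilation_config.use_cudagraph)  # False
```

This keeps `CUDAPiecewiseBackend` reusable as-is, since with `use_cudagraph` disabled its graph-capture path is never exercised.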
Signed-off-by: Kunshang Ji <[email protected]>
Essential Elements of an Effective PR Description Checklist

- Update `supported_models.md` and `examples` for a new model.

Purpose
This PR enables torch.compile on the XPU platform. Users can enable it with the `-O3` option.

Limitations:
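As a small illustration of what the `-O3` flag denotes, here is a hypothetical parser mapping a `-O<n>` style option to an integer compilation level; the helper name and logic are illustrative only, not vLLM's actual CLI code.

```python
# Hypothetical helper: map a "-O<n>" flag to an integer compilation level,
# illustrating what "-O3" means. This is not vLLM's real argument parser.
def parse_opt_flag(flag: str) -> int:
    """Parse '-O<n>' into an integer optimization/compilation level."""
    if not flag.startswith("-O") or not flag[2:].isdigit():
        raise ValueError(f"expected a flag of the form -O<n>, got {flag!r}")
    return int(flag[2:])


print(parse_opt_flag("-O3"))  # 3
```

With this scheme, `-O3` selects the highest compilation level, which is where piecewise `torch.compile` comes into play on XPU.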
Test Plan
Add a test in CI.
Test Result
(Optional) Documentation Update