[XPU] Add xpu torch.compile support #22609
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; instead, only a small and essential subset of CI checks runs to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 […]
Code Review
This pull request adds torch.compile support for the XPU platform. The changes correctly bypass CUDA graph capture and functionalization passes, which are not yet supported on XPU. However, there is a critical issue: the XPU platform is configured to use CUDAPiecewiseBackend, which contains CUDA-specific API calls for graph capture. This will lead to runtime errors on XPU. I've provided a comment with a suggested fix.
Force-pushed from 27c1a28 to 16611f9
Nice & small! A few comments
# XPU does not support auto-functionalization yet.
# Will enable this when switch to vllm-xpu-kernels.
if current_platform.is_xpu():
    continue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused: what does it mean that XPU does not support autofunctionalization? And if so we should disable it at the pass level, not inside the loop
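A minimal sketch of that suggestion, gating the whole pass on the platform instead of checking inside the node loop (the class and method names mirror vLLM's fix-functionalization pass, but this is illustrative, not the PR's actual diff):

import torch.fx

from vllm.platforms import current_platform

class FixFunctionalizationPass:
    """Rewrites auto_functionalized nodes back into in-place ops."""

    def __call__(self, graph: torch.fx.Graph) -> None:
        # Pass-level guard: XPU kernels are not auto-functionalized yet,
        # so leave the graph untouched instead of skipping node by node.
        if current_platform.is_xpu():
            return
        for node in graph.nodes:
            ...  # defunctionalize the node as the existing pass does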
My understanding is that the auto-functionalization pass is special: unlike the rule/op-based passes, it is always called on the current code path, so changing that would feel strange.
I moved this part earlier; please take another look.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 16611f9 to e3ebfdb
Force-pushed from a6bb017 to 061c150
vllm/platforms/xpu.py (Outdated)
def get_global_graph_pool(self) -> Any:
    """
    Currently xpu does NOT support Graph model.
is this just saying we don't support cudagraphs on xpu?
yes, torch-xpu will add this feature in the future.
VLLM_USE_V1=1 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 --enforce-eager
VLLM_USE_V1=1 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 --enforce-eager -O3
What is --enforce-eager -O3?
Can we do -O3 with use_cudagraph=False? (Or whatever the new way to disable cudagraphs is?)
--enforce-eager -O3 will use the piecewise CUDA graph compiler backend but will not capture CUDA graphs on the CUDA device (I am not certain about the current CUDA behavior, but it did work on CUDA two or three months ago).
I think in vLLM --enforce-eager equals not using CUDA graphs; we don't expose use_cudagraph as a vLLM arg.
There's a separate cudagraph arg. Now we should just use -O.cudagraph_mode=NONE
In vLLM, --enforce-eager is supposed to mean "disable compile and cudagraphs", but it is not there yet. I don't want to add things like --enforce-eager -O3 that will need to be updated later; we should use -O.cudagraph_mode=NONE instead.
Makes sense! Please take a look again, thanks!
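For reference, a possible revised invocation along those lines (the flag spelling comes from the comments above; the exact command ultimately used in the PR may differ):

VLLM_USE_V1=1 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 -O3 -O.cudagraph_mode=NONE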
Force-pushed from 113bd0c to ddd14c5
How about just changing the CompilationLevel from piecewise to dynamo in xpu.py? Then we wouldn't need to add additional XPU checks in these passes, since none of the passes in VllmBackend are supported on XPU.
We should change the config so that if cudagraphs are accidentally enabled, we warn and disable them, unless this already happens?
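A rough sketch of such a guard in the XPU platform's config hook (check_and_update_config follows vLLM's platform interface; the exact import path and spelling of CUDAGraphMode are assumptions here):

import logging

from vllm.config import CUDAGraphMode, VllmConfig

logger = logging.getLogger(__name__)

class XPUPlatform:
    @classmethod
    def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
        compilation_config = vllm_config.compilation_config
        if compilation_config.cudagraph_mode != CUDAGraphMode.NONE:
            # CUDA graph capture is not available on XPU yet, so warn and
            # disable rather than failing later in the piecewise backend.
            logger.warning("cudagraphs are not supported on XPU; "
                           "forcing cudagraph_mode=NONE")
            compilation_config.cudagraph_mode = CUDAGraphMode.NONE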
vllm/platforms/xpu.py (Outdated)
    """
    Currently xpu does NOT support Graph model.
    """
    return None
This is unsafe, it'll break in an ugly way if someone enables cudagraphs. Have we tested this? At least we should raise an error here
If we raise an error here, it will break the code path :( I would prefer to log a warning here.
I agree it can be unsafe in the future. For now, graph_pool will not be used if cudagraph_mode is None.
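A small sketch of the warning variant being discussed (kept only for illustration; the method is removed entirely later in this thread once #23385 lands):

import logging
from typing import Any

logger = logging.getLogger(__name__)

def get_global_graph_pool(self) -> Any:
    # XPU has no graph-capture support yet; the pool is only consulted when
    # cudagraph_mode is enabled, so log a warning instead of raising.
    logger.warning("Graph capture is not supported on XPU; "
                   "no graph pool is available.")
    return None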
Regarding "If we raise an error here, it will break the code path": could you show me the error? Why does get_global_graph_pool get called at all?
Oh I see now: the graph pool handle is passed around the backend for no good reason. static_graph_wrapper_cls doesn't need it at all. Let me fix that and unblock you.
AI minion working in #23385
Thanks for fixing! Will rebase & fix here once merged.
it will always set […]
Force-pushed from 4b1c7ef to d36b5d4
@ProExpertProg, could you help approve the PR if the latest fix has resolved your comments? Thanks so much!
Let's wait for #23385 and remove the get_global_graph_pool, then we can approve and merge. I'll unblock CI so we can be ready to merge.
@xuechendi @jikunshang #23385 has just merged, please remove the get_global_graph_pool method.
vllm/attention/layer.py (Outdated)
@@ -190,7 +190,7 @@ def __init__(
         # opaque custom op. For other platforms, we directly call them
         # and let torch.compile handle them.
         self.use_direct_call = not current_platform.is_cuda_alike(
-        ) and not current_platform.is_cpu()
+        ) and not current_platform.is_cpu() and not current_platform.is_xpu()
Could you make this a new property on the platform interface, called opaque_attention_op?
sure, added.
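For illustration, one way the requested property could look (the name opaque_attention_op comes from the review comment; the base-class layout and default value are assumptions, not the PR's actual code):

class Platform:
    # Whether this platform wraps attention in an opaque custom op so that
    # torch.compile does not trace into it.
    opaque_attention_op: bool = False

class XpuPlatformSketch(Platform):
    opaque_attention_op = True

# In vllm/attention/layer.py the per-platform checks could then collapse to:
#     self.use_direct_call = not current_platform.opaque_attention_op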
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from d36b5d4 to ca400b8
vllm/platforms/xpu.py (Outdated)
@@ -182,3 +175,13 @@ def check_if_supports_dtype(cls, torch_dtype: torch.dtype):
            "Intel Arc A770 have bfloat16 accuracy known issue. "
            "You can use float16 instead by explicitly setting the "
            "`dtype` flag in CLI, for example: --dtype=half.")

    def get_global_graph_pool(self) -> Any:
Please still remove this method
removed.
Force-pushed from b69a0f9 to 8ca13d9
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Xiao Yu <[email protected]>
Purpose
This PR enables torch.compile on the XPU platform. Users can enable it with the -O3 option.

Limitations:
Test Plan
add a test in CI.
Test Result
(Optional) Documentation Update