[XPU] Add xpu torch.compile support #22609
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; instead, only a small and essential subset of CI checks runs to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 […]
Code Review
This pull request adds torch.compile support for the XPU platform. The changes correctly bypass CUDA graph capture and functionalization passes, which are not yet supported on XPU. However, there is a critical issue: the XPU platform is configured to use CUDAPiecewiseBackend, which contains CUDA-specific API calls for graph capture. This will lead to runtime errors on XPU. I've provided a comment with a suggested fix.
Force-pushed from 27c1a28 to 16611f9
Nice & small! A few comments
# XPU does not support auto-functionalization yet.
# Will enable this when switch to vllm-xpu-kernels.
if current_platform.is_xpu():
    continue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused: what does it mean that XPU does not support autofunctionalization? And if so we should disable it at the pass level, not inside the loop
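A minimal sketch of that suggestion, gating the whole pass on the platform instead of checking inside the node loop (the class and method names mirror vLLM's fix-functionalization pass, but this is illustrative, not the PR's actual diff):

import torch.fx

from vllm.platforms import current_platform

class FixFunctionalizationPass:
    """Rewrites auto_functionalized nodes back into in-place ops."""

    def __call__(self, graph: torch.fx.Graph) -> None:
        # Pass-level guard: XPU kernels are not auto-functionalized yet,
        # so leave the graph untouched instead of skipping node by node.
        if current_platform.is_xpu():
            return
        for node in graph.nodes:
            ...  # defunctionalize the node as the existing pass does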
My understanding is that the auto-functionalization pass is special: unlike the rule/op-based passes, it is always called on the current code path, so changing that would feel strange.
I moved this part earlier; please take another look.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 16611f9 to e3ebfdb
Force-pushed from a6bb017 to 061c150
vllm/platforms/xpu.py (Outdated)
def get_global_graph_pool(self) -> Any:
    """
    Currently xpu does NOT support Graph model.
is this just saying we don't support cudagraphs on xpu?
yes, torch-xpu will add this feature in the future.
VLLM_USE_V1=1 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 --enforce-eager
VLLM_USE_V1=1 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 --enforce-eager -O3
What is --enforce-eager -O3?
Can we do -O3 with use_cudagraph=False? (Or whatever the new way to disable cudagraphs is?)
--enforce-eager -O3 will use the piecewise CUDA graph compiler backend but will not capture CUDA graphs on the CUDA device (I am not certain about the current CUDA behavior, but it did work on CUDA two or three months ago).
I think in vLLM --enforce-eager equals not using CUDA graphs; we don't expose use_cudagraph as a vLLM arg.
There's a separate cudagraph arg. Now we should just use -O.cudagraph_mode=NONE
In vLLM, --enforce-eager is supposed to mean "disable compile and cudagraphs", but it is not there yet. I don't want to add things like --enforce-eager -O3 that will need to be updated later; we should use -O.cudagraph_mode=NONE instead.
Makes sense! Please take a look again, thanks!
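For reference, a possible revised invocation along those lines (the flag spelling comes from the comments above; the exact command ultimately used in the PR may differ):

VLLM_USE_V1=1 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 -O3 -O.cudagraph_mode=NONE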
Force-pushed from 113bd0c to ddd14c5
How about just changing the CompilationLevel from piecewise to dynamo in xpu.py? Then we wouldn't need to add additional XPU checks in these passes, since none of the passes in VllmBackend are supported on XPU.
We should change the config so that if cudagraphs are accidentally enabled, we warn and disable them, unless this already happens?
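A rough sketch of such a guard in the XPU platform's config hook (check_and_update_config follows vLLM's platform interface; the exact import path and spelling of CUDAGraphMode are assumptions here):

import logging

from vllm.config import CUDAGraphMode, VllmConfig

logger = logging.getLogger(__name__)

class XPUPlatform:
    @classmethod
    def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
        compilation_config = vllm_config.compilation_config
        if compilation_config.cudagraph_mode != CUDAGraphMode.NONE:
            # CUDA graph capture is not available on XPU yet, so warn and
            # disable rather than failing later in the piecewise backend.
            logger.warning("cudagraphs are not supported on XPU; "
                           "forcing cudagraph_mode=NONE")
            compilation_config.cudagraph_mode = CUDAGraphMode.NONE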
vllm/platforms/xpu.py (Outdated)
    """
    Currently xpu does NOT support Graph model.
    """
    return None
This is unsafe, it'll break in an ugly way if someone enables cudagraphs. Have we tested this? At least we should raise an error here
If we raise an error here, it will break the code path :( I would prefer to log a warning here.
I agree it can be unsafe in the future. For now, graph_pool will not be used if cudagraph_mode is None.
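A small sketch of the warning variant being discussed (kept only for illustration; the method is removed entirely later in this thread once #23385 lands):

import logging
from typing import Any

logger = logging.getLogger(__name__)

def get_global_graph_pool(self) -> Any:
    # XPU has no graph-capture support yet; the pool is only consulted when
    # cudagraph_mode is enabled, so log a warning instead of raising.
    logger.warning("Graph capture is not supported on XPU; "
                   "no graph pool is available.")
    return None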
Regarding "If we raise an error here, it will break the code path": could you show me the error? Why does get_global_graph_pool get called at all?
Oh I see now: the graph pool handle is passed around the backend for no good reason. static_graph_wrapper_cls doesn't need it at all. Let me fix that and unblock you.
AI minion working in #23385
Thanks for fixing! Will rebase & fix here once merged.
it will always set […]
Force-pushed from 4b1c7ef to d36b5d4
@ProExpertProg, could you help approve the PR if the latest fix has resolved your comments? Thanks so much!
Let's wait for #23385 and remove the get_global_graph_pool, then we can approve and merge. I'll unblock CI so we can be ready to merge.
@xuechendi @jikunshang #23385 has just merged, please remove the get_global_graph_pool method.
vllm/attention/layer.py (Outdated)
@@ -190,7 +190,7 @@ def __init__(
         # opaque custom op. For other platforms, we directly call them
         # and let torch.compile handle them.
         self.use_direct_call = not current_platform.is_cuda_alike(
-        ) and not current_platform.is_cpu()
+        ) and not current_platform.is_cpu() and not current_platform.is_xpu()
Could you make this a new property on the platform interface, called opaque_attention_op?
sure, added.
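For illustration, one way the requested property could look (the name opaque_attention_op comes from the review comment; the base-class layout and default value are assumptions, not the PR's actual code):

class Platform:
    # Whether this platform wraps attention in an opaque custom op so that
    # torch.compile does not trace into it.
    opaque_attention_op: bool = False

class XpuPlatformSketch(Platform):
    opaque_attention_op = True

# In vllm/attention/layer.py the per-platform checks could then collapse to:
#     self.use_direct_call = not current_platform.opaque_attention_op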
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from d36b5d4 to ca400b8
vllm/platforms/xpu.py (Outdated)
@@ -182,3 +175,13 @@ def check_if_supports_dtype(cls, torch_dtype: torch.dtype):
            "Intel Arc A770 have bfloat16 accuracy known issue. "
            "You can use float16 instead by explicitly setting the "
            "`dtype` flag in CLI, for example: --dtype=half.")

    def get_global_graph_pool(self) -> Any:
Please still remove this method
removed.
Force-pushed from b69a0f9 to 8ca13d9
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Xiao Yu <[email protected]>
Purpose
This PR enables torch.compile on the XPU platform. Users can enable it with the -O3 option.

Limitations:
Test Plan
add a test in CI.
Test Result
(Optional) Documentation Update