[PERF]use mindiesd fused rope and rope cache by Hu1Lcode · Pull Request #2571 · vllm-project/vllm-omni

Hu1Lcode · 2026-04-08T01:46:44Z


Purpose

commit 1: Use the mindiesd fused RoPE operator.
commit 2: Cache the RoPE computation results during the model's forward pass so that later calls can reuse them instead of recomputing.
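For readers unfamiliar with the idea behind commit 2, here is a minimal sketch of a RoPE cache. It is not the code from this PR: the `RopeCache` class, the `(seq_len, dim)` cache key, and the `theta` default are all assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
import math


class RopeCache:
    """Hypothetical cache of RoPE (cos, sin) tables, keyed by (seq_len, dim)."""

    def __init__(self, theta: float = 10000.0):
        self.theta = theta
        self._tables = {}
        self.misses = 0  # counts how often the tables were actually computed

    def get(self, seq_len: int, dim: int):
        key = (seq_len, dim)
        if key not in self._tables:
            # First request for this shape: compute once and store.
            self.misses += 1
            self._tables[key] = self._compute(seq_len, dim)
        return self._tables[key]

    def _compute(self, seq_len: int, dim: int):
        # One inverse frequency per rotated pair of channels.
        inv_freq = [self.theta ** (-2.0 * i / dim) for i in range(dim // 2)]
        cos = [[math.cos(pos * f) for f in inv_freq] for pos in range(seq_len)]
        sin = [[math.sin(pos * f) for f in inv_freq] for pos in range(seq_len)]
        return cos, sin


cache = RopeCache()
first = cache.get(4, 8)   # computed
second = cache.get(4, 8)  # served from the cache
```

In a denoising loop the input shape usually stays fixed across steps and layers, so under these assumptions the tables would be computed once and then reused.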

Test Plan

server
    vllm serve wan2.2-i2v-diffusers \
        --omni \
        --port 8099 \
        --usp 8 \
        --use-hsdp \
        --vae-patch-parallel-size 8 \
        --vae-use-tiling \
        --log-stats \
        --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'

curl
    curl -X POST http://localhost:8099/v1/videos \
        -F "prompt=**********" \
        -F "input_reference=image" \
        -F "size=720x1280" \
        -F "fps=24" \
        -F "num_frames=121" \
        -F "guidance_scale=1.0" \
        -F "flow_shift=5.0" \
        -F "num_inference_steps=4" \
        -F "seed=42"

Test Result

commit 1
The RoPE implementation built from small operators is replaced with the fused operator.

commit 2
step 1

step 2


Signed-off-by: Hui <1779066624@qq.com>

chatgpt-codex-connector · 2026-04-08T01:46:52Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.


david6666666

I left one non-blocking performance note on the fused RoPE path.

david6666666 · 2026-04-09T02:27:26Z

vllm_omni/diffusion/models/wan2_2/wan2_2_transformer.py

    cos = freqs_cos[..., 0::2]
    sin = freqs_sin[..., 1::2]
+
+    if find_spec("mindiesd"):


Non-blocking: apply_rotary_emb_wan() is on the hot path, but this adds both find_spec("mindiesd") and logger.info(...) inside every call. Even if the fused kernel is faster, repeatedly doing module discovery and emitting an info log per layer / per denoise step can eat into the gain. I would move the capability check to module scope and downgrade the log to debug or warning_once.

I will remove this log because it already exists in mindie.


david6666666 · 2026-04-09T06:38:46Z

vllm_omni/diffusion/models/wan2_2/wan2_2_transformer.py

    sin = freqs_sin[..., 1::2]
+
+    if find_spec("mindiesd"):
+        from vllm_omni.diffusion.layers.rope import apply_rotary_emb_mindiesd


I believe platform-related optimizations shouldn't be placed in the model script. You can contact @gcanlin to see how to optimize it.
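One possible way to keep platform-specific kernels out of the model script is a small backend registry that the model queries by platform name. The sketch below is purely illustrative: `register_rope_backend`, `get_rope_impl`, and the platform strings are hypothetical and do not describe the project's actual dispatch mechanism.

```python
from typing import Callable, Dict, List

RopeFn = Callable[[List[float], List[float], List[float]], List[float]]

# Hypothetical registry mapping a platform name to a RoPE implementation.
_ROPE_BACKENDS: Dict[str, RopeFn] = {}


def register_rope_backend(platform: str) -> Callable[[RopeFn], RopeFn]:
    """Decorator that records an implementation under a platform name."""
    def decorator(fn: RopeFn) -> RopeFn:
        _ROPE_BACKENDS[platform] = fn
        return fn
    return decorator


@register_rope_backend("default")
def rope_default(x, cos, sin):
    """Portable fallback: rotate each (even, odd) pair of channels."""
    out = []
    for i in range(0, len(x), 2):
        c, s = cos[i // 2], sin[i // 2]
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def get_rope_impl(platform: str) -> RopeFn:
    """Return the platform's implementation, or the portable fallback."""
    return _ROPE_BACKENDS.get(platform, _ROPE_BACKENDS["default"])
```

Under this layout the model script would only call `get_rope_impl(...)`, and a platform package could register its fused kernel without the model importing it directly.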

Hu1Lcode added 2 commits April 8, 2026 09:40

use mindiesd fused rope

eb9e37f

Signed-off-by: Hui <1779066624@qq.com>

use rope cache

d021ebb

Signed-off-by: Hui <1779066624@qq.com>

Hu1Lcode requested a review from hsliuustc0106 as a code owner April 8, 2026 01:46

Hu1Lcode and others added 3 commits April 8, 2026 09:48

fix pre-commit

17f007b

Signed-off-by: Hui <1779066624@qq.com>

fix pre-commit

2502cad

Signed-off-by: Hui <1779066624@qq.com>

Merge branch 'vllm-project:main' into main

0fe4649

Hu1Lcode changed the title ~~use mindiesd fused rope and use rope cache~~ [PERF]use mindiesd fused rope and use rope cache Apr 8, 2026

Hu1Lcode changed the title ~~[PERF]use mindiesd fused rope and use rope cache~~ [PERF]use mindiesd fused rope and rope cache Apr 9, 2026

david6666666 mentioned this pull request Apr 9, 2026

[RFC][0.20.0]: Qwen-Image、Qwen-Image-Layered、Qwen-Image-Edit-Plus、Wan2.2 Production-grade Feature Monitoring JiusiServe/vllm-omni#181

Open

32 tasks

david6666666 reviewed Apr 9, 2026


Merge branch 'vllm-project:main' into main

60dc8d2

hsliuustc0106 requested review from SamitHuang, ZJY0516 and wtomin April 9, 2026 06:33

remove mindie rope log

ad4cd62

Signed-off-by: Hui <1779066624@qq.com>

david6666666 requested changes Apr 9, 2026

Hu1Lcode wants to merge 7 commits into vllm-project:main from Hu1Lcode:main


2 participants
