[PERF]use mindiesd fused rope and rope cache #2571

Open
Hu1Lcode wants to merge 7 commits into vllm-project:main from Hu1Lcode:main

Contributor

@Hu1Lcode Hu1Lcode commented Apr 8, 2026

Purpose

Commit 1: use the mindiesd fused RoPE operator.
Commit 2: cache the RoPE computation results during the model's forward pass so they can be stored and reused by later calls.
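
To illustrate the idea behind commit 2, here is a minimal sketch of caching RoPE cos/sin tables so that repeated forward calls with the same shape reuse them. The conversation does not show the PR's actual cache (its keys, where it lives, or what it stores), so everything here is an assumption: `rope_tables`, its parameters, and the use of plain Python tuples instead of tensors are all hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=8)
def rope_tables(seq_len: int, head_dim: int, theta: float = 10000.0):
    """Hypothetical helper: build (cos, sin) tables once per (seq_len, head_dim).

    Each table has seq_len rows and head_dim // 2 columns. Tuples are used so
    the cached value is immutable and safe to share between callers.
    """
    half = head_dim // 2
    # Standard RoPE inverse frequencies: theta^(-2i / head_dim).
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    cos = tuple(tuple(math.cos(pos * f) for f in inv_freq) for pos in range(seq_len))
    sin = tuple(tuple(math.sin(pos * f) for f in inv_freq) for pos in range(seq_len))
    return cos, sin


# The first call computes the tables; the second returns the cached object.
first = rope_tables(4, 8)
second = rope_tables(4, 8)
```

In a diffusion model the sequence shape is typically fixed across denoising steps, which is why a small shape-keyed cache like this can avoid recomputing the same tables at every step and layer.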

Test Plan

server
vllm serve wan2.2-i2v-diffusers \
  --omni \
  --port 8099 \
  --usp 8 \
  --use-hsdp \
  --vae-patch-parallel-size 8 \
  --vae-use-tiling \
  --log-stats \
  --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'
curl
curl -X POST http://localhost:8099/v1/videos \
  -F "prompt=**********" \
  -F "input_reference=image" \
  -F "size=720x1280" \
  -F "fps=24" \
  -F "num_frames=121" \
  -F "guidance_scale=1.0" \
  -F "flow_shift=5.0" \
  -F "num_inference_steps=4" \
  -F "seed=42"

Test Result

commit 1
The small-operator RoPE implementation is replaced with the fused operator.
image
commit 2
step 1
image
step 2
image



Hu1Lcode added 2 commits April 8, 2026 09:40
Signed-off-by: Hui <1779066624@qq.com>
Signed-off-by: Hui <1779066624@qq.com>
@Hu1Lcode Hu1Lcode requested a review from hsliuustc0106 as a code owner April 8, 2026 01:46
Hu1Lcode and others added 3 commits April 8, 2026 09:48
Signed-off-by: Hui <1779066624@qq.com>
Signed-off-by: Hui <1779066624@qq.com>
@Hu1Lcode Hu1Lcode changed the title use mindiesd fused rope and use rope cache [PERF]use mindiesd fused rope and use rope cache Apr 8, 2026
@Hu1Lcode Hu1Lcode changed the title [PERF]use mindiesd fused rope and use rope cache [PERF]use mindiesd fused rope and rope cache Apr 9, 2026
Collaborator

@david6666666 david6666666 left a comment

I left one non-blocking performance note on the fused RoPE path.

cos = freqs_cos[..., 0::2]
sin = freqs_sin[..., 1::2]

if find_spec("mindiesd"):
Collaborator

Non-blocking: apply_rotary_emb_wan() is on the hot path, but this adds both find_spec("mindiesd") and logger.info(...) inside every call. Even if the fused kernel is faster, repeatedly doing module discovery and emitting an info log per layer / per denoise step can eat into the gain. I would move the capability check to module scope and downgrade the log to debug or warning_once.

Contributor Author

I will remove this log because it already exists in mindie.

Signed-off-by: Hui <1779066624@qq.com>
sin = freqs_sin[..., 1::2]

if find_spec("mindiesd"):
    from vllm_omni.diffusion.layers.rope import apply_rotary_emb_mindiesd
Collaborator

I believe platform-related optimizations shouldn't be placed in the model script. You can contact @gcanlin to see how to optimize it.
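
One way to keep platform-specific code out of the model script is a small dispatch layer: platform backends register their RoPE implementation, and the model asks for whichever one applies. The sketch below is a generic pattern only, not vllm-omni's actual platform abstraction; `register_rope_impl`, `get_rope_impl`, and the platform keys are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a platform name to a RoPE implementation.
_ROPE_IMPLS: Dict[str, Callable] = {}


def register_rope_impl(platform: str):
    """Decorator that registers a RoPE implementation for a platform."""
    def decorator(fn: Callable) -> Callable:
        _ROPE_IMPLS[platform] = fn
        return fn
    return decorator


def get_rope_impl(platform: str) -> Callable:
    """Return the platform's implementation, falling back to the default."""
    return _ROPE_IMPLS.get(platform, _ROPE_IMPLS["default"])


@register_rope_impl("default")
def rope_default(x):
    # Placeholder for the unfused, portable implementation.
    return ("default", x)


@register_rope_impl("npu")
def rope_npu_fused(x):
    # Placeholder for a platform-specific fused kernel.
    return ("npu_fused", x)
```

The model script would then call `get_rope_impl(...)` once and stay free of `find_spec` checks and platform-specific imports.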

Contributor Author

OK
