[PERF]use mindiesd fused rope and rope cache#2571
Hu1Lcode wants to merge 7 commits into vllm-project:main
Signed-off-by: Hui <1779066624@qq.com>
david6666666 left a comment
I left one non-blocking performance note on the fused RoPE path.
    cos = freqs_cos[..., 0::2]
    sin = freqs_sin[..., 1::2]

    if find_spec("mindiesd"):
Non-blocking: apply_rotary_emb_wan() is on the hot path, but this adds both find_spec("mindiesd") and logger.info(...) inside every call. Even if the fused kernel is faster, repeatedly doing module discovery and emitting an info log per layer / per denoise step can eat into the gain. I would move the capability check to module scope and downgrade the log to debug or warning_once.
I will remove this log because it already exists in mindie.
    sin = freqs_sin[..., 1::2]

    if find_spec("mindiesd"):
        from vllm_omni.diffusion.layers.rope import apply_rotary_emb_mindiesd
I believe platform-related optimizations shouldn't be placed in the model script. You can contact @gcanlin to see how to optimize it.
Purpose
commit 1: use the mindiesd fused RoPE operator.
commit 2: cache the computed RoPE results in the model's forward pass so that later calls reuse them instead of recomputing.
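The caching idea in commit 2 can be sketched as memoizing the cos/sin tables by their shape parameters, so repeated forward passes with the same sequence length reuse one result. This is not the PR's implementation: `rope_tables` is a hypothetical function, it uses the standard 1D RoPE frequencies with base `theta = 10000` as an assumption (the model's actual RoPE layout may differ), and tuples stand in for tensors.

```python
import math
from functools import lru_cache
from typing import Tuple

Table = Tuple[Tuple[float, ...], ...]


@lru_cache(maxsize=8)
def rope_tables(seq_len: int, head_dim: int, theta: float = 10000.0) -> Tuple[Table, Table]:
    """Build per-position cos/sin tables once per (seq_len, head_dim, theta)."""
    half = head_dim // 2
    # Assumed standard RoPE frequencies: theta^(-2i / head_dim).
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    cos = tuple(tuple(math.cos(p * f) for f in inv_freq) for p in range(seq_len))
    sin = tuple(tuple(math.sin(p * f) for f in inv_freq) for p in range(seq_len))
    return cos, sin


# The second call with identical arguments returns the cached object.
first = rope_tables(4, 8)
second = rope_tables(4, 8)
```

Since the denoising loop calls the same layers with the same shapes at every step, a cache keyed on shape would be hit on every call after the first.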
Test Plan
server
vllm serve wan2.2-i2v-diffusers \
  --omni \
  --port 8099 \
  --usp 8 \
  --use-hsdp \
  --vae-patch-parallel-size 8 \
  --vae-use-tiling \
  --log-stats \
  --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'
curl
curl -X POST http://localhost:8099/v1/videos \
  -F "prompt=**********" \
  -F "input_reference=image" \
  -F "size=720x1280" \
  -F "fps=24" \
  -F "num_frames=121" \
  -F "guidance_scale=1.0" \
  -F "flow_shift=5.0" \
  -F "num_inference_steps=4" \
  -F "seed=42"
Test Result
commit 1



The small operators that made up RoPE are replaced by the single fused operator.
commit 2
step 1
step 2