
Conversation


zhuyuhua-v commented Nov 5, 2025

Purpose

Add sequence parallelism support for ROCm platforms:

  1. Support pattern replacement of allreduce+rmsnorm with reduce_scatter+rmsnorm+all_gather (see the sketch below this list).
  2. Support pattern replacement of allreduce+rmsnorm+quant with reduce_scatter+rmsnorm+quant+all_gather.
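
For context, the first rewrite can be sketched with plain torch.distributed collectives. This is only an illustrative sketch of the pattern, not the actual vLLM fusion pass; the `rms_norm` helper, tensor shapes, and function names are assumptions, and the quant variant simply adds the quantization op after the shard-local RMSNorm.

```python
import torch
import torch.distributed as dist

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    variance = x.float().pow(2).mean(-1, keepdim=True)
    return (x.float() * torch.rsqrt(variance + eps)).to(x.dtype) * weight

# Original pattern: all-reduce the partial TP output, then every rank
# runs RMSNorm over the full token dimension redundantly.
def allreduce_then_norm(partial: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    dist.all_reduce(partial)                     # [num_tokens, hidden]
    return rms_norm(partial, weight)

# Replacement pattern: reduce-scatter along the token dimension so each rank
# reduces and normalizes only its own shard, then all-gather the normalized
# shards. Assumes num_tokens is divisible by the world size.
def reduce_scatter_norm_allgather(partial: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    world = dist.get_world_size()
    shard = torch.empty(partial.shape[0] // world, partial.shape[1],
                        dtype=partial.dtype, device=partial.device)
    dist.reduce_scatter_tensor(shard, partial)   # reduce + split tokens across ranks
    shard = rms_norm(shard, weight)              # RMSNorm on 1/world of the tokens
    out = torch.empty_like(partial)
    dist.all_gather_into_tensor(out, shard)      # reassemble the full sequence
    return out
```

The two variants move the same total data, but the replacement keeps the normalization (and, in the quant pattern, the quantization) work sharded across ranks instead of duplicating it on every rank.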

Test Plan

server:

```bash
vllm serve $model_path \
    --tensor-parallel-size 8 \
    --max-num-batched-tokens 32768 \
    --trust-remote-code \
    --no-enable-prefix-caching \
    --disable-log-requests \
    --gpu_memory_utilization 0.9 \
    --port 6789 \
    --compilation-config '{"cudagraph_mode": "FULL", "pass_config": {"enable_sequence_parallelism": true}, "use_inductor_graph_partition": false, "splitting_ops":[]}' \
    --block-size 1 \
    --async-scheduling
```

accuracy:

```bash
model_path="/mnt/raid0/models/DeepSeek-R1/"
lm_eval \
    --model local-completions \
    --tasks gsm8k \
    --model_args model=${model_path},base_url=http://127.0.0.1:6789/v1/completions \
    --batch_size 100
```

Test Result

accuracy:

| Tasks | Version | Filter           | n-shot | Metric      | Value  |   | Stderr |
|-------|--------:|------------------|-------:|-------------|-------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.9477 | ± | 0.0061 |
|       |         | strict-match     |      5 | exact_match | 0.9454 | ± | 0.0063 |

zhuyuhua-v marked this pull request as ready for review November 5, 2025 05:50

wuhuikx commented Nov 5, 2025

Will you file a PR to upstream? How is the performance, especially TTFT?
