
[1/N][port from deepseek085] add custom allreduce from AITER #629

Merged
tjtanaavllm merged 1 commit into llama_fp8_03122025 from zejun/llama_fp8_03122025_custom_allreduce on Aug 12, 2025
Conversation

@zejunchen-zejun commented Aug 11, 2025

Sync the deepseek085 optimization to the rocm/vllm llama branch. The same code changes will be upstreamed to public vLLM.

The custom allreduce is controlled by VLLM_ROCM_USE_AITER_CUSTOM_ALL_REDUCE (default: True). If AITER can be imported, the custom allreduce is used by default.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to catch errors quickly. You can run additional CI tests on top of those from your fastcheck build on the Buildkite UI (linked in the PR checks section) by unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@zejunchen-zejun zejunchen-zejun changed the title [1/N] add custom allreduce from AITER to vllm [1/N][port back from deepseek085] add custom allreduce from AITER to vllm Aug 11, 2025
@zejunchen-zejun zejunchen-zejun force-pushed the zejun/llama_fp8_03122025_custom_allreduce branch from 5ee9cab to d785596 Compare August 11, 2025 12:48
@zejunchen-zejun zejunchen-zejun changed the title [1/N][port back from deepseek085] add custom allreduce from AITER to vllm [1/N][port from deepseek085] add custom allreduce from AITER to vllm Aug 11, 2025
@zejunchen-zejun zejunchen-zejun changed the title [1/N][port from deepseek085] add custom allreduce from AITER to vllm [1/N][port from deepseek085] add custom allreduce from AITER Aug 11, 2025
@zejunchen-zejun zejunchen-zejun force-pushed the zejun/llama_fp8_03122025_custom_allreduce branch from d785596 to 555aff3 Compare August 12, 2025 07:53
@zejunchen-zejun (Author) commented Aug 12, 2025

Here is the accuracy verification, using Qwen/Qwen3-0.6B and the gsm8k dataset.

With CUDA allreduce:
(results screenshot)

With AITER allreduce (export VLLM_ROCM_USE_AITER_CUSTOM_ALL_REDUCE=1):
(results screenshot)

No accuracy degradation is observed.

Verify command:

export VLLM_ROCM_USE_AITER_CUSTOM_ALL_REDUCE=1

#!/bin/bash
rm -rf /root/.cache/vllm
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
lm_eval --model vllm --model_args pretrained=Qwen/Qwen3-0.6B,tensor_parallel_size=8,max_model_len=10000,gpu_memory_utilization=0.2 --trust_remote_code --tasks gsm8k --batch_size auto 2>&1 | tee ./pr_gsm8k-Qwen_Qwen3-32B-aiter-v1-3.log

Control it via the env flag VLLM_ROCM_USE_AITER_CUSTOM_ALL_REDUCE
(default: True)

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
@zejunchen-zejun zejunchen-zejun force-pushed the zejun/llama_fp8_03122025_custom_allreduce branch from 555aff3 to 37e91e2 Compare August 12, 2025 12:37
@tjtanaavllm

LGTM @zejunchen-zejun. Thank you for the feature.

@tjtanaavllm tjtanaavllm merged commit e2fa100 into llama_fp8_03122025 Aug 12, 2025
5 of 6 checks passed
@gshtras gshtras deleted the zejun/llama_fp8_03122025_custom_allreduce branch September 25, 2025 14:50