
Revert "[Bugfix][eager][oom] fix rank0 load imbalance by no padding when multi dp"#7637

Open
coder-fny wants to merge 1 commit into vllm-project:main from coder-fny:revert-7297-fix-rank0-load-imbalance

Conversation


@coder-fny (Contributor) commented Mar 25, 2026

Reverts #7297

@gemini-code-assist commented:

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fully reverts a previous change (#7297) that aimed to address a rank0 load imbalance and Out-Of-Memory (OOM) issues in eager mode by modifying data parallel padding behavior. The revert restores the original padding strategy, where batches are padded to the maximum number of tokens across data parallel ranks, and simplifies the MoE communication type selection process.

Highlights

  • Revert of Padding Logic: The core change is the reversion of modifications related to data parallel padding. The system now re-enables padding to the maximum number of tokens across data parallel ranks, undoing a previous attempt to avoid padding for load balancing.
  • MoE Communication Type Selection: The logic for selecting the Mixture-of-Experts (MoE) communication type has been reverted to use the local num_tokens directly, rather than the maximum number of tokens across data parallel ranks.
  • Removal of Eager Mode Enforcement Parameter: The force_eager parameter has been removed from the model's forward pass execution, simplifying the call signature and removing explicit eager mode enforcement at that point.
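The restored padding strategy described above can be illustrated with a hedged sketch. All names and the pure-Python framing here are hypothetical, for illustration only; the actual vLLM Ascend implementation operates on tensors and uses collective ops to gather per-rank token counts.

```python
# Hypothetical sketch of padding batches to the maximum token count
# across data parallel (DP) ranks, so collective ops see uniform shapes.
# Not the actual vLLM Ascend code.

def pad_tokens_across_dp(local_tokens, num_tokens_across_dp, pad_id=0):
    """Pad one rank's token list up to the max count across all DP ranks."""
    max_num_tokens = max(num_tokens_across_dp)
    padding = [pad_id] * (max_num_tokens - len(local_tokens))
    return local_tokens + padding

# Three DP ranks with uneven batches; in the real system the per-rank
# counts would be gathered beforehand (e.g. via an all-gather).
batches = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10]]
counts = [len(b) for b in batches]  # [5, 2, 3]
padded = [pad_tokens_across_dp(b, counts) for b in batches]
assert all(len(p) == 5 for p in padded)  # every rank now matches the max
```

The trade-off the original PR #7297 tried to address is visible here: rank 0's larger batch forces every other rank to pad up to its size, which costs memory on the smaller ranks but keeps all ranks shape-consistent.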




@gemini-code-assist (bot) left a comment


Code Review

The pull request refactors data parallel (DP) padding logic by removing the allow_dp_padding parameter and simplifying token synchronization in _sync_batch_across_dp. It also modifies the ascend_forward_context to directly use num_tokens for selecting the MoE communication method. However, a critical issue was identified where using num_tokens directly for select_moe_comm_method could lead to desynchronization across DP ranks if num_tokens is not consistent, potentially causing a hang. The original logic for max_num_tokens should be restored to ensure a synchronized token count.

The reverted change in context:

```diff
-max_num_tokens = int(num_tokens_across_dp.max().item()) if num_tokens_across_dp is not None else num_tokens
-moe_comm_type = select_moe_comm_method(max_num_tokens, vllm_config, is_draft_model)
+moe_comm_type = select_moe_comm_method(num_tokens, vllm_config, is_draft_model)
```
@gemini-code-assist left a review comment (critical):

This change removes the logic for determining the maximum number of tokens across data parallel (DP) ranks. It now relies on the num_tokens argument to be consistent across all DP ranks.

However, num_tokens may not be consistent. Specifically, in NPUModelRunner._sync_batch_across_dp, the all_reduce operation is skipped if _skip_all_reduce_across_dp_group() returns true (e.g., for non-MoE models or certain MoE configurations). In this case, num_tokens passed to this function will be the local token count for each rank, which can be different.

This will cause select_moe_comm_method to be called with different num_tokens values on different DP ranks, potentially leading to desynchronization and a hang if they choose different communication methods. This is a critical issue.

While the previous logic was also affected by the issue in _skip_all_reduce_across_dp_group, it correctly showed the intent of using a synchronized maximum token count. This change removes that safeguard.

Suggested change:

```diff
-moe_comm_type = select_moe_comm_method(num_tokens, vllm_config, is_draft_model)
+max_num_tokens = int(num_tokens_across_dp.max().item()) if num_tokens_across_dp is not None else num_tokens
+moe_comm_type = select_moe_comm_method(max_num_tokens, vllm_config, is_draft_model)
```
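The hazard the reviewer describes can be sketched in isolation. This is a hedged illustration: the real select_moe_comm_method is more involved, and the threshold-based chooser and its value below are made up purely to demonstrate how rank-local token counts can produce divergent choices.

```python
# Hypothetical sketch of the desynchronization hazard. The threshold and
# chooser logic are illustrative only, not the real select_moe_comm_method.

ALLGATHER_THRESHOLD = 256  # made-up value for illustration

def select_moe_comm_method(num_tokens):
    # Pretend small batches use allgather and large ones use alltoall.
    return "allgather" if num_tokens <= ALLGATHER_THRESHOLD else "alltoall"

# Per-rank local token counts when the all_reduce sync was skipped:
num_tokens_across_dp = [100, 300]  # rank 0, rank 1

# Using each rank's local count, the ranks disagree on the comm method,
# which in a real collective would lead to a hang.
local_choices = {select_moe_comm_method(n) for n in num_tokens_across_dp}
assert local_choices == {"allgather", "alltoall"}  # divergent

# Using the synchronized maximum, every rank makes the same choice.
max_num_tokens = max(num_tokens_across_dp)
sync_choices = {select_moe_comm_method(max_num_tokens)
                for _ in num_tokens_across_dp}
assert sync_choices == {"alltoall"}  # consistent across ranks
```

This is why the suggested change restores max_num_tokens: even if the local counts differ, feeding the synchronized maximum into the selector guarantees all DP ranks pick the same communication method.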

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Fill in the PR description and write a clear commit message to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.
