
Conversation

@Pr0Wh1teGivee (Contributor) commented Sep 30, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot left a comment

Code Review

This pull request aims to fix an allgather error. The main change applies torch.abs() to expanded_row_idx in the token_combine method of TokenDispatcherWithAllGather. While this may resolve an error caused by negative indices, it raises a concern about potential data corruption if negative values carry special meaning, such as marking padding tokens. My review includes a critical comment on this change, recommending clarification, a safer handling mechanism such as masking, and a corresponding unit test. The PR also cleans up TokenDispatcherWithAll2AllV by removing unused logic related to global redundant experts, which appears to be a correct improvement.

```python
                  bias: torch.Tensor = None):
        assert self.original_shape is not None

        self.expanded_row_idx = torch.abs(self.expanded_row_idx)
```

Severity: critical

Using torch.abs() on indices can be risky and might hide underlying issues. If self.expanded_row_idx contains negative values to signify special tokens (e.g., padding tokens with index -1), taking the absolute value will map them to valid indices (e.g., index 1), which could lead to data corruption by incorrectly mixing token data.
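
For instance (a minimal, hypothetical illustration, not the dispatcher's actual tensors):

```python
import torch

# Suppose -1 marks a padded slot in the index tensor.
tokens = torch.tensor([[10.0], [20.0], [30.0]])
expanded_row_idx = torch.tensor([0, -1, 2])

# torch.abs() silently remaps the padding index -1 to the valid index 1,
# so the padded slot now gathers token 1's data instead of being skipped.
gathered = tokens[torch.abs(expanded_row_idx)]
print(gathered)  # tensor([[10.], [20.], [30.]]) -- row 1 leaks into the output
```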

Could you please clarify the reason for expanded_row_idx containing negative values?

If negative values are indeed used for padding or invalid tokens, it would be safer to handle them with masking rather than torch.abs().
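
A minimal sketch of that masking approach, assuming -1 marks invalid slots; the helper name and shapes here are illustrative, not the dispatcher's actual API:

```python
import torch

def gather_with_mask(tokens: torch.Tensor,
                     expanded_row_idx: torch.Tensor) -> torch.Tensor:
    """Gather token rows by index, zeroing positions whose index is negative."""
    valid = expanded_row_idx >= 0                      # True for real tokens
    safe_idx = torch.where(valid, expanded_row_idx,
                           torch.zeros_like(expanded_row_idx))
    out = tokens[safe_idx]                             # gather with in-range indices
    out[~valid] = 0                                    # drop data gathered for padding
    return out
```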

Additionally, it would be beneficial to add a unit test case with negative values in expanded_row_idx to verify that this change behaves as expected and prevents regressions.
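
For example, a test along these lines (assuming the hypothetical gather_with_mask helper above):

```python
import torch

def test_gather_ignores_negative_indices():
    tokens = torch.tensor([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    expanded_row_idx = torch.tensor([2, -1, 0])  # -1 marks a padded slot

    out = gather_with_mask(tokens, expanded_row_idx)

    # The padded slot yields zeros instead of being remapped to row 1,
    # which is what torch.abs() would incorrectly produce.
    expected = torch.tensor([[3.0, 3.0], [0.0, 0.0], [1.0, 1.0]])
    assert torch.equal(out, expected)
```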
