
Conversation

@Pr0Wh1teGivee (Contributor) commented Sep 30, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot left a comment

Code Review

This pull request aims to fix an allgather error. The main change applies torch.abs() to expanded_row_idx in the token_combine method of TokenDispatcherWithAllGather. While this may resolve an error caused by negative indices, it raises a concern about potential data corruption if negative values carry special meaning, such as marking padding tokens. My review includes a critical comment on this change, recommending clarification, a safer handling mechanism such as masking, and a corresponding unit test. The PR also cleans up TokenDispatcherWithAll2AllV by removing unused logic related to global redundant experts, which appears to be a correct improvement.

```python
                  bias: torch.Tensor = None):
        assert self.original_shape is not None

        self.expanded_row_idx = torch.abs(self.expanded_row_idx)
```

Severity: critical

Using torch.abs() on indices can be risky and might hide underlying issues. If self.expanded_row_idx contains negative values to signify special tokens (e.g., padding tokens with index -1), taking the absolute value will map them to valid indices (e.g., index 1), which could lead to data corruption by incorrectly mixing token data.
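
For instance (a minimal, hypothetical illustration, not the dispatcher's actual tensors):

```python
import torch

# Suppose -1 marks a padded slot in the index tensor.
tokens = torch.tensor([[10.0], [20.0], [30.0]])
expanded_row_idx = torch.tensor([0, -1, 2])

# torch.abs() silently remaps the padding index -1 to the valid index 1,
# so the padded slot now gathers token 1's data instead of being skipped.
gathered = tokens[torch.abs(expanded_row_idx)]
print(gathered)  # tensor([[10.], [20.], [30.]]) -- row 1 leaks into the output
```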

Could you please clarify the reason for expanded_row_idx containing negative values?

If negative values are indeed used for padding or invalid tokens, it would be safer to handle them with masking rather than torch.abs().
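
A minimal sketch of that masking approach, assuming -1 marks invalid slots; the helper name and shapes here are illustrative, not the dispatcher's actual API:

```python
import torch

def gather_with_mask(tokens: torch.Tensor,
                     expanded_row_idx: torch.Tensor) -> torch.Tensor:
    """Gather token rows by index, zeroing positions whose index is negative."""
    valid = expanded_row_idx >= 0                      # True for real tokens
    safe_idx = torch.where(valid, expanded_row_idx,
                           torch.zeros_like(expanded_row_idx))
    out = tokens[safe_idx]                             # gather with in-range indices
    out[~valid] = 0                                    # drop data gathered for padding
    return out
```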

Additionally, it would be beneficial to add a unit test case with negative values in expanded_row_idx to verify that this change behaves as expected and prevents regressions.
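
For example, a test along these lines (assuming the hypothetical gather_with_mask helper above):

```python
import torch

def test_gather_ignores_negative_indices():
    tokens = torch.tensor([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    expanded_row_idx = torch.tensor([2, -1, 0])  # -1 marks a padded slot

    out = gather_with_mask(tokens, expanded_row_idx)

    # The padded slot yields zeros instead of being remapped to row 1,
    # which is what torch.abs() would incorrectly produce.
    expected = torch.tensor([[3.0, 3.0], [0.0, 0.0], [1.0, 1.0]])
    assert torch.equal(out, expected)
```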
