@offline893 (Contributor) commented on Sep 30, 2025

What this PR does / why we need it?

Resolves an EPLB failure caused by the log2phy map ending up on the wrong device type when MTP rotary position encoding is used.

Does this PR introduce any user-facing change?

How was this patch tested?

vllm-project/vllm@releases/v0.11.0


@gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug causing EPLB failures by ensuring the log2phy map is on the correct NPU device. The fix is applied consistently across three files where this logic is present. While the fix is correct, I've identified a critical potential bug in the called function determine_default_log2phy_map and a high-severity maintainability issue due to code duplication. Please see my detailed comments.

Comment on lines 174 to +176
  self.log2phy = determine_default_log2phy_map(
      self.global_num_experts, self.ep_size, self.ep_rank,
-     self.global_redundant_expert_num)
+     self.global_redundant_expert_num).npu()

critical

This change correctly moves the log2phy tensor to the NPU device, fixing a device mismatch bug.

However, there are two related points to consider:

  1. Potential Bug in determine_default_log2phy_map: The called function determine_default_log2phy_map in vllm_ascend/eplb/core/eplb_utils.py appears to have a bug. On line 122 it uses rank_id inside a loop that iterates over ranks with the variable r (for r in range(world_size):). The condition should likely be r < global_redundant_expert_num rather than rank_id < global_redundant_expert_num. Because expert_map_all is constructed for all ranks within this loop, using rank_id produces an incorrect map for every rank r != rank_id, which causes generate_log2phy_map to compute an incorrect log2phy_map_all and ultimately provides a faulty map for the current rank (see the sketch after this comment). This is a critical issue that should be investigated and fixed.

  2. Code Duplication: This same fix is required in three files (vllm_ascend/ops/common_fused_moe.py, vllm_ascend/ops/fused_moe.py, and vllm_ascend/torchair/ops/torchair_fused_moe.py) because the initialization logic is duplicated. This duplication is a maintainability risk, as demonstrated by this bug appearing in multiple places. I recommend refactoring the logic into a shared helper function or a base class method in a follow-up PR to improve maintainability; a sketch of one possible helper follows the last of the duplicated-change comments below.

Given the critical nature of the potential bug, I recommend addressing it.
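
To make the off-by-rank concern concrete, here is a minimal, self-contained sketch. It assumes (as the description above suggests, but does not confirm) that the rank_id < global_redundant_expert_num check decides whether a given rank hosts one of the redundant experts; the function name and structure below are illustrative, not the actual eplb_utils.py code.

```python
# Hypothetical sketch -- NOT the real determine_default_log2phy_map.
# It only shows why checking rank_id inside a loop over all ranks is suspect,
# assuming the check decides whether a rank hosts one of the redundant experts.
def experts_per_rank(global_expert_num, world_size, rank_id,
                     global_redundant_expert_num, use_loop_var):
    local_num_experts = global_expert_num // world_size
    counts = []
    for r in range(world_size):
        check = r if use_loop_var else rank_id  # using rank_id is the suspected bug
        if check < global_redundant_expert_num:
            counts.append(local_num_experts + 1)  # this rank holds a redundant expert
        else:
            counts.append(local_num_experts)
    return counts

# 64 logical experts, 4 ranks, 2 redundant experts, map built from rank 3:
print(experts_per_rank(64, 4, 3, 2, use_loop_var=False))  # [16, 16, 16, 16] -- redundant experts lost
print(experts_per_rank(64, 4, 3, 2, use_loop_var=True))   # [17, 17, 16, 16] -- expected layout
```

With rank_id in the condition, the result also depends on which rank builds the map (rank 0 or 1 would instead see [17, 17, 17, 17]), which matches the description above of expert_map_all being wrong for all ranks r != rank_id.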

Comment on lines 265 to +267
  self.log2phy = determine_default_log2phy_map(
      self.global_num_experts, self.ep_size, self.ep_rank,
-     self.global_redundant_expert_num)
+     self.global_redundant_expert_num).npu()

high

This is the same change as in vllm_ascend/ops/common_fused_moe.py. Please see my comment there regarding a potential critical bug in determine_default_log2phy_map and the code duplication issue.

Comment on lines 1047 to +1049
  self.log2phy = determine_default_log2phy_map(
      self.global_num_experts, self.ep_size, self.ep_rank,
-     self.global_redundant_expert_num)
+     self.global_redundant_expert_num).npu()

high

This is the same change as in vllm_ascend/ops/common_fused_moe.py. Please see my comment there regarding a potential critical bug in determine_default_log2phy_map and the code duplication issue.
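
As a follow-up to the duplication point above, the repeated initialization could be collected into one place. The helper below is only a sketch: its name and location are hypothetical, and it assumes all three call sites pass the same four arguments shown in the diffs above.

```python
import torch

from vllm_ascend.eplb.core.eplb_utils import determine_default_log2phy_map


def build_default_log2phy_map(global_num_experts: int, ep_size: int,
                              ep_rank: int,
                              global_redundant_expert_num: int) -> torch.Tensor:
    # Hypothetical shared helper: keeps the .npu() device placement in one
    # spot instead of repeating it in common_fused_moe.py, fused_moe.py and
    # torchair_fused_moe.py.
    return determine_default_log2phy_map(global_num_experts, ep_size, ep_rank,
                                         global_redundant_expert_num).npu()
```

Each call site would then reduce to self.log2phy = build_default_log2phy_map(self.global_num_experts, self.ep_size, self.ep_rank, self.global_redundant_expert_num).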


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
