replace arange and permute with the third output of npu_moe_gating_top_k_softmax #2418
Conversation
Code Review
This pull request introduces an optimization by replacing the manual creation of row_idx using torch.arange and permute with the third output from the npu_moe_gating_top_k_softmax kernel. This is a good performance improvement. The changes are consistently applied across the codebase, including updates to function signatures and tests to accommodate the new row_idx return value. I've identified one potential issue in the control flow logic that could cause a specialized execution path to be incorrectly overridden by a more general one. Please see the detailed comment.
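For context, a minimal sketch of the change (purely illustrative; num_tokens, top_k, and the keyword arguments passed to the kernel are assumptions based on the truncated diff and the fallback code shown further down):

```python
import torch
import torch_npu  # Ascend NPU extension; assumes an NPU device is available

num_tokens, num_experts, top_k = 8, 4, 2          # illustrative sizes only
router_logits = torch.randn(num_tokens, num_experts).npu()

# Before: row_idx was built by hand (pattern matching the fallback code
# shown further down in this PR's diff).
row_idx_len = num_tokens * top_k
row_idx_manual = (torch.arange(0, row_idx_len, dtype=torch.int32,
                               device=router_logits.device)
                  .view(top_k, -1)
                  .permute(1, 0)
                  .contiguous())

# After: the fused gating kernel already produces this tensor as its third
# output, so the arange + permute round-trip can be dropped. The keyword
# arguments below are assumptions; the call in the diff is truncated.
topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
    router_logits, finished=None, k=top_k)
```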
@@ -188,12 +188,12 @@ def _select_experts_with_fusion_ops(
             eps=float(1e-20))

     if not use_grouped_topk and custom_routing_function is None and scoring_func == "softmax" and is_unquantized:
-        topk_weights, topk_ids, _ = torch_npu.npu_moe_gating_top_k_softmax(
+        topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
This if block can be executed even if the preceding if is_deepseek_v3_r1: block was already executed. This would cause the results from the specialized npu_moe_gating_top_k path to be overwritten by this more general npu_moe_gating_top_k_softmax path, which is likely not the intended behavior. To ensure that only one of these specialized fusion paths is taken, consider changing the if on line 190 to an elif.
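To illustrate the suggestion, a small self-contained sketch of the branch structure (the stub functions and the select_path name are invented for illustration and are not the real _select_experts_with_fusion_ops code):

```python
# Stubs standing in for the two fused kernels; illustrative only.
def _specialized_path():        # stands in for npu_moe_gating_top_k
    return "specialized"

def _general_softmax_path():    # stands in for npu_moe_gating_top_k_softmax
    return "general softmax"

def select_path(is_deepseek_v3_r1, use_grouped_topk,
                custom_routing_function, scoring_func, is_unquantized):
    if is_deepseek_v3_r1:
        result = _specialized_path()
    elif (not use_grouped_topk and custom_routing_function is None
          and scoring_func == "softmax" and is_unquantized):
        # With `elif`, this general branch can no longer overwrite the
        # result of the specialized branch above.
        result = _general_softmax_path()
    else:
        result = None
    return result

# A DeepSeek V3/R1 request keeps its specialized result even when the
# general-path conditions would also match:
assert select_path(True, False, None, "softmax", True) == "specialized"
```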
fixed
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
…p_k_softmax Signed-off-by: huangxialu <[email protected]>
@@ -400,7 +400,7 @@ def test_select_experts(self, mock_dist_env, mock_moe_env,

         x = torch.randn(8, 2)
         router_logits = torch.randn(8, 2)
-        topk_weights, topk_ids = select_experts(
+        topk_weights, topk_ids, row_idx = select_experts(
Could add a test case to cover the fused_experts function with the row_idx parameter passed in.
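A rough sketch of what such a test might check (everything below is an assumption; the fused_experts call itself is only described in a comment because its exact signature in this repo is not shown here):

```python
import torch

def test_fused_experts_with_row_idx():
    # Hypothetical test sketch; sizes are illustrative.
    num_tokens, top_k, num_experts, hidden = 8, 2, 4, 16
    hidden_states = torch.randn(num_tokens, hidden)
    topk_weights = torch.rand(num_tokens, top_k)
    topk_ids = torch.randint(0, num_experts, (num_tokens, top_k),
                             dtype=torch.int32)

    # row_idx laid out the same way the fallback path builds it.
    row_idx = (torch.arange(0, num_tokens * top_k, dtype=torch.int32)
               .view(top_k, -1).permute(1, 0).contiguous())

    # The real test would pass hidden_states, topk_weights, and topk_ids to
    # fused_experts(..., row_idx=row_idx) and assert the output matches the
    # row_idx=None fallback; here we only check the layout assumption.
    assert row_idx.shape == (num_tokens, top_k)
```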
                                 1, 0).contiguous())
     if row_idx is None:
         row_idx_len = num_tokens * top_k
         row_idx = (torch.arange(0,
Seems all layers use the same row_idx; can we just construct it once?
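For illustration, a minimal sketch of constructing row_idx once and reusing it (the cache and helper names are invented, not part of this PR):

```python
import torch

# Invented helper: build row_idx once per (num_tokens, top_k, device) and
# let every MoE layer reuse the same tensor.
_ROW_IDX_CACHE: dict = {}

def get_row_idx(num_tokens: int, top_k: int, device: str = "cpu") -> torch.Tensor:
    key = (num_tokens, top_k, device)
    if key not in _ROW_IDX_CACHE:
        row_idx_len = num_tokens * top_k
        _ROW_IDX_CACHE[key] = (torch.arange(0, row_idx_len, dtype=torch.int32,
                                            device=device)
                               .view(top_k, -1).permute(1, 0).contiguous())
    return _ROW_IDX_CACHE[key]

# All layers with the same shape share one tensor instead of rebuilding it.
assert get_row_idx(8, 2) is get_row_idx(8, 2)
```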
Same as #2373?
Yes. I will close this PR.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?