Conversation

@loukong33 (Contributor) commented Aug 18, 2025

What this PR does / why we need it?

Replaces the manual construction of row_idx (via torch.arange and permute) with the third output of the npu_moe_gating_top_k_softmax kernel, so the index tensor no longer has to be rebuilt by hand; see the review summary below.

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist bot left a comment

Code Review

This pull request introduces an optimization by replacing the manual creation of row_idx using torch.arange and permute with the third output from the npu_moe_gating_top_k_softmax kernel. This is a good performance improvement. The changes are consistently applied across the codebase, including updates to function signatures and tests to accommodate the new row_idx return value. I've identified one potential issue in the control flow logic that could cause a specialized execution path to be incorrectly overridden by a more general one. Please see the detailed comment.
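
To make the change concrete, here is a minimal sketch of the before/after pattern, assuming an Ascend environment where torch_npu is available; tensor shapes and variable names are illustrative rather than the PR's exact code:

    import torch
    import torch_npu  # requires an Ascend NPU build of PyTorch

    num_tokens, num_experts, top_k = 8, 16, 2
    router_logits = torch.randn(num_tokens, num_experts).npu()

    # Before: row_idx built manually with arange + permute.
    row_idx = (torch.arange(0, num_tokens * top_k, dtype=torch.int32,
                            device=router_logits.device)
               .view(top_k, -1).permute(1, 0).contiguous())

    # After: the fused gating kernel already returns row_idx as its third
    # output, so the manual construction above can be dropped.
    topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
        router_logits, finished=None, k=top_k)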

@@ -188,12 +188,12 @@ def _select_experts_with_fusion_ops(
         eps=float(1e-20))

     if not use_grouped_topk and custom_routing_function is None and scoring_func == "softmax" and is_unquantized:
-        topk_weights, topk_ids, _ = torch_npu.npu_moe_gating_top_k_softmax(
+        topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
@gemini-code-assist bot (severity: high)

This if block can be executed even if the preceding if is_deepseek_v3_r1: block was already executed. This would cause the results from the specialized npu_moe_gating_top_k path to be overwritten by this more general npu_moe_gating_top_k_softmax path, which is likely not the intended behavior. To ensure that only one of these specialized fusion paths is taken, consider changing the if on line 190 to an elif.
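
As a sketch, the suggested restructuring looks like this (branch bodies elided; the flags are the ones already used in the surrounding code):

    # Only one fused gating path should run per call.
    if is_deepseek_v3_r1:
        # specialized path: torch_npu.npu_moe_gating_top_k(...)
        ...
    elif (not use_grouped_topk and custom_routing_function is None
          and scoring_func == "softmax" and is_unquantized):
        # general path: torch_npu.npu_moe_gating_top_k_softmax(...)
        ...

With elif, the general softmax path can no longer overwrite the results already produced by the DeepSeek V3/R1 path.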

@loukong33 (Author) commented Aug 18, 2025

fixed


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@@ -400,7 +400,7 @@ def test_select_experts(self, mock_dist_env, mock_moe_env,

     x = torch.randn(8, 2)
     router_logits = torch.randn(8, 2)
-    topk_weights, topk_ids = select_experts(
+    topk_weights, topk_ids, row_idx = select_experts(
Collaborator

Could add a test case to cover the fused_experts function with the row_idx parameter passed in.
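
A hypothetical shape for such a test (the keyword names, expert weight shapes, and the fused_experts signature below are guesses, not vllm-ascend's actual API):

    def test_fused_experts_with_row_idx(self, mock_dist_env, mock_moe_env):
        x = torch.randn(8, 2)
        router_logits = torch.randn(8, 2)
        topk_weights, topk_ids, row_idx = select_experts(
            hidden_states=x, router_logits=router_logits, top_k=2)
        # row_idx from select_experts should be forwarded, not rebuilt.
        assert row_idx is not None
        w1 = torch.randn(2, 4, 2)  # guessed expert weight shapes
        w2 = torch.randn(2, 2, 4)
        out = fused_experts(x, w1, w2, topk_weights, topk_ids,
                            top_k=2, row_idx=row_idx)
        assert out.shape == x.shape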

    if row_idx is None:
        row_idx_len = num_tokens * top_k
        row_idx = (torch.arange(0,
                                row_idx_len,
                                dtype=torch.int32,
                                device=device).view(top_k, -1).permute(
                                    1, 0).contiguous())
Collaborator

Seems all layers use the same row_idx, can we just construct it once?
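
One possible way to do that, sketched with a small module-level cache (the helper name and cache keying here are hypothetical):

    import torch

    _ROW_IDX_CACHE: dict = {}

    def get_row_idx(num_tokens: int, top_k: int, device) -> torch.Tensor:
        # row_idx depends only on (num_tokens, top_k, device), so it can be
        # built once and shared by every MoE layer in the forward pass.
        key = (num_tokens, top_k, str(device))
        if key not in _ROW_IDX_CACHE:
            _ROW_IDX_CACHE[key] = (torch.arange(
                0, num_tokens * top_k, dtype=torch.int32,
                device=device).view(top_k, -1).permute(1, 0).contiguous())
        return _ROW_IDX_CACHE[key]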

@ApsarasX (Collaborator)

Same as #2373?

@loukong33 (Author)

> Same as #2373?

Yes. I will close this PR.

@loukong33 closed this Aug 18, 2025