replace arange and permute with the third output of npu_moe_gating_top_k_softmax #2418
Conversation
Code Review
This pull request introduces an optimization by replacing the manual creation of row_idx using torch.arange and permute with the third output from the npu_moe_gating_top_k_softmax kernel. This is a good performance improvement. The changes are consistently applied across the codebase, including updates to function signatures and tests to accommodate the new row_idx return value. I've identified one potential issue in the control flow logic that could cause a specialized execution path to be incorrectly overridden by a more general one. Please see the detailed comment.
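For context, a minimal sketch of the change (purely illustrative; num_tokens, top_k, and the keyword arguments passed to the kernel are assumptions based on the truncated diff and the fallback code shown further down):

```python
import torch
import torch_npu  # Ascend NPU extension; assumes an NPU device is available

num_tokens, num_experts, top_k = 8, 4, 2          # illustrative sizes only
router_logits = torch.randn(num_tokens, num_experts).npu()

# Before: row_idx was built by hand (pattern matching the fallback code
# shown further down in this PR's diff).
row_idx_len = num_tokens * top_k
row_idx_manual = (torch.arange(0, row_idx_len, dtype=torch.int32,
                               device=router_logits.device)
                  .view(top_k, -1)
                  .permute(1, 0)
                  .contiguous())

# After: the fused gating kernel already produces this tensor as its third
# output, so the arange + permute round-trip can be dropped. The keyword
# arguments below are assumptions; the call in the diff is truncated.
topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
    router_logits, finished=None, k=top_k)
```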
@@ -188,12 +188,12 @@ def _select_experts_with_fusion_ops(
             eps=float(1e-20))

     if not use_grouped_topk and custom_routing_function is None and scoring_func == "softmax" and is_unquantized:
-        topk_weights, topk_ids, _ = torch_npu.npu_moe_gating_top_k_softmax(
+        topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
This if block can be executed even if the preceding if is_deepseek_v3_r1: block was already executed. This would cause the results from the specialized npu_moe_gating_top_k path to be overwritten by this more general npu_moe_gating_top_k_softmax path, which is likely not the intended behavior. To ensure that only one of these specialized fusion paths is taken, consider changing the if on line 190 to an elif.
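To illustrate the suggestion, a small self-contained sketch of the branch structure (the stub functions and the select_path name are invented for illustration and are not the real _select_experts_with_fusion_ops code):

```python
# Stubs standing in for the two fused kernels; illustrative only.
def _specialized_path():        # stands in for npu_moe_gating_top_k
    return "specialized"

def _general_softmax_path():    # stands in for npu_moe_gating_top_k_softmax
    return "general softmax"

def select_path(is_deepseek_v3_r1, use_grouped_topk,
                custom_routing_function, scoring_func, is_unquantized):
    if is_deepseek_v3_r1:
        result = _specialized_path()
    elif (not use_grouped_topk and custom_routing_function is None
          and scoring_func == "softmax" and is_unquantized):
        # With `elif`, this general branch can no longer overwrite the
        # result of the specialized branch above.
        result = _general_softmax_path()
    else:
        result = None
    return result

# A DeepSeek V3/R1 request keeps its specialized result even when the
# general-path conditions would also match:
assert select_path(True, False, None, "softmax", True) == "specialized"
```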
fixed
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
…p_k_softmax Signed-off-by: huangxialu <[email protected]>
@@ -400,7 +400,7 @@ def test_select_experts(self, mock_dist_env, mock_moe_env,

         x = torch.randn(8, 2)
         router_logits = torch.randn(8, 2)
-        topk_weights, topk_ids = select_experts(
+        topk_weights, topk_ids, row_idx = select_experts(
Could add a test case to cover the fused_experts function with the row_idx parameter passed in.
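A rough sketch of what such a test might check (everything below is an assumption; the fused_experts call itself is only described in a comment because its exact signature in this repo is not shown here):

```python
import torch

def test_fused_experts_with_row_idx():
    # Hypothetical test sketch; sizes are illustrative.
    num_tokens, top_k, num_experts, hidden = 8, 2, 4, 16
    hidden_states = torch.randn(num_tokens, hidden)
    topk_weights = torch.rand(num_tokens, top_k)
    topk_ids = torch.randint(0, num_experts, (num_tokens, top_k),
                             dtype=torch.int32)

    # row_idx laid out the same way the fallback path builds it.
    row_idx = (torch.arange(0, num_tokens * top_k, dtype=torch.int32)
               .view(top_k, -1).permute(1, 0).contiguous())

    # The real test would pass hidden_states, topk_weights, and topk_ids to
    # fused_experts(..., row_idx=row_idx) and assert the output matches the
    # row_idx=None fallback; here we only check the layout assumption.
    assert row_idx.shape == (num_tokens, top_k)
```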
                                 1, 0).contiguous())
     if row_idx is None:
         row_idx_len = num_tokens * top_k
         row_idx = (torch.arange(0,
Seems all layers use the same row_idx; can we just construct it once?
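For illustration, a minimal sketch of constructing row_idx once and reusing it (the cache and helper names are invented, not part of this PR):

```python
import torch

# Invented helper: build row_idx once per (num_tokens, top_k, device) and
# let every MoE layer reuse the same tensor.
_ROW_IDX_CACHE: dict = {}

def get_row_idx(num_tokens: int, top_k: int, device: str = "cpu") -> torch.Tensor:
    key = (num_tokens, top_k, device)
    if key not in _ROW_IDX_CACHE:
        row_idx_len = num_tokens * top_k
        _ROW_IDX_CACHE[key] = (torch.arange(0, row_idx_len, dtype=torch.int32,
                                            device=device)
                               .view(top_k, -1).permute(1, 0).contiguous())
    return _ROW_IDX_CACHE[key]

# All layers with the same shape share one tensor instead of rebuilding it.
assert get_row_idx(8, 2) is get_row_idx(8, 2)
```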
Same as #2373?
Yes. I will close this PR.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?