topk backward vectorized #5296

dev-tomek · 2025-10-13T12:42:23Z

No description provided.

Copilot

Pull Request Overview

This PR optimizes the topk backward pass implementation by vectorizing gradient accumulation operations. The changes replace the previous approach of clearing and then selectively writing gradients with a more efficient vectorized method.

Replaces manual gradient clearing and selective writes with vectorized operations
Uses broadcasting and masking to accumulate gradients efficiently
Maintains the same computational logic while improving performance


Copilot · 2025-10-15T07:57:05Z

python/triton_kernels/triton_kernels/topk_details/_topk_backward.py

+    offs_xn_expanded = offs_xn[:, None]
+    y_indx_expanded = y_indx[None, :]
+    match_mask = (offs_xn_expanded == y_indx_expanded)
+    dx_topk_expanded = dx_topk[None, :]
+    dx_full = tl.sum(tl.where(match_mask, dx_topk_expanded, 0.0), axis=1)


The vectorized implementation creates large intermediate tensors through broadcasting. For large N_EXPTS_PAD and N_EXPTS_ACT, this creates an N_EXPTS_PAD × N_EXPTS_ACT matrix which may consume significant memory and be slower than the original scatter approach for sparse updates.

Suggested change

-    offs_xn_expanded = offs_xn[:, None]
-    y_indx_expanded = y_indx[None, :]
-    match_mask = (offs_xn_expanded == y_indx_expanded)
-    dx_topk_expanded = dx_topk[None, :]
-    dx_full = tl.sum(tl.where(match_mask, dx_topk_expanded, 0.0), axis=1)
+    # Scatter dx_topk into dx_full at positions y_indx
+    dx_full = tl.zeros([N_EXPTS_PAD], dtype=tl.float32)
+    for i in range(N_EXPTS_ACT):
+        idx = y_indx[i]
+        if idx < N_EXPTS_PAD:
+            dx_full = tl.store(dx_full, idx, dx_topk[i])
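
To make the trade-off in this review comment concrete, here is a minimal plain-Python sketch (not Triton) of the two strategies. The function names `scatter_dense` and `scatter_loop` and the sample values are hypothetical and chosen only for illustration. The dense version mirrors the broadcast, mask, and row-sum pattern in the PR and does `n_pad * n_act` comparisons; the loop version mirrors the intent of the suggested scatter and does `n_act` writes. The two agree as long as the indices in `y_indx` are distinct, which is what one would expect from top-k indices.

```python
def scatter_dense(y_indx, dx_topk, n_pad):
    """Broadcast-and-mask: compare every output slot with every top-k index, then row-sum."""
    dx_full = []
    for n in range(n_pad):
        # Row n of the match mask keeps only the gradients whose index equals n.
        row = [g if idx == n else 0.0 for idx, g in zip(y_indx, dx_topk)]
        dx_full.append(sum(row))
    return dx_full


def scatter_loop(y_indx, dx_topk, n_pad):
    """Sequential scatter: start from zeros and write each gradient to its index."""
    dx_full = [0.0] * n_pad
    for idx, g in zip(y_indx, dx_topk):
        if idx < n_pad:  # skip indices that fall outside the padded range
            dx_full[idx] = g
    return dx_full


# Hypothetical example: 3 active experts out of 8 padded slots.
y_indx = [5, 0, 2]
dx_topk = [0.5, -1.0, 2.0]
dense = scatter_dense(y_indx, dx_topk, 8)
loop = scatter_loop(y_indx, dx_topk, 8)
```

With repeated indices the two would differ, because the dense version accumulates while the loop overwrites. Note also that `tl.store` in Triton writes a value through a pointer and does not return a tensor, so the suggested loop would likely need adapting before it compiles.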

topk backward vectorized

f7f4e80

dev-tomek mentioned this pull request Oct 13, 2025

Some test_routing.py::test_op tests cases fail on BMG #5117

Open

dev-tomek linked an issue Oct 13, 2025 that may be closed by this pull request

Some test_routing.py::test_op tests cases fail on BMG #5117

Open

HBN-MichalSzy requested review from anmyachev, Copilot and whitneywhtsang October 15, 2025 07:56

Copilot AI reviewed Oct 15, 2025
