
Add narrow repro for #9267: autodiff gradient depends on load pattern #10090

Open

rkoivunen-sw wants to merge 1 commit into shader-slang:master from rkoivunen-sw:repro/9267-autodiff-max-gradient

Conversation

@rkoivunen-sw

Narrow repro for #9267

Two Slang kernels compute the same loss. One uses loadVecOnce<3> (loop + mutable index), the other uses three manual loadOnce calls. Same max(), same inputs.

Result: the gradients differ, with a max abs error of 0.1667 (= 1/6, the invCount). The loadVecOnce path routes gradient to the wrong tensor indices.

The bug is not the max() tie-breaking rule: both kernels call the same max(). The issue is how the backward pass reverses the index arithmetic of the loadVecOnce loop. The manual loads produce correct gradients; the loop path does not.
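
For intuition only: the gradient of a max-based loss should land on each argmax element and nowhere else. Below is a minimal PyTorch sketch under an assumed per-row-max loss; the repro's actual loss (and its invCount) live in the .slang file and may differ.

import torch

# Assumed stand-in loss, NOT the repro's exact kernel: per-row max, then mean.
# It only illustrates that d(max)/dx is 1 at the argmax and 0 elsewhere,
# scaled here by the 1/3 from the mean.
p = torch.tensor([[3.0, 1.0, 2.0],
                  [0.0, 4.0, 1.0],
                  [1.0, 5.0, 2.0]], requires_grad=True)
loss = p.max(dim=1).values.mean()
loss.backward()
print(p.grad)
# tensor([[0.3333, 0.0000, 0.0000],
#         [0.0000, 0.3333, 0.0000],
#         [0.0000, 0.3333, 0.0000]])

In the gradient dumps further down, the manual-load kernel shows this kind of sparse routing, while the loadVecOnce path also carries stray ±0.1667 (= invCount) entries at other positions.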

Files

  • tests/bugs/9267-autodiff-max-gradient.slang -- two kernels (loadVecOnce vs manual)
  • tests/bugs/9267-autodiff-max-gradient-runner.py -- runs both, compares, exits 1 if they differ
  • tests/bugs/9267-autodiff-max-gradient-README.md -- details

Run

python3 tests/bugs/9267-autodiff-max-gradient-runner.py

Requires slangtorch + PyTorch + CUDA. Tested with Slang 2026.2.2, PyTorch 2.10.0, slangtorch 1.3.19, CUDA 13.0, RTX A3000.

Gradient output

grad(vec_max):
[[ 0.3333, -0.0000,  0.0000],
 [-0.1667,  0.3333, -0.0000],
 [ 0.1667,  0.0000, -0.1667]]

grad(vec_max_manual):
[[0.3333, -0.0000, -0.0000],
 [0.0000, 0.3333, -0.0000],
 [0.0000, -0.0000, 0.0000]]
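
For reference, the reported 0.1667 is just the element-wise difference of these two dumps (values copied from above; this is only the bare check, not the runner's full comparison):

import torch

g_vec = torch.tensor([[ 0.3333, -0.0000,  0.0000],
                      [-0.1667,  0.3333, -0.0000],
                      [ 0.1667,  0.0000, -0.1667]])
g_manual = torch.tensor([[0.3333, -0.0000, -0.0000],
                         [0.0000,  0.3333, -0.0000],
                         [0.0000, -0.0000,  0.0000]])
print((g_vec - g_manual).abs().max())  # tensor(0.1667)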

Refs: #9267

@rkoivunen-sw rkoivunen-sw requested a review from a team as a code owner February 19, 2026 19:29
@rkoivunen-sw rkoivunen-sw requested review from bmillsNV and removed request for a team February 19, 2026 19:29
@CLAassistant

CLAassistant commented Feb 19, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai

coderabbitai bot commented Feb 19, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough


Adds a regression test and repro for GitHub issue #9267: two Slang CUDA kernels (vectorized vs manual loads), a Python runner that executes forward/backward and compares gradients against a PyTorch reference, and a README describing the repro and how to run it.
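
The actual runner script is part of the PR; for readers unfamiliar with slangtorch, here is a rough sketch of the shape such a script takes. The .slang path and kernel names come from this PR, but the kernel parameter names, tensor shapes, and launch dimensions are illustrative assumptions, not the real runner.

import torch
import slangtorch

# Load the Slang module shipped in this PR (compiles the CUDA kernels on first use).
m = slangtorch.loadModule("tests/bugs/9267-autodiff-max-gradient.slang")

def grad_of(kernel, params):
    n = params.shape[0]
    loss = torch.zeros(n, device="cuda")
    launch = dict(blockSize=(32, 1, 1), gridSize=((n + 31) // 32, 1, 1))
    # Forward pass.
    kernel(params=params, loss=loss).launchRaw(**launch)
    # Backward pass: seed dLoss with ones and let the generated .bwd kernel
    # accumulate dParams (DiffTensorView arguments become (primal, grad) pairs).
    d_params = torch.zeros_like(params)
    kernel.bwd(params=(params, d_params),
               loss=(loss, torch.ones_like(loss))).launchRaw(**launch)
    return d_params

p = torch.rand(3, 3, device="cuda")
g_vec = grad_of(m.loss_kernel_vec_max, p.clone())
g_manual = grad_of(m.loss_kernel_vec_max_manual, p.clone())
print((g_vec - g_manual).abs().max())  # nonzero on affected Slang versions

The real runner additionally computes a PyTorch reference gradient and exits non-zero when the kernels disagree.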

Changes

Cohort / File(s) | Summary

  • Documentation -- tests/bugs/9267-autodiff-max-gradient-README.md
    New README describing the autodiff max() gradient repro, kernel load-pattern differences, run instructions, and environment requirements.
  • Test Runner -- tests/bugs/9267-autodiff-max-gradient-runner.py
    New Python script that loads the SlangTorch module, runs both kernel variants, performs forward/backward to collect gradients, computes a PyTorch reference gradient, compares results, and exits non-zero on mismatches.
  • Slang Kernel & Extensions -- tests/bugs/9267-autodiff-max-gradient.slang
    New Slang file adding TensorView/DiffTensorView loadVec/loadVecOnce helpers and two CUDA kernels (loss_kernel_vec_max, loss_kernel_vec_max_manual) that differ only in load strategy to expose autodiff gradient differences.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Test Runner (Python)
    participant Slang as SlangTorch Module
    participant CUDA as CUDA Kernels
    participant Autograd as Autograd/Backward
    participant PyRef as PyTorch Reference
    participant Comparator as Comparator/Reporter

    Runner->>Slang: load module & kernels
    Runner->>CUDA: invoke loss_kernel_vec_max (forward)
    Runner->>CUDA: invoke loss_kernel_vec_max_manual (forward)
    CUDA-->>Autograd: return outputs (losses)
    Runner->>Autograd: backward both losses -> compute grads
    Runner->>PyRef: compute reference gradient
    Autograd-->>Comparator: provide grads (vec_max, manual)
    PyRef-->>Comparator: provide reference grad
    Comparator->>Runner: compare arrays, print diffs, exit status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I nibble tensors, hop through code and thread,

Two loads diverge where gradients tread,
Vector or single, they chase the same sun—
My whiskers twitch till the mismatch is done,
Celebrate the test; may the fix soon run! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning -- Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed -- The title clearly and concisely summarizes the main change: adding a narrow reproduction case for issue #9267 demonstrating autodiff gradient issues dependent on load pattern.
  • Description check ✅ Passed -- The description is directly related to the changeset, providing clear context about the bug being reproduced, the difference between the two kernel implementations, expected gradient outputs, and run instructions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/bugs/9267-autodiff-max-gradient-runner.py`:
- Around line 78-85: In the backward function remove the unused local variable
assignment to satisfy Ruff F841: delete the line that assigns kernel (the
getattr(...) into variable named kernel) and keep the existing use of kernel_bwd
(which references module.loss_kernel_vec_max.bwd); alternatively replace the
assignment with an underscore (e.g., _ = getattr(...)) if you want to keep the
lookup for side effects—target the assignment in function backward where kernel
is created and remove or replace it.
- Around line 6-24: Update the usage string and the SLANG_FILE constant so the
runner points to the actual file and correct invocation path: change the printed
usage line that currently says "python bug_report/ci_tests/run_9267_reveal.py"
to the correct path used in this PR, and set SLANG_FILE to the actual .slang
filename present in the repo (replace repro_9267_reveal_slangtorch.slang with
the real filename); ensure SCRIPT_DIR and THREADS remain unchanged and that the
script exits with a clear error if the SLANG_FILE cannot be found.

Comment on lines 78 to 85

    def backward(p, loss):
        p_copy = p.detach().requires_grad_(True)
        loss_copy = loss.detach()
        grad_loss = torch.ones_like(loss_copy)
        blocks = (n + THREADS - 1) // THREADS
        kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
        # Use the same kernel's bwd
        kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd

⚠️ Potential issue | 🟡 Minor

Remove unused local to satisfy Ruff F841.
kernel is assigned but never used.

🧹 Proposed fix
-        kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
         # Use the same kernel's bwd
         kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

     def backward(p, loss):
         p_copy = p.detach().requires_grad_(True)
         loss_copy = loss.detach()
         grad_loss = torch.ones_like(loss_copy)
         blocks = (n + THREADS - 1) // THREADS
-        kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
         # Use the same kernel's bwd
         kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd
🧰 Tools
🪛 Ruff (0.15.1)

[error] 83-83: Local variable kernel is assigned to but never used

Remove assignment to unused variable kernel

(F841)


[warning] 85-85: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)
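
For reference, the B009 finding only means the constant-name getattr can be plain attribute access. A minimal sketch of the equivalent access, assuming the same module object as in the snippet above:

# getattr with a constant attribute name (flagged by Ruff B009)
kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd

# equivalent plain attribute access
kernel_bwd = module.loss_kernel_vec_max.bwd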


@rkoivunen-sw rkoivunen-sw force-pushed the repro/9267-autodiff-max-gradient branch from dce2092 to d4262b7 Compare February 19, 2026 19:39
…load pattern

Two Slang kernels compute the same loss. One uses loadVecOnce<3>
(loop + mutable index), the other uses three manual loadOnce calls.
Same max(), same inputs -- gradients differ (max error 0.1667).

The bug is in the backward pass for the loadVecOnce loop, not in
the max() tie-breaking rule. Slang-vs-Slang comparison, no PyTorch
reference needed.

Refs: shader-slang#9267
@rkoivunen-sw rkoivunen-sw force-pushed the repro/9267-autodiff-max-gradient branch from d4262b7 to f8e14d9 Compare February 19, 2026 19:40