
Add narrow repro for #9267: autodiff gradient depends on load pattern #10090

Open

rkoivunen-sw wants to merge 1 commit into shader-slang:master from rkoivunen-sw:repro/9267-autodiff-max-gradient

Conversation

@rkoivunen-sw

Narrow repro for #9267

Two Slang kernels compute the same loss. One uses loadVecOnce<3> (loop + mutable index), the other uses three manual loadOnce calls. Same max(), same inputs.

Result: the gradients differ, with a max abs error of 0.1667 (= 1/6, the invCount). The loadVecOnce path routes gradient to the wrong tensor indices.

The bug is not the max() tie-breaking rule: both kernels call the same max(). The issue is how the backward pass reverses the index arithmetic of the loadVecOnce loop. The manual loads produce correct gradients; the loop path does not.
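
For intuition only: the gradient of a max-based loss should land on each argmax element and nowhere else. Below is a minimal PyTorch sketch under an assumed per-row-max loss; the repro's actual loss (and its invCount) live in the .slang file and may differ.

import torch

# Assumed stand-in loss, NOT the repro's exact kernel: per-row max, then mean.
# It only illustrates that d(max)/dx is 1 at the argmax and 0 elsewhere,
# scaled here by the 1/3 from the mean.
p = torch.tensor([[3.0, 1.0, 2.0],
                  [0.0, 4.0, 1.0],
                  [1.0, 5.0, 2.0]], requires_grad=True)
loss = p.max(dim=1).values.mean()
loss.backward()
print(p.grad)
# tensor([[0.3333, 0.0000, 0.0000],
#         [0.0000, 0.3333, 0.0000],
#         [0.0000, 0.3333, 0.0000]])

In the gradient dumps further down, the manual-load kernel shows this kind of sparse routing, while the loadVecOnce path also carries stray ±0.1667 (= invCount) entries at other positions.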

Files

  • tests/bugs/9267-autodiff-max-gradient.slang -- two kernels (loadVecOnce vs manual)
  • tests/bugs/9267-autodiff-max-gradient-runner.py -- runs both, compares, exits 1 if they differ
  • tests/bugs/9267-autodiff-max-gradient-README.md -- details

Run

python3 tests/bugs/9267-autodiff-max-gradient-runner.py

Requires slangtorch + PyTorch + CUDA. Tested with Slang 2026.2.2, PyTorch 2.10.0, slangtorch 1.3.19, CUDA 13.0, RTX A3000.

Gradient output

grad(vec_max):
[[ 0.3333, -0.0000,  0.0000],
 [-0.1667,  0.3333, -0.0000],
 [ 0.1667,  0.0000, -0.1667]]

grad(vec_max_manual):
[[0.3333, -0.0000, -0.0000],
 [0.0000, 0.3333, -0.0000],
 [0.0000, -0.0000, 0.0000]]
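
For reference, the reported 0.1667 is just the element-wise difference of these two dumps (values copied from above; this is only the bare check, not the runner's full comparison):

import torch

g_vec = torch.tensor([[ 0.3333, -0.0000,  0.0000],
                      [-0.1667,  0.3333, -0.0000],
                      [ 0.1667,  0.0000, -0.1667]])
g_manual = torch.tensor([[0.3333, -0.0000, -0.0000],
                         [0.0000,  0.3333, -0.0000],
                         [0.0000, -0.0000,  0.0000]])
print((g_vec - g_manual).abs().max())  # tensor(0.1667)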

Refs: #9267

@rkoivunen-sw rkoivunen-sw requested a review from a team as a code owner February 19, 2026 19:29
@rkoivunen-sw rkoivunen-sw requested review from bmillsNV and removed request for a team February 19, 2026 19:29
@CLAassistant

CLAassistant commented Feb 19, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai

coderabbitai bot commented Feb 19, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough


Adds a regression test and repro for GitHub issue #9267: two Slang CUDA kernels (vectorized vs manual loads), a Python runner that executes forward/backward and compares gradients against a PyTorch reference, and a README describing the repro and how to run it.
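
The actual runner script is part of the PR; for readers unfamiliar with slangtorch, here is a rough sketch of the shape such a script takes. The .slang path and kernel names come from this PR, but the kernel parameter names, tensor shapes, and launch dimensions are illustrative assumptions, not the real runner.

import torch
import slangtorch

# Load the Slang module shipped in this PR (compiles the CUDA kernels on first use).
m = slangtorch.loadModule("tests/bugs/9267-autodiff-max-gradient.slang")

def grad_of(kernel, params):
    n = params.shape[0]
    loss = torch.zeros(n, device="cuda")
    launch = dict(blockSize=(32, 1, 1), gridSize=((n + 31) // 32, 1, 1))
    # Forward pass.
    kernel(params=params, loss=loss).launchRaw(**launch)
    # Backward pass: seed dLoss with ones and let the generated .bwd kernel
    # accumulate dParams (DiffTensorView arguments become (primal, grad) pairs).
    d_params = torch.zeros_like(params)
    kernel.bwd(params=(params, d_params),
               loss=(loss, torch.ones_like(loss))).launchRaw(**launch)
    return d_params

p = torch.rand(3, 3, device="cuda")
g_vec = grad_of(m.loss_kernel_vec_max, p.clone())
g_manual = grad_of(m.loss_kernel_vec_max_manual, p.clone())
print((g_vec - g_manual).abs().max())  # nonzero on affected Slang versions

The real runner additionally computes a PyTorch reference gradient and exits non-zero when the kernels disagree.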

Changes

Cohort / File(s) | Summary

  • Documentation -- tests/bugs/9267-autodiff-max-gradient-README.md
    New README describing the autodiff max() gradient repro, kernel load-pattern differences, run instructions, and environment requirements.
  • Test Runner -- tests/bugs/9267-autodiff-max-gradient-runner.py
    New Python script that loads the SlangTorch module, runs both kernel variants, performs forward/backward to collect gradients, computes a PyTorch reference gradient, compares results, and exits non-zero on mismatches.
  • Slang Kernel & Extensions -- tests/bugs/9267-autodiff-max-gradient.slang
    New Slang file adding TensorView/DiffTensorView loadVec/loadVecOnce helpers and two CUDA kernels (loss_kernel_vec_max, loss_kernel_vec_max_manual) that differ only in load strategy to expose autodiff gradient differences.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Test Runner (Python)
    participant Slang as SlangTorch Module
    participant CUDA as CUDA Kernels
    participant Autograd as Autograd/Backward
    participant PyRef as PyTorch Reference
    participant Comparator as Comparator/Reporter

    Runner->>Slang: load module & kernels
    Runner->>CUDA: invoke loss_kernel_vec_max (forward)
    Runner->>CUDA: invoke loss_kernel_vec_max_manual (forward)
    CUDA-->>Autograd: return outputs (losses)
    Runner->>Autograd: backward both losses -> compute grads
    Runner->>PyRef: compute reference gradient
    Autograd-->>Comparator: provide grads (vec_max, manual)
    PyRef-->>Comparator: provide reference grad
    Comparator->>Runner: compare arrays, print diffs, exit status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I nibble tensors, hop through code and thread,

Two loads diverge where gradients tread,
Vector or single, they chase the same sun—
My whiskers twitch till the mismatch is done,
Celebrate the test; may the fix soon run! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning -- Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed -- The title clearly and concisely summarizes the main change: adding a narrow reproduction case for issue #9267 demonstrating autodiff gradient issues dependent on load pattern.
  • Description check ✅ Passed -- The description is directly related to the changeset, providing clear context about the bug being reproduced, the difference between the two kernel implementations, expected gradient outputs, and run instructions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/bugs/9267-autodiff-max-gradient-runner.py`:
- Around line 78-85: In the backward function remove the unused local variable
assignment to satisfy Ruff F841: delete the line that assigns kernel (the
getattr(...) into variable named kernel) and keep the existing use of kernel_bwd
(which references module.loss_kernel_vec_max.bwd); alternatively replace the
assignment with an underscore (e.g., _ = getattr(...)) if you want to keep the
lookup for side effects—target the assignment in function backward where kernel
is created and remove or replace it.
- Around line 6-24: Update the usage string and the SLANG_FILE constant so the
runner points to the actual file and correct invocation path: change the printed
usage line that currently says "python bug_report/ci_tests/run_9267_reveal.py"
to the correct path used in this PR, and set SLANG_FILE to the actual .slang
filename present in the repo (replace repro_9267_reveal_slangtorch.slang with
the real filename); ensure SCRIPT_DIR and THREADS remain unchanged and that the
script exits with a clear error if the SLANG_FILE cannot be found.

Comment on lines 78 to 85

    def backward(p, loss):
        p_copy = p.detach().requires_grad_(True)
        loss_copy = loss.detach()
        grad_loss = torch.ones_like(loss_copy)
        blocks = (n + THREADS - 1) // THREADS
        kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
        # Use the same kernel's bwd
        kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd

⚠️ Potential issue | 🟡 Minor

Remove unused local to satisfy Ruff F841.
kernel is assigned but never used.

🧹 Proposed fix
-        kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
         # Use the same kernel's bwd
         kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

     def backward(p, loss):
         p_copy = p.detach().requires_grad_(True)
         loss_copy = loss.detach()
         grad_loss = torch.ones_like(loss_copy)
         blocks = (n + THREADS - 1) // THREADS
-        kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
         # Use the same kernel's bwd
         kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd
🧰 Tools
🪛 Ruff (0.15.1)

[error] 83-83: Local variable kernel is assigned to but never used

Remove assignment to unused variable kernel

(F841)


[warning] 85-85: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)
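
For reference, the B009 finding only means the constant-name getattr can be plain attribute access. A minimal sketch of the equivalent access, assuming the same module object as in the snippet above:

# getattr with a constant attribute name (flagged by Ruff B009)
kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd

# equivalent plain attribute access
kernel_bwd = module.loss_kernel_vec_max.bwd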


@rkoivunen-sw rkoivunen-sw force-pushed the repro/9267-autodiff-max-gradient branch from dce2092 to d4262b7 Compare February 19, 2026 19:39
…load pattern

Two Slang kernels compute the same loss. One uses loadVecOnce<3>
(loop + mutable index), the other uses three manual loadOnce calls.
Same max(), same inputs -- gradients differ (max error 0.1667).

The bug is in the backward pass for the loadVecOnce loop, not in
the max() tie-breaking rule. Slang-vs-Slang comparison, no PyTorch
reference needed.

Refs: shader-slang#9267
@rkoivunen-sw rkoivunen-sw force-pushed the repro/9267-autodiff-max-gradient branch from d4262b7 to f8e14d9 Compare February 19, 2026 19:40