Add narrow repro for #9267: autodiff gradient depends on load pattern #10090

rkoivunen-sw wants to merge 1 commit into shader-slang:master
Conversation
No actionable comments were generated in the recent review. 🎉

📝 Walkthrough

Adds a regression test and repro for GitHub issue #9267.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner as Test Runner (Python)
    participant Slang as SlangTorch Module
    participant CUDA as CUDA Kernels
    participant Autograd as Autograd/Backward
    participant PyRef as PyTorch Reference
    participant Comparator as Comparator/Reporter

    Runner->>Slang: load module & kernels
    Runner->>CUDA: invoke loss_kernel_vec_max (forward)
    Runner->>CUDA: invoke loss_kernel_vec_max_manual (forward)
    CUDA-->>Autograd: return outputs (losses)
    Runner->>Autograd: backward both losses -> compute grads
    Runner->>PyRef: compute reference gradient
    Autograd-->>Comparator: provide grads (vec_max, manual)
    PyRef-->>Comparator: provide reference grad
    Comparator->>Runner: compare arrays, print diffs, exit status
```
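For readers who want the flow above as code, here is a minimal sketch of what the runner does, written against slangtorch's `loadModule`/`launchRaw`/`.bwd` API. The kernel names and the `THREADS`/`blocks` arithmetic come from this PR's runner; the parameter names (`p`, `loss`), tensor sizes, and launch dimensions are assumptions for illustration, not the repro's actual code.

```python
import torch
import slangtorch

THREADS = 256
module = slangtorch.loadModule("tests/bugs/9267-autodiff-max-gradient.slang")

n = 6  # assumed element count
p = torch.arange(n, dtype=torch.float32, device="cuda")
blocks = (n + THREADS - 1) // THREADS

def run(kernel):
    # Forward pass writes the loss.
    loss = torch.zeros(1, device="cuda")
    kernel(p=p, loss=loss).launchRaw(
        blockSize=(THREADS, 1, 1), gridSize=(blocks, 1, 1))
    # Backward pass: slangtorch pairs each differentiable tensor
    # with a gradient buffer as (primal, grad).
    p_grad = torch.zeros_like(p)
    kernel.bwd(p=(p, p_grad), loss=(loss, torch.ones_like(loss))).launchRaw(
        blockSize=(THREADS, 1, 1), gridSize=(blocks, 1, 1))
    return p_grad

grad_vec = run(module.loss_kernel_vec_max)
grad_manual = run(module.loss_kernel_vec_max_manual)
print("max abs diff:", (grad_vec - grad_manual).abs().max().item())
```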
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/bugs/9267-autodiff-max-gradient-runner.py`:
- Around line 78-85: In the backward function remove the unused local variable
assignment to satisfy Ruff F841: delete the line that assigns kernel (the
getattr(...) into variable named kernel) and keep the existing use of kernel_bwd
(which references module.loss_kernel_vec_max.bwd); alternatively replace the
assignment with an underscore (e.g., _ = getattr(...)) if you want to keep the
lookup for side effects; target the assignment in function backward where kernel
is created and remove or replace it.
- Around line 6-24: Update the usage string and the SLANG_FILE constant so the
runner points to the actual file and correct invocation path: change the printed
usage line that currently says "python bug_report/ci_tests/run_9267_reveal.py"
to the correct path used in this PR, and set SLANG_FILE to the actual .slang
filename present in the repo (replace repro_9267_reveal_slangtorch.slang with
the real filename); ensure SCRIPT_DIR and THREADS remain unchanged and that the
script exits with a clear error if the SLANG_FILE cannot be found (a sketch of
such a check follows below).
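As an illustration only, here is a minimal sketch of the second fix, assuming the `.slang` filename from this PR's file list and the constant names mentioned in the comment (`SCRIPT_DIR`, `SLANG_FILE`):

```python
import os
import sys

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
# Assumed filename, taken from the PR's file list.
SLANG_FILE = os.path.join(SCRIPT_DIR, "9267-autodiff-max-gradient.slang")

if not os.path.isfile(SLANG_FILE):
    # Fail fast with a clear error instead of a confusing load failure.
    sys.exit(f"error: Slang source not found: {SLANG_FILE}")
```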
```python
def backward(p, loss):
    p_copy = p.detach().requires_grad_(True)
    loss_copy = loss.detach()
    grad_loss = torch.ones_like(loss_copy)
    blocks = (n + THREADS - 1) // THREADS
    kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
    # Use the same kernel's bwd
    kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd
```
Remove unused local to satisfy Ruff F841.

`kernel` is assigned but never used.
🧹 Proposed fix

```diff
-    kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
     # Use the same kernel's bwd
     kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 def backward(p, loss):
     p_copy = p.detach().requires_grad_(True)
     loss_copy = loss.detach()
     grad_loss = torch.ones_like(loss_copy)
     blocks = (n + THREADS - 1) // THREADS
-    kernel = getattr(module, p.grad_fn.__class__.__module__.split(".")[0] if hasattr(p, "grad_fn") else "loss_kernel_vec_max")
     # Use the same kernel's bwd
     kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd
```
🧰 Tools

🪛 Ruff (0.15.1)

[error] 83-83: Local variable `kernel` is assigned to but never used. Remove assignment to unused variable `kernel`. (F841)

[warning] 85-85: Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. Replace `getattr` with attribute access. (B009)
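For the B009 warning: since the attribute name is a constant string, the `getattr` call can simply become plain attribute access. A one-line sketch of the equivalent rewrite:

```python
# Before (flagged by Ruff B009): constant attribute name via getattr
kernel_bwd = getattr(module, "loss_kernel_vec_max").bwd
# After: equivalent plain attribute access, same behavior
kernel_bwd = module.loss_kernel_vec_max.bwd
```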
Force-pushed dce2092 to d4262b7
…load pattern

Two Slang kernels compute the same loss. One uses loadVecOnce<3> (loop + mutable index), the other uses three manual loadOnce calls. Same max(), same inputs -- gradients differ (max error 0.1667). The bug is in the backward pass for the loadVecOnce loop, not in the max() tie-breaking rule. Slang-vs-Slang comparison, no PyTorch reference needed.

Refs: shader-slang#9267
Force-pushed d4262b7 to f8e14d9
Narrow repro for #9267

Two Slang kernels compute the same loss. One uses `loadVecOnce<3>` (loop + mutable index), the other uses three manual `loadOnce` calls. Same `max()`, same inputs.

Result: gradients differ. Max abs error 0.1667 (= 1/6, the `invCount`). The `loadVecOnce` path routes gradient to wrong tensor indices.

The bug is not the `max()` tie-breaking rule; both kernels use the same `max()`. The issue is how the backward pass reverses the `loadVecOnce` loop index arithmetic. Manual loads produce correct gradients; the loop path doesn't.
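To make the routing claim concrete, here is a small PyTorch sketch of the expected behavior: the gradient of an elementwise `max` should land only on the winning indices, scaled by the mean's 1/count factor. The strided load pattern below is an assumption for illustration, not the repro's actual indexing.

```python
import torch

p = torch.arange(6, dtype=torch.float32, requires_grad=True)
a, b, c = p[0::3], p[1::3], p[2::3]  # assumed load pattern
loss = torch.maximum(torch.maximum(a, b), c).mean()
loss.backward()
# Each mean term contributes 1/count at its argmax element and zero
# elsewhere: here grad is 0.5 at indices 2 and 5, zero otherwise.
# A buggy backward pass would place those contributions at other indices.
print(p.grad)  # tensor([0.0, 0.0, 0.5, 0.0, 0.0, 0.5])
```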
Files

- `tests/bugs/9267-autodiff-max-gradient.slang` -- two kernels (loadVecOnce vs manual)
- `tests/bugs/9267-autodiff-max-gradient-runner.py` -- runs both, compares, exits 1 if they differ
- `tests/bugs/9267-autodiff-max-gradient-README.md` -- details

Run
Requires slangtorch + PyTorch + CUDA. Tested with Slang 2026.2.2, PyTorch 2.10.0, slangtorch 1.3.19, CUDA 13.0, RTX A3000.
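The compare-and-exit contract from the file list above ("runs both, compares, exits 1 if they differ") reduces to something like the following sketch; the tolerance value is an assumption:

```python
import sys

import torch

def compare_and_exit(grad_vec: torch.Tensor, grad_manual: torch.Tensor) -> None:
    # Print the difference and signal failure via the exit status.
    max_err = (grad_vec - grad_manual).abs().max().item()
    print(f"max abs gradient difference: {max_err:.6f}")
    sys.exit(1 if max_err > 1e-6 else 0)  # assumed tolerance
```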
Gradient output
Refs: #9267