Commit b27b4c5
Fix illegal memory access in Triton RMSNorm and RoPE (#804)
## Summary
When using very large tensors (e.g. seq_len=1e6, hidden_size=4096),
pointer offsets derived from Triton’s 32-bit `tl.program_id(0)` can
overflow, leading to out-of-bounds memory accesses. This change casts
the program ID to 64-bit (`tl.int64`) so that all pointer arithmetic
stays within the valid address range. Fixes #803.
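
The size of the problem can be checked with plain integer arithmetic. The sketch below needs no Triton or GPU. It assumes each program handles one row and computes its element offset as `program_id * row_stride`, with the row stride equal to `hidden_size`. That layout is an assumption for illustration, since the diff body is not reproduced here, and `wrap_int32` is a hypothetical helper that emulates signed 32-bit wraparound.

```python
# Illustrative arithmetic only; the kernel layout below is an assumption.
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


seq_len = 1_000_000  # number of rows, from the example in the summary
hidden_size = 4096   # elements per row; assumed to equal the row stride

last_row = seq_len - 1                    # largest program ID, fits in int32
true_offset = last_row * hidden_size      # offset when computed in 64-bit
wrapped_offset = wrap_int32(true_offset)  # same product computed in 32-bit

# First row index whose offset no longer fits in a signed 32-bit integer.
first_overflow_row = INT32_MAX // hidden_size + 1

print("program ID fits in int32:", last_row <= INT32_MAX)
print("offset fits in int32:", true_offset <= INT32_MAX)
print("64-bit offset:", true_offset)
print("32-bit wrapped offset:", wrapped_offset)
print("first overflowing row:", first_overflow_row)
```

Under these assumptions the program ID itself is small, but the product with the stride exceeds the 32-bit range and wraps to a negative offset. Widening the program ID before the multiplication keeps the whole expression in 64-bit.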
## Testing Done
- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
---------
Co-authored-by: Shao Tang <[email protected]>
1 parent 9cf2019 commit b27b4c5
2 files changed, 3 additions and 3 deletions.

| File | Changed line | Change |
|---|---|---|
| First file | 66 | one line replaced |
| First file | 140 | one line replaced |
| Second file | 35 | one line replaced |