Commit f408175
avoid pointer mutation in add_rms_norm kernel (#1008)
## Summary
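Rewrite the `fused_add_rms_norm` kernel to address `X` and `Y` through explicit channel offsets from fixed base pointers, instead of mutating the `X`/`Y` base pointers inside the loops.
Because no pointer value is carried from one loop iteration to the next, this removes loop-carried pointer dependencies, makes the memory access pattern more predictable, and gives the Triton compiler more room to optimize.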
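A minimal pure-Python sketch of the two addressing styles follows. It assumes a row-major buffer where each row holds `n_cols` channels processed in blocks of `block`, with flat-list indices standing in for Triton pointers. The function names, the shared row `stride` for `X` and `Y`, and the formula y = (x + r) * w / sqrt(mean((x + r)^2) + eps) are illustrative assumptions, not the kernel's actual code. Both versions do the same arithmetic; the second derives every address from the loop indices rather than from state left by the previous iteration.

```python
import math

# Sketch only: flat Python lists stand in for device memory and integer
# indices stand in for Triton pointers. Function names, the shared row
# stride for X and Y, and eps are illustrative assumptions.


def fused_add_rms_norm_mutating(x, r, w, n_rows, n_cols, stride, block, eps=1e-6):
    """Old style: base indices are advanced in place inside the loops."""
    y = [0.0] * len(x)
    x_ptr = 0  # plays the role of X_ptr; bumped after every row
    y_ptr = 0  # plays the role of Y_ptr; bumped after every row
    for _ in range(n_rows):
        # Pass 1: accumulate the sum of squares of s = x + r over the row.
        p = x_ptr
        sq = 0.0
        for col_start in range(0, n_cols, block):
            for j in range(block):
                if col_start + j < n_cols:  # mask the partial tail block
                    s = x[p + j] + r[p + j]
                    sq += s * s
            p += block  # loop-carried update: next block depends on this one
        inv_rms = 1.0 / math.sqrt(sq / n_cols + eps)
        # Pass 2: walk the row again, normalize and apply the weight.
        xp, yp = x_ptr, y_ptr
        for col_start in range(0, n_cols, block):
            for j in range(block):
                col = col_start + j
                if col < n_cols:
                    y[yp + j] = (x[xp + j] + r[xp + j]) * inv_rms * w[col]
            xp += block
            yp += block
        x_ptr += stride
        y_ptr += stride
    return y


def fused_add_rms_norm_offsets(x, r, w, n_rows, n_cols, stride, block, eps=1e-6):
    """New style: every address is base + offset derived from loop indices."""
    y = [0.0] * len(x)
    for row in range(n_rows):
        row_off = row * stride  # recomputed from `row`, nothing carried over
        sq = 0.0
        for col_start in range(0, n_cols, block):
            for col in range(col_start, min(col_start + block, n_cols)):
                s = x[row_off + col] + r[row_off + col]
                sq += s * s
        inv_rms = 1.0 / math.sqrt(sq / n_cols + eps)
        for col_start in range(0, n_cols, block):
            for col in range(col_start, min(col_start + block, n_cols)):
                off = row_off + col
                y[off] = (x[off] + r[off]) * inv_rms * w[col]
    return y


# Usage: 3 rows of 5 channels, padded to a row stride of 6, blocks of 4.
n_rows, n_cols, stride, block = 3, 5, 6, 4
x = [float(i) for i in range(n_rows * stride)]
r = [0.5] * len(x)
w = [1.0, 2.0, 0.5, 1.5, 1.0]
y_old = fused_add_rms_norm_mutating(x, r, w, n_rows, n_cols, stride, block)
y_new = fused_add_rms_norm_offsets(x, r, w, n_rows, n_cols, stride, block)
assert y_old == y_new  # same operations in the same order: identical results
```

In the second version `row_off` and `off` are pure functions of the loop indices, so no iteration reads an address value produced by the one before it; that independence is what the rewrite relies on.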
## Testing Done
<img width="1800" height="382" alt="image"
src="https://github.com/user-attachments/assets/b361d41e-1379-4835-8acb-b0234d5af22b"
/>
- Hardware Type: Ascend NPU 910B4
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

Parent commit: 4dd540e
1 file changed: 13 lines added, 20 lines removed.

| Hunk | Original lines | New lines | Removed (original line numbers) | Added (new line numbers) |
|---|---|---|---|---|
| 1 | 162-184 | 162-182 | 165-172, 176-178, 181 | 168-176, 179 |
| 2 | 195-205 | 193-203 | 198, 202 | 196-197 |
| 3 | 210-221 | 208-214 | 213-218 | 211 |