Commit aeac283
authored
[TritonGPU] Enable accum-init optimization for unconditionally zero-ed accumulators (#6395)
Currently, the pass doesn't fire when [there is no explicit op that
conditionally clears the
accumulator](https://github.com/triton-lang/triton/blob/main/lib/Dialect/TritonGPU/Transforms/OptimizeAccumulatorInit.cpp#L207-L211).
As a result, it misses the simplest case where this optimization applies:
the accumulator is initialized to zero and, after the first iteration,
is always updated with `+=`.
The motivation is an IR like below. We want to hoist tmem_alloc outside
of the tile loop, but that requires explicitly clearing the accumulator
after the K loop for one tile completes. Enabling this optimization for
this case allows us to skip the explicit clearing.
```
for tile ...
  for k ... iter_args(arg9 = cst_zero)
    acc = tmem_alloc arg9
    mma A B acc
    next_acc = tmem_load acc
    ...
    yield next_acc
```
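The idea can be illustrated with a small scalar model in plain Python. This is only a sketch under stated assumptions: `mma`, `use_acc`, and both loop functions are hypothetical names invented for illustration, not Triton APIs or the pass's actual implementation. It compares zeroing the accumulator for every tile against reusing one hoisted, never-cleared buffer whose stale contents are ignored on the first K iteration.

```python
def mma(a, b, acc, use_acc):
    """Toy scalar stand-in for an MMA op (hypothetical, not a Triton API).

    When use_acc is False the previous accumulator value is ignored,
    which has the same effect as accumulating into zero.
    """
    return a * b + (acc if use_acc else 0.0)


def tiles_with_explicit_clear(tiles):
    """Baseline: the accumulator is zero-initialized for every tile."""
    results = []
    for a_vals, b_vals in tiles:
        acc = 0.0  # explicit clear before each K loop
        for a, b in zip(a_vals, b_vals):
            acc = mma(a, b, acc, use_acc=True)
        results.append(acc)
    return results


def tiles_with_init_flag(tiles):
    """Sketch of the optimized form: one hoisted buffer, never cleared."""
    results = []
    acc = float("nan")  # hoisted buffer; starts with garbage on purpose
    for a_vals, b_vals in tiles:
        use_acc = False  # first K iteration overwrites stale contents
        for a, b in zip(a_vals, b_vals):
            acc = mma(a, b, acc, use_acc)
            use_acc = True  # later iterations accumulate with +=
        results.append(acc)
    return results


# Two tiles, each with a K loop of length 2.
tiles = [([1.0, 2.0], [3.0, 4.0]), ([5.0, 6.0], [7.0, 8.0])]
assert tiles_with_explicit_clear(tiles) == tiles_with_init_flag(tiles)
```

In this toy model the flag version only matches the baseline when each K loop runs at least once; with an empty K loop the stale buffer contents would leak into the result. That caveat is a property of the sketch, not a statement about how the pass handles such loops.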
---------
Co-authored-by: Masahiro Masuda <[email protected]>

1 parent 4aeaae5 commit aeac283
File tree
5 files changed, +136 -78 lines changed
- lib/Dialect/TritonGPU/Transforms
- WarpSpecialization
- python/test/unit/language
- test/TritonGPU
Lines changed: 91 additions & 46 deletions
Lines changed: 10 additions & 4 deletions