Commit 12aa288
[BACKEND] Extended combiner regarding dot scaled ops (#9616)
When using `tl.dot_scaled`, switching from an explicit accumulator
argument (`acc=acc`) to Python's `+=` substantially changes register
usage. In our tests, the `+=` version needs many more registers (256
VGPRs with 45 spills, versus 186 VGPRs and no spills), which lowers
occupancy, causes register spills, and adds memory-bandwidth pressure.
### Version A (explicit acc=acc): uses fewer registers
```python
acc = tl.dot_scaled(
a, a_scale, A_FMT,
b, b_scale, B_FMT,
acc=acc,
out_dtype=tl.float32,
)
```
The generated `.amdgcn` code shows:
```asm
.vgpr_count: 186
.vgpr_spill_count: 0
```
- Much better performance
### Version B (+=): uses more registers
```python
acc += tl.dot_scaled(
a, a_scale, A_FMT,
b, b_scale, B_FMT,
out_dtype=tl.float32,
)
```
The generated `.amdgcn` code shows:
```asm
.vgpr_count: 256
.vgpr_spill_count: 45
```
- Much worse performance
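To compare the two versions without reading the assembly by hand, the two metadata fields can be extracted from the `.amdgcn` text. The following is a minimal sketch: the helper name `vgpr_stats` is hypothetical, and it assumes the assembly is already available as a string (how it is obtained from the compiled kernel is not shown here). The sample strings are the values reported above.

```python
import re

# Matches metadata lines such as ".vgpr_count: 186" in the assembly text.
_FIELD = re.compile(
    r"^[ \t]*\.(vgpr_count|vgpr_spill_count):[ \t]*(\d+)[ \t]*$",
    re.MULTILINE,
)


def vgpr_stats(asm_text: str) -> dict[str, int]:
    """Return the VGPR metadata fields found in asm_text (hypothetical helper)."""
    return {name: int(value) for name, value in _FIELD.findall(asm_text)}


# Values copied from the two listings above.
version_a = ".vgpr_count: 186\n.vgpr_spill_count: 0\n"
version_b = ".vgpr_count: 256\n.vgpr_spill_count: 45\n"

stats_a = vgpr_stats(version_a)
stats_b = vgpr_stats(version_b)

# Additional VGPRs used by the += version.
extra = stats_b["vgpr_count"] - stats_a["vgpr_count"]
print(stats_a, stats_b, extra)
```

A check like this could be used in a regression script to flag a kernel whose spill count becomes nonzero.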
### Expected behavior
Both versions do the same thing logically, so they should produce
similar compiled code and use about the same number of registers.
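The rewrite that makes the two versions equivalent can be illustrated with a toy expression tree. This is only a sketch of the idea of folding an addition into the accumulator operand; the classes `Var`, `DotScaled`, and `Add` and the function `combine` are hypothetical and do not reflect the actual MLIR pattern in Triton's combiner, which has to check further conditions (for example, that the dot result has no other users).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class DotScaled:
    a: str
    b: str
    acc: Optional["Expr"] = None  # None models a zero accumulator


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, DotScaled, Add]


def combine(expr: Expr) -> Expr:
    """Fold add(x, dot_scaled(a, b, acc=0)) into dot_scaled(a, b, acc=x)."""
    if isinstance(expr, Add):
        lhs, rhs = combine(expr.lhs), combine(expr.rhs)
        # Addition commutes, so accept the dot on either side.
        for dot, other in ((rhs, lhs), (lhs, rhs)):
            if isinstance(dot, DotScaled) and dot.acc is None:
                return DotScaled(dot.a, dot.b, acc=other)
        return Add(lhs, rhs)
    return expr


# Version A: accumulator passed explicitly.
version_a = DotScaled("a", "b", acc=Var("acc"))
# Version B: `acc += dot_scaled(...)`, i.e. an add of acc and a zero-acc dot.
version_b = Add(Var("acc"), DotScaled("a", "b"))

folded = combine(version_b)
print(folded == version_a)
```

After the fold, Version B has the same shape as Version A, so no separate temporary for the dot result is needed before the addition.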
### Comparison with tl.dot
This problem does not happen with tl.dot. In that case, the compiler
correctly detects the accumulation pattern and merges it, which avoids
extra temporary values and keeps register usage low.

1 parent 756afc0 commit 12aa288
2 files changed (+38, -9 lines) in:
- lib/Dialect/Triton/Transforms
- test/Triton