Commit bb78fae
[Blackwell] Support narrower TMEM messages and shapes (triton-lang#5945)
Narrower message widths can alleviate register pressure and avoid spills in workloads
that require a large number of per-thread registers.
* Refactor: separates TMEM-atom-derived message constants from workload-derived
message constraints
* Narrowing occurs when a single message would require >=50% of the available per-thread
registers (128) and the workload requires all available registers (256) to complete
* Adds tcgen05.st/ld..16x256b codegen support. With subsequent work this can
pair with downstream stmatrix ops for lower-latency epilogues

1 parent bca378d commit bb78fae
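
The narrowing rule in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: the function names, the use of a per-message register count as input, and the choice to split a message into two halves are all assumptions made for illustration. Only the two thresholds (128 and 256) come from the commit message.

```python
# Hypothetical sketch of the narrowing rule described in the commit message.
# Only the two constants are taken from the message; everything else is assumed.

MAX_REGS_PER_THREAD = 256                       # "all available registers (256)"
NARROWING_THRESHOLD = MAX_REGS_PER_THREAD // 2  # ">=50% of available thread registers (128)"


def should_narrow(regs_per_message: int, workload_regs: int) -> bool:
    """Return True when both conditions from the commit message hold."""
    message_is_wide = regs_per_message >= NARROWING_THRESHOLD
    workload_is_saturated = workload_regs >= MAX_REGS_PER_THREAD
    return message_is_wide and workload_is_saturated


def split_message(regs_per_message: int, workload_regs: int) -> list[int]:
    """Return per-message register counts after optional narrowing.

    Assumption: narrowing halves the message once. The real lowering may
    pick widths differently.
    """
    if not should_narrow(regs_per_message, workload_regs):
        return [regs_per_message]
    half = regs_per_message // 2
    return [half, regs_per_message - half]
```

Under these assumptions, a 128-register message in a workload that needs all 256 registers is issued as two 64-register messages, while the same message in a workload with register headroom is left at full width.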
File tree
2 files changed, +284 -125 lines changed
- test/Conversion
- third_party/nvidia/lib/TritonNVIDIAGPUToLLVM