Commit 7dc5492
authored
This PR allows to pipeline WGMMAs that take the lhs on registers. The
strategy is to wait on the WGMMA from the previous loop to have finished
before executing the next one to avoid overwritting the registers too
early.
Note that this does depend on ptxas handling the register allocation
correctly.
This PR also includes:
- A fix for the WGMMAPrefetch with `swizzlingByteWidth = 128`, which
produced wrong results
- A fix in the way we lower `memdesc_subview` (now it's simpler again)
- A fix in the way we lower mmav3 and mmav5 (which renders the complex
path in `memdesc_subview` unnecessary)
All these are tested end-to-end via the improved `test_cast_matmul.py`
In an 8k x 8k x 8k dense bf16 x mxfp4 matmul we get a speed up of: 2.441
-> 2.039
We might need to split the pointwise computations and interleave them
with the wgmmas similar to how CUTLASS does it, but we don't do that
in this PR.
This PR supersedes WGMMAPrefetch as it drops most of the preconditions
of that pass.
1 parent a3f5ea6 commit 7dc5492
File tree
15 files changed
+455
-1190
lines changed- include/triton/Dialect/TritonGPU/Transforms
- lib
- Conversion/TritonGPUToLLVM
- Dialect/TritonGPU/Transforms
- Pipeliner
- python
- src
- test/regression
- test
- Conversion
- TritonGPU
- third_party/nvidia
- backend
- lib/TritonNVIDIAGPUToLLVM/DotOpToLLVM
15 files changed
+455
-1190
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
219 | 219 | | |
220 | 220 | | |
221 | 221 | | |
222 | | - | |
223 | | - | |
224 | | - | |
225 | | - | |
226 | | - | |
227 | | - | |
228 | | - | |
229 | | - | |
230 | | - | |
231 | | - | |
232 | | - | |
233 | | - | |
234 | | - | |
235 | | - | |
236 | | - | |
237 | | - | |
238 | | - | |
239 | 222 | | |
240 | 223 | | |
241 | 224 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
54 | 54 | | |
55 | 55 | | |
56 | 56 | | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
57 | 61 | | |
58 | 62 | | |
59 | 63 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
441 | 441 | | |
442 | 442 | | |
443 | 443 | | |
444 | | - | |
445 | | - | |
446 | | - | |
447 | | - | |
448 | | - | |
449 | | - | |
450 | | - | |
451 | | - | |
452 | | - | |
453 | | - | |
454 | | - | |
455 | | - | |
456 | | - | |
457 | | - | |
458 | | - | |
459 | | - | |
460 | | - | |
461 | | - | |
462 | | - | |
463 | | - | |
464 | | - | |
465 | | - | |
466 | | - | |
467 | | - | |
468 | | - | |
469 | | - | |
470 | | - | |
471 | | - | |
472 | | - | |
473 | | - | |
474 | | - | |
475 | | - | |
476 | | - | |
477 | | - | |
478 | | - | |
479 | | - | |
480 | | - | |
481 | | - | |
482 | | - | |
| 444 | + | |
483 | 445 | | |
484 | 446 | | |
485 | 447 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
28 | | - | |
29 | 28 | | |
30 | 29 | | |
31 | 30 | | |
| |||
0 commit comments