Commit 341b78b
[TLX] Refactor grouped gemm with configurable subslicing (#651)
Summary:
Subslicing enables larger tile sizes and more pipeline stages, which benefits certain shapes. Example autotuning result:
Triton autotuning for function grouped_matmul_tlx_kernel,
best config selected: BLOCK_SIZE_M: 128, BLOCK_SIZE_N: 256, BLOCK_SIZE_K: 64, NUM_SMEM_BUFFERS: 3, NUM_TMEM_BUFFERS: 2, EPILOGUE_SUBTILE: **4**, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
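As a rough illustration of why slicing the epilogue can free room for a larger tile or an extra pipeline stage, the arithmetic below uses the tile sizes from the selected config. This is only a sketch under stated assumptions: the 2-byte element size, the idea that the epilogue stages one output slice at a time in a buffer split along N, and both helper functions are hypothetical and are not taken from the kernel.

```python
# Hypothetical footprint arithmetic; not code from grouped_matmul_tlx_kernel.
BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K = 128, 256, 64  # from the selected config
NUM_SMEM_BUFFERS = 3                                      # from the selected config
EPILOGUE_SUBTILE = 4                                      # from the selected config
BYTES_PER_ELEM = 2  # assumption: fp16/bf16 operands and output


def operand_smem_bytes(m, n, k, buffers, elem_bytes):
    """Bytes for the A (m x k) and B (k x n) tiles across all pipeline buffers."""
    return buffers * (m * k + k * n) * elem_bytes


def epilogue_staging_bytes(m, n, subtile, elem_bytes):
    """Bytes for one output slice when the m x n tile is split along N."""
    if n % subtile != 0:
        raise ValueError("BLOCK_SIZE_N must be divisible by the subtile count")
    return m * (n // subtile) * elem_bytes


operands = operand_smem_bytes(
    BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K, NUM_SMEM_BUFFERS, BYTES_PER_ELEM
)
unsliced = epilogue_staging_bytes(BLOCK_SIZE_M, BLOCK_SIZE_N, 1, BYTES_PER_ELEM)
sliced = epilogue_staging_bytes(
    BLOCK_SIZE_M, BLOCK_SIZE_N, EPILOGUE_SUBTILE, BYTES_PER_ELEM
)

print(f"operand buffers:   {operands // 1024} KiB")
print(f"epilogue, 1 slice: {unsliced // 1024} KiB")
print(f"epilogue, {EPILOGUE_SUBTILE} slices: {sliced // 1024} KiB")
```

Under these assumptions the per-slice epilogue footprint drops by a factor of EPILOGUE_SUBTILE, and the space saved is what could be spent on a larger tile or more pipeline buffers.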
Pull Request resolved: #651
Reviewed By: manman-ren
Differential Revision: D86577923
Pulled By: htyu
fbshipit-source-id: cda92b66d1a727dbc1279792a0c931deead84db9
1 parent c8a4965, commit 341b78b
1 file changed, +11 -16 lines changed
Changed regions (original file line numbers): 347, 414, 433-436, 438-447