[Blackwell] Support narrower TMEM messages and shapes (#5945)
Narrower message widths can alleviate register pressure and avoid spills in
workloads that require a large number of per-thread registers.
* Refactor separates TMEM atom-derived message constants from workload-derived
message constraints
* Narrowing occurs when a single message would require >= 50% of the registers
available to a thread (128 of 256) and the workload needs all 256 available
registers to complete
* Adds codegen support for the 16x256b shape of tcgen05.ld/tcgen05.st. With
subsequent work this can pair with downstream stmatrix ops for lower-latency
epilogues
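
The narrowing condition described above can be sketched roughly as follows.
This is an illustration only, not the code in this change: the function name
`choose_message_regs`, its parameters, and the halve-until-below-threshold
policy are assumptions. Only the 128 and 256 register figures come from the
message above.

```python
# Hypothetical sketch of the narrowing heuristic; names and the halving
# policy are assumptions, not the actual implementation.

MAX_REGS_PER_THREAD = 256                    # all registers available to a thread
NARROW_THRESHOLD = MAX_REGS_PER_THREAD // 2  # 128, i.e. 50% of the budget


def choose_message_regs(msg_regs: int, workload_regs: int, min_msg_regs: int = 1) -> int:
    """Return the per-thread register count to use for one TMEM message.

    msg_regs:      registers one message would need at its widest form
    workload_regs: registers the workload needs per thread to complete
    min_msg_regs:  narrowest message the chosen shape allows (assumed)
    """
    # Narrow only when the workload already needs the whole register budget.
    needs_all_regs = workload_regs >= MAX_REGS_PER_THREAD
    # Assumed policy: halve the message until it drops below the threshold
    # or reaches the narrowest message the shape permits.
    while (needs_all_regs
           and msg_regs >= NARROW_THRESHOLD
           and msg_regs // 2 >= min_msg_regs):
        msg_regs //= 2
    return msg_regs
```

Under these assumptions, a 128-register message in a workload that needs all
256 registers is narrowed to 64, while the same message in a workload that
needs only 128 registers is left unchanged.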