[Gluon][TritonNvidiaGPU] Add and expose tcgen05.commit (#7335)
This adds a separate op that maps to `tcgen05.commit`. When writing
persistent kernels, it is far simpler and more performant to enqueue the
arrival on an mbarrier in the persistent loop epilogue with a separate
commit op than to work out which specific MMA needs to arrive on the
barrier. For example:
```python
for tile_id in range(...):
    mma  # peeled MMA
    for _ in range(num_iters):
        mma
        mma
    for _ in range(num_masked_iters):
        mma
        mma
    commit
```
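To make the ordering argument concrete, here is a toy pure-Python model of the loop above. It is only a sketch: `ToyBarrier`, `ToyAsyncQueue`, `issue_tile`, and `mmas_retired_before_arrival` are hypothetical names invented for illustration, not the Gluon or PTX API, and the model assumes issued ops retire strictly in order.

```python
from collections import deque


class ToyBarrier:
    """Hypothetical stand-in for an mbarrier: it only counts arrivals."""

    def __init__(self):
        self.arrivals = 0


class ToyAsyncQueue:
    """Toy in-order queue of async ops (an assumption, not real hardware)."""

    def __init__(self):
        self.pending = deque()

    def mma(self, name):
        # Issue an async MMA; it completes later, when step() retires it.
        self.pending.append(("mma", name))

    def commit(self, barrier):
        # Enqueue the arrival behind every MMA issued so far.
        self.pending.append(("commit", barrier))

    def step(self):
        # Retire the oldest pending op and return it.
        kind, payload = self.pending.popleft()
        if kind == "commit":
            payload.arrivals += 1
        return kind, payload


def issue_tile(queue, barrier, num_iters, num_masked_iters):
    """Mirror one iteration of the persistent loop."""
    queue.mma("peeled")
    for i in range(num_iters):
        queue.mma(f"main{i}.a")
        queue.mma(f"main{i}.b")
    for i in range(num_masked_iters):
        queue.mma(f"masked{i}.a")
        queue.mma(f"masked{i}.b")
    # One commit in the epilogue, whichever MMA happened to be last.
    queue.commit(barrier)


def mmas_retired_before_arrival(num_iters, num_masked_iters):
    """Count the MMAs that retire before the barrier sees its arrival."""
    queue, barrier = ToyAsyncQueue(), ToyBarrier()
    issue_tile(queue, barrier, num_iters, num_masked_iters)
    retired = 0
    while barrier.arrivals == 0:
        kind, _ = queue.step()
        if kind == "mma":
            retired += 1
    return retired
```

In this model `mmas_retired_before_arrival(2, 1)` returns 7 and `mmas_retired_before_arrival(0, 0)` returns 1: the arrival always lands after every MMA issued for the tile, even when either inner loop runs zero times, so the kernel never has to decide which MMA is the last one.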
This will also be important when doing warp specialization on nested
control flow (cc @masahi @mbrookhart).
This PR also slightly optimizes the codegen for selecting warp 0 when
there is only 1 warp by calling `getWarpId`. This removes a few
instructions in the MMA loop for a warp-specialized kernel.