[AMD] Add a block ping-pong scheduling pass (#5018)
This change introduces a new pass, `tritonamdgpu-block-pingpong`.
The main target is GEMM kernels. In the ideal case, two warps run in
parallel on one SIMD and alternate between a section of `mfma`
instructions and a section of `memory` instructions, so that the GPU
keeps the `mfma` units busy while hiding the latency of the `memory`
instructions.
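As a rough illustration of the ideal case described above, here is a toy Python model of two warps ping-ponging on one SIMD. It is a sketch, not the pass itself: the function names are hypothetical, every section is assumed to take one equal-length time slot, and no real scheduling or hardware behavior is modeled.

```python
# Toy model only: each "slot" is one equal-length section of work.
# The names below are hypothetical and are not part of Triton.

SECTIONS = ("memory", "mfma")


def pingpong_timeline(num_iters):
    """Return (slot, warp0_section, warp1_section) tuples.

    Warp 1 is offset by one section, so whenever warp 0 is in its
    memory section, warp 1 is in its mfma section, and vice versa.
    """
    timeline = []
    for slot in range(2 * num_iters):
        warp0 = SECTIONS[slot % 2]
        warp1 = SECTIONS[(slot + 1) % 2]
        timeline.append((slot, warp0, warp1))
    return timeline


def mfma_busy_fraction(timeline):
    """Fraction of slots in which at least one warp issues mfma work."""
    busy = sum(1 for _, warp0, warp1 in timeline if "mfma" in (warp0, warp1))
    return busy / len(timeline)


if __name__ == "__main__":
    for slot, warp0, warp1 in pingpong_timeline(2):
        print(f"slot {slot}: warp0={warp0:<6} warp1={warp1}")
```

In this idealized model exactly one warp is in its `mfma` section in every slot, whereas a single warp running the same alternating sections alone would leave the `mfma` units idle during its memory sections.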
For now the pass is gated behind the environment variable
`TRITON_HIP_USE_BLOCK_PINGPONG=1`.
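A minimal sketch of opting in from Python follows. Only the variable name and value come from this commit message; the helper `pingpong_env` is hypothetical, and the sketch assumes the flag has to be present in the environment of the process that compiles the kernel.

```python
import os
import subprocess
import sys


def pingpong_env(base=None):
    """Return a copy of an environment with the ping-pong flag enabled.

    Hypothetical helper for illustration; not part of Triton.
    """
    env = dict(os.environ if base is None else base)
    env["TRITON_HIP_USE_BLOCK_PINGPONG"] = "1"
    return env


if __name__ == "__main__":
    # Stand-in for launching a GEMM script: the child process only
    # echoes the flag to show that it is visible there.
    child = "import os; print(os.environ.get('TRITON_HIP_USE_BLOCK_PINGPONG'))"
    result = subprocess.run(
        [sys.executable, "-c", child],
        env=pingpong_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())
```

Passing a copied environment to the child process keeps the flag scoped to that one run instead of changing the parent process.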
---------
Co-authored-by: Lei Zhang <[email protected]>