[mlir][AMDGPU] Use LDS-only MMRA fences for lds_barrier (#157919)
The semantics of amdgpu.lds_barrier are "s.barrier, and all LDS
operations before this happen-before LDS operations after this, and
there must not be an inherent fence/forcing-to-completion of global
memory (for performance)". The previous lowering strategy implemented
these semantics with manual calls to waitcnt() intrinsics and the
s_barrier intrinsic(s).
The lack of explicit fencing enabled miscompiles (where LDS accesses
were reordered with the barrier) on gfx12. Since LLVM now allows MMRA
annotations to ensure that only LDS accesses are fenced by a pair of
fences, we can now use these fences in order to explicitly represent the
semantics we want instead of trying to prescribe the method of their
implementation.
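
As a rough illustration of the shape this lowering takes, the LLVM IR
below brackets the barrier with a release/acquire fence pair that
carries an MMRA tag. This is a sketch, not the exact output of the
pass: the function name is hypothetical, and the tag spelling
"amdgpu-synchronize-as" / "local" is an assumption about how the
LDS-only restriction is written in current LLVM.

```llvm
declare void @llvm.amdgcn.s.barrier()

; Hypothetical function name, used only to make the sketch self-contained.
define void @lds_barrier_sketch() {
  ; Release fence restricted (via MMRA) to LDS accesses before the barrier.
  fence syncscope("workgroup") release, !mmra !0
  call void @llvm.amdgcn.s.barrier()
  ; Acquire fence restricted (via MMRA) to LDS accesses after the barrier.
  fence syncscope("workgroup") acquire, !mmra !0
  ret void
}

; Assumed tag spelling; limits the fences to the local (LDS) address space.
!0 = !{!"amdgpu-synchronize-as", !"local"}
```

Because the fences only name LDS, the backend is free to leave global
memory operations in flight across the barrier.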
Note that the gfx908 workaround of hiding the s_barrier in inline
assembly in order to prevent spurious vmem barriers remains in place,
but it is removed for gfx11 because the fences were recently changed to
give us the effect we want.