
Commit cdae5fc

apaszke authored and Google-ML-Automation committed
[Mosaic GPU] Make sure to do the async proxy fence before warpgroup sync
This is the ordering we want for a proper release of generic SMEM stores into the async proxy. The old order was problematic: once the warpgroup barrier completed, some warps could get deselected before reaching the fence. As long as the first warp kept making progress, it could go through the fence alone and start issuing TMA copies before the other warps had synchronized with the async proxy. I have not observed this problem in any of our kernels so far, but this order seems safer to me.

PiperOrigin-RevId: 733333814
1 parent 155839b commit cdae5fc


1 file changed: +1 -1 lines changed


jax/experimental/mosaic/gpu/utils.py

Lines changed: 1 addition & 1 deletion
@@ -670,10 +670,10 @@ def parse_indices(


 def commit_shared():
-  warpgroup_barrier()
   nvvm.fence_proxy(
       nvvm.ProxyKind.async_shared, space=nvvm.SharedSpace.shared_cta
   )
+  warpgroup_barrier()


 def warpgroup_barrier():
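To make the hazard described in the commit message concrete, here is a toy interleaving model in plain Python. It is not Mosaic GPU code, and every name in it (`OLD_ORDER`, `NEW_ORDER`, `find_hazard`) is hypothetical. Each warp runs the steps of `commit_shared()` in order, warp 0 then issues a TMA copy, and a depth-first search over all schedules checks whether warp 0 can reach the copy while some other warp has not yet executed its fence. The model assumes that the barrier releases a warp only once every warp has arrived, and it treats "every warp has fenced" as the release condition; it deliberately ignores the finer memory-model semantics of the async-proxy fence.

```python
"""Toy schedule model of commit_shared() ordering (hypothetical, not Mosaic GPU code)."""

OLD_ORDER = ("store", "barrier", "fence")  # before the commit: barrier, then fence
NEW_ORDER = ("store", "fence", "barrier")  # after the commit: fence, then barrier


def find_hazard(order, num_warps=4):
    """Return True if some schedule lets warp 0 issue a TMA copy
    before every warp has executed its async-proxy fence."""
    # Only warp 0 issues the TMA copy, after finishing commit_shared().
    programs = [order + ("tma",)] + [order] * (num_warps - 1)
    barrier_idx = order.index("barrier")
    fence_idx = order.index("fence")
    seen = set()
    stack = [tuple([0] * num_warps)]  # one program counter per warp
    while stack:
        pcs = stack.pop()
        if pcs in seen:
            continue
        seen.add(pcs)
        for w, pc in enumerate(pcs):
            if pc >= len(programs[w]):
                continue  # this warp has finished its program
            op = programs[w][pc]
            if op == "barrier" and any(p < barrier_idx for p in pcs):
                continue  # cannot leave the barrier until all warps arrive
            if op == "tma" and any(p <= fence_idx for p in pcs):
                return True  # some warp has not executed its fence yet
            # Let warp w execute its current step.
            stack.append(pcs[:w] + (pc + 1,) + pcs[w + 1:])
    return False


if __name__ == "__main__":
    print(find_hazard(OLD_ORDER))  # True
    print(find_hazard(NEW_ORDER))  # False
```

Under these assumptions, the old order admits exactly the schedule the commit message worries about: all warps store and arrive at the barrier, warp 0 leaves it, fences, and reaches the TMA copy while the other warps have not yet moved past the barrier. With the fence first, leaving the barrier already implies that every warp has fenced, so no such schedule exists.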
