Commit 0d975d4

Fix correctness of streamk by adding a required barrier (#745)
1 parent 7adc012 commit 0d975d4

1 file changed: +1 -0 lines changed

python/perf-kernels/streamk/streamk_kernel.py

Lines changed: 1 addition & 0 deletions
@@ -162,6 +162,7 @@ def streamk_gemm(
 rn1 = tl.max_contiguous(tl.multiple_of(rn1, BLOCK_SIZE_N), BLOCK_SIZE_N)
 P_ = P + pid * BLOCK_SIZE_M * BLOCK_SIZE_N + rm1[:, None] * BLOCK_SIZE_N + rn1[None, :]
 tl.store(P_, acc, cache_modifier=".wt")
+tl.debug_barrier()
 tl.store(locks + pid, 1, cache_modifier=".wt")
 # tl.store(P_, acc)
 # tl.debug_barrier()
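The commit message says only that the barrier is required for correctness. One plausible reading is that the partial tile stored to `P_` is written cooperatively by every thread of the program, so the `locks + pid` flag must not be raised until all of those stores have been issued. The sketch below is a CPU analogy in plain Python, not Triton code: `threading.Barrier` stands in for `tl.debug_barrier()`, a `threading.Event` stands in for the lock entry, and `NUM_THREADS`, `TILE_SIZE`, and `run_producer_consumer` are hypothetical names chosen for illustration.

```python
import threading

NUM_THREADS = 4   # hypothetical: stands in for the threads of one Triton program
TILE_SIZE = 16    # hypothetical: number of elements in the partial tile


def run_producer_consumer():
    partial = [0] * TILE_SIZE                  # stands in for the P buffer
    lock_flag = threading.Event()              # stands in for locks[pid]
    barrier = threading.Barrier(NUM_THREADS)   # stands in for tl.debug_barrier()
    result = {}

    def producer(tid):
        # Each thread stores only its own strided slice of the tile.
        for i in range(tid, TILE_SIZE, NUM_THREADS):
            partial[i] = i + 1
        # No thread passes this point until every thread has finished storing.
        barrier.wait()
        # Only now is it safe for one thread to publish the "ready" flag.
        if tid == 0:
            lock_flag.set()

    def consumer():
        # Stands in for another program that waits on the lock entry
        # before reading the partial tile.
        lock_flag.wait()
        result["sum"] = sum(partial)

    threads = [threading.Thread(target=producer, args=(t,)) for t in range(NUM_THREADS)]
    threads.append(threading.Thread(target=consumer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["sum"]


if __name__ == "__main__":
    print(run_producer_consumer())
```

If the `barrier.wait()` call were removed, thread 0 could set the flag while other threads were still writing their slices, and the consumer could read an incomplete tile. That is the kind of race the added barrier appears to close in the kernel, under the assumption stated above.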
