Commit fda48a7

Timmy committed
Replacing barrier with mem_fence in the innermost loop requires an extra barrier at the beginning of the outer loop.
1 parent a55d3ae commit fda48a7

1 file changed: +2 -1 lines changed


src/library/blas/gens/clTemplates/sgemm_gcn_SmallMatrices.cl

Lines changed: 2 additions & 1 deletion
@@ -284,7 +284,8 @@ __kernel void sgemm_NT_32_32_16_16x16_2x2__ALPHABETA_BRANCH( __global float cons
 {
 __local float* plA = lA + idy*33+idx;
 __local float* plB = lB + idy*33+idx;
-
+barrier(CLK_LOCAL_MEM_FENCE);
+
 plB[0] = CurrentOffSetB>=N?0.0:B[0];
 plB[16] = CurrentOffSetB+16>=N?0.0:B[16];
 