Commit 36e6e88

ikawrakow and Iwan Kawrakow authored
Fix race in the CUDA DeepSeek FA kernel (#406)
Reference: ggml-org/llama.cpp#13438
Co-authored-by: Iwan Kawrakow <[email protected]>
1 parent a2d24c9 commit 36e6e88

1 file changed, +2 -0 lines changed

ggml/src/ggml-cuda/fattn-new-mma.cu

Lines changed: 2 additions & 0 deletions
@@ -898,6 +898,8 @@ static __device__ __forceinline__ void flash_attn_ext_f16_process_tile(
             KQ_crs += __shfl_xor_sync(0xFFFFFFFF, KQ_crs, offset, WARP_SIZE);
         }
 
+        __syncthreads();
+
         // Write back combined meta data:
 #pragma unroll
         for (int imeta = 0; imeta < nmeta; ++imeta) {
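
The added barrier sits between a warp-shuffle reduction and the write-back of combined meta data. The standalone sketch below illustrates the general hazard such a barrier guards against: one warp overwriting shared memory that another warp in the same block may still be reading. It is not the llama.cpp kernel. The names combine_partials, run_demo, DEMO_NWARPS and DEMO_WARP_SIZE are hypothetical, and the three-phase structure is an assumption made for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo constants, not taken from fattn-new-mma.cu.
constexpr int DEMO_NWARPS    = 4;
constexpr int DEMO_WARP_SIZE = 32;

// Each warp publishes one partial value, every warp reduces all partials,
// then each warp overwrites its own slot with the combined result.
__global__ void combine_partials(const float * partial_in, float * combined_out) {
    __shared__ float slots[DEMO_NWARPS];

    const int warp = threadIdx.x / DEMO_WARP_SIZE;
    const int lane = threadIdx.x % DEMO_WARP_SIZE;

    // Phase 1: lane 0 of each warp publishes that warp's partial value.
    if (lane == 0) {
        slots[warp] = partial_in[warp];
    }
    __syncthreads();

    // Phase 2: every warp reads all slots and reduces them with shuffles.
    float sum = lane < DEMO_NWARPS ? slots[lane] : 0.0f;
    for (int offset = DEMO_WARP_SIZE/2; offset > 0; offset >>= 1) {
        sum += __shfl_xor_sync(0xFFFFFFFF, sum, offset, DEMO_WARP_SIZE);
    }

    // Without this barrier a fast warp could start phase 3 and overwrite
    // slots[] while a slower warp is still reading the partials in phase 2.
    __syncthreads();

    // Phase 3: write the combined value back over the partial value.
    if (lane == 0) {
        slots[warp] = sum;
    }
    __syncthreads();

    if (threadIdx.x < DEMO_NWARPS) {
        combined_out[threadIdx.x] = slots[threadIdx.x];
    }
}

// Host-side check: with partials 1, 2, 3, 4 every slot should end up as 10.
bool run_demo() {
    const float h_in[DEMO_NWARPS]  = {1.0f, 2.0f, 3.0f, 4.0f};
    float       h_out[DEMO_NWARPS] = {0.0f, 0.0f, 0.0f, 0.0f};
    float * d_in  = nullptr;
    float * d_out = nullptr;

    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    combine_partials<<<1, DEMO_NWARPS*DEMO_WARP_SIZE>>>(d_in, d_out);

    // The blocking copy also surfaces errors from the kernel launch.
    const cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        return false;
    }
    for (int i = 0; i < DEMO_NWARPS; ++i) {
        if (h_out[i] != 10.0f) {
            return false;
        }
    }
    return true;
}

int main() {
    const bool ok = run_demo();
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Removing the second __syncthreads() from this sketch would not necessarily produce wrong results on every run, since whether the race shows up depends on how the warps happen to be scheduled. That is typical of this class of bug and is why the barrier is needed even when tests pass without it.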
