Commit b04547a

[AMD] Fix addresses for other stores from BufferLoadToLocal (#7016)
We need to predicate the `llStore` of the `other` value on the initial mask; otherwise we would also need to shuffle the `other` value. `AsyncCopyGlobalToLocal` already does this correctly.
1 parent d757d1e commit b04547a
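
The reasoning in the commit message can be illustrated with a toy C++ model. This is only a sketch and not Triton code: it assumes that, with swizzling, each lane's load value and mask are shuffled to the lane that owns the destination slot, while the `other` store is issued by the original lane to its own destination address. The names `copyToLocal`, `dest`, `useInitialMaskForOther`, and the `-1` sentinel are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a masked copy into shared memory, one element per lane.
// NOT Triton code: every name here is hypothetical.
//
// dest[s] is the (swizzled) slot that lane s's element must end up in; it is
// assumed to be a permutation of 0..n-1. The modeled load can only write
// slot l from lane l, so lane s's value and mask are shuffled to lane dest[s].
std::vector<int> copyToLocal(const std::vector<int> &loaded,
                             const std::vector<bool> &mask,
                             const std::vector<int> &other,
                             const std::vector<std::size_t> &dest,
                             bool useInitialMaskForOther) {
  const std::size_t n = loaded.size();
  assert(mask.size() == n && other.size() == n && dest.size() == n);

  // Shuffle the load value and the mask to the lane that owns the slot.
  std::vector<int> shuffledLoaded(n);
  std::vector<bool> shuffledMask(n);
  for (std::size_t s = 0; s < n; ++s) {
    assert(dest[s] < n);
    shuffledLoaded[dest[s]] = loaded[s];
    shuffledMask[dest[s]] = mask[s];
  }

  std::vector<int> lds(n, -1); // -1 marks a slot that was never written

  // Load step: lane l writes slot l, predicated on the shuffled mask.
  for (std::size_t l = 0; l < n; ++l)
    if (shuffledMask[l])
      lds[l] = shuffledLoaded[l];

  // Fallback step: lane s stores its own, unshuffled `other` to dest[s].
  // The predicate must therefore also be lane s's own (initial) mask.
  for (std::size_t s = 0; s < n; ++s) {
    const bool pred = useInitialMaskForOther ? mask[s] : shuffledMask[s];
    if (!pred)
      lds[dest[s]] = other[s];
  }
  return lds;
}
```

For `loaded = {10, 11, 12, 13}`, `mask = {true, false, true, true}`, `other = {90, 91, 92, 93}`, and `dest = {1, 0, 3, 2}`, predicating on the initial mask yields `{91, 10, 13, 12}`: only lane 1's slot receives its `other` value. Predicating on the shuffled mask yields `{-1, 90, 13, 12}` in this model: lane 0 overwrites valid data with its `other` value, and lane 1's slot is never written.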

1 file changed, +2 -2 lines changed

third_party/amd/lib/TritonAMDGPUToLLVM/LoadStoreOpToLLVM.cpp

Lines changed: 2 additions & 2 deletions
@@ -561,8 +561,8 @@ struct BufferLoadToLocalOpConversion
               rewriter, this->getTypeConverter(), loc, vecTy, otherElems, srcIdx);
           llStore(rewriter, loc,
                   hasSwizzling ? swizzledShmemAddr[i] : coalescedShmemAddr[i],
-                  storeVal, b.icmp_ne(pred, b.true_val()), op.getCache(),
-                  /*forceNoAliasAsyncLoads=*/true);
+                  storeVal, b.icmp_ne(maskElems[srcIdx], b.true_val()),
+                  op.getCache(), /*forceNoAliasAsyncLoads=*/true);
         }
       }
