Skip to content

Commit 381ba9f

Browse files
committed
cuda clang: Fix argument order for __reduce_max_sync
The following cuda kernel would crash with an "an illegal instruction was encountered" message. __global__ void testcode(const float* data, unsigned *max_value) { unsigned r = static_cast<unsigned>(data[threadIdx.x]); const unsigned mask = __ballot_sync(0xFFFFFFFF, true); unsigned mx = __reduce_max_sync(mask, r); atomicMax(max_value, mx); } Digging into the ptx from both nvcc and clang, I discovered that the arguments for the mask and value were swapped. This swaps them back. Fixes: #131415 Signed-off-by: Austin Schuh <[email protected]>
1 parent 37b5f77 commit 381ba9f

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

llvm/lib/Target/NVPTX/NVPTXIntrinsics.td

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -315,7 +315,7 @@ defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC<Int64Regs, "b64", int_nvvm_match_all_s
315315
multiclass REDUX_SYNC<string BinOp, string PTXType, Intrinsic Intrin> {
316316
def : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$src, Int32Regs:$mask),
317317
"redux.sync." # BinOp # "." # PTXType # " $dst, $src, $mask;",
318-
[(set i32:$dst, (Intrin i32:$src, Int32Regs:$mask))]>,
318+
[(set i32:$dst, (Intrin i32:$mask, Int32Regs:$src))]>,
319319
Requires<[hasPTX<70>, hasSM<80>]>;
320320
}
321321

0 commit comments

Comments
 (0)