Commit 381ba9f
cuda clang: Fix argument order for __reduce_max_sync
The following CUDA kernel would crash with the error "an illegal
instruction was encountered":

    __global__ void testcode(const float* data, unsigned *max_value) {
      unsigned r = static_cast<unsigned>(data[threadIdx.x]);
      const unsigned mask = __ballot_sync(0xFFFFFFFF, true);
      unsigned mx = __reduce_max_sync(mask, r);
      atomicMax(max_value, mx);
    }

Digging into the PTX from both nvcc and clang, I found that clang had
the mask and value arguments swapped. This commit swaps them back.
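
To illustrate why the argument order matters, here is a minimal host-side sketch in plain C++ (no GPU needed). It is not the clang header code and not the real intrinsic. The names `lane_in_mask`, `reduce_max_model`, and `lanes_excluded_when_swapped` are hypothetical helpers made up for this illustration. It assumes, per my reading of the PTX documentation, that a warp reduction is undefined when the executing lane is not named in the member mask, which would be one plausible explanation for the crash; the commit itself does not state the cause.

```cpp
#include <array>
#include <cstdint>

constexpr int kWarpSize = 32;

// True if `lane` (0..31) has its bit set in `member_mask`.
bool lane_in_mask(std::uint32_t member_mask, int lane) {
    return ((member_mask >> lane) & 1u) != 0;
}

// Host-side model of an unsigned warp max-reduction: the maximum of
// values[lane] over every lane named in member_mask.
// Returns 0 for an empty mask (a choice made for this model only).
std::uint32_t reduce_max_model(
    std::uint32_t member_mask,
    const std::array<std::uint32_t, kWarpSize>& values) {
    std::uint32_t result = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (lane_in_mask(member_mask, lane) && values[lane] > result) {
            result = values[lane];
        }
    }
    return result;
}

// Swapped-argument case: each lane's value is mistakenly used as the
// member mask. Counts the lanes that are then missing from their own mask.
int lanes_excluded_when_swapped(
    const std::array<std::uint32_t, kWarpSize>& values) {
    int excluded = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (!lane_in_mask(values[lane], lane)) {
            ++excluded;
        }
    }
    return excluded;
}
```

With the correct order, a full mask covers every lane regardless of the data. With the order swapped, whether a lane is covered depends on the data itself: if each lane holds its own index as its value, no lane has its own bit set in the mask it would pass.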
Fixes: #131415
Signed-off-by: Austin Schuh <[email protected]>

1 parent 37b5f77 commit 381ba9f
1 file changed, +1 -1 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 315 | 315 | |
| 316 | 316 | |
| 317 | 317 | |
| 318 | | - |
| | 318 | + |
| 319 | 319 | |
| 320 | 320 | |
| 321 | 321 | |