Is __shfl_down_sync used incorrectly in the reduction?


in reduction:  kernel reduce7 has code:

```
...
const unsigned int shmem_extent  = (blockSize / warpSize) > 0 ? (blockSize / warpSize) : 1;
const unsigned int ballot_result = __ballot_sync(mask, tid < shmem_extent);
    if (tid < shmem_extent) {
        mySum = sdata[tid];
        // Reduce final warp using shuffle or reduce_add if T==int & CUDA_ARCH ==
        // SM 8.0
        mySum = warpReduceSum<T>(ballot_result, mySum);
    }
...
```

if the kernel :   grid(1,1,1)  block(32,1,1),   the ```shmem_extent```  will be 1,  and  ``` ballot_result ``` is 1
the __shfl_dow_sync run as code:

```
template <class T> __device__ __forceinline__ T warpReduceSum(unsigned int mask, T mySum)
{
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        mySum += __shfl_down_sync(mask, mySum, offset);
    }
    return mySum;
}
```

so only threadIdx.x ==0  do:
 mySum += __shfl_down_sync(1, mySum, 16)
 mySum += __shfl_down_sync(1, mySum, 8)
 mySum += __shfl_down_sync(1, mySum, 4)
 mySum += __shfl_down_sync(1, mySum, 2)
 mySum += __shfl_down_sync(1, mySum, 1)

dst thread is 0，read data: mySum form thread(16/8/4/2/1) .  In this situation, is it risky to execute __shfl_down_sync?

In the CUDA Programming Guide, it states that in this situation (when the source thread is inactive), the data obtained through __shfl_sync is undefined.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
```
Threads may only read data from another thread which is actively participating in the __shfl_sync() command. If the target thread is [inactive](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#simt-architecture-notes), the retrieved value is undefined.
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Is __shfl_down_sync used incorrectly in the reduction? #398

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Is __shfl_down_sync used incorrectly in the reduction? #398

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions