
Commit 90dc33b

Add todo for reduceSum
1 parent b8938b4 commit 90dc33b

1 file changed: +6 -0 lines changed

paddle/fluid/platform/cuda_helper.h

Lines changed: 6 additions & 0 deletions
@@ -77,6 +77,12 @@ __forceinline__ __device__ T __shfl_down_sync(unsigned, T val, int delta) {
 
 template <typename T>
 __device__ T reduceSum(T val, int tid, int len) {
+  // TODO(zcd): The warp size should be taken from the
+  // parameters of the GPU but not specified as 32 simply.
+  // To make the reduceSum more efficiently,
+  // I use Warp-Level Parallelism and assume the Warp size
+  // is 32 which may be different for different GPU,
+  // but most card's warp size is 32.
   __shared__ T shm[32];
   const int warpSize = 32;
   unsigned mask = 0u;
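The TODO says reduceSum relies on warp-level parallelism and hard-codes a warp size of 32. The rest of the function body lies outside this hunk, so the following is only a sketch under stated assumptions. It is a host-side C++ emulation (not CUDA, no GPU needed) of the usual two-level shuffle-down reduction, with the warp width passed in as a parameter instead of being fixed at 32. The helper names `shfl_down`, `warp_reduce_sum` and `block_reduce_sum` are hypothetical and are not part of cuda_helper.h.

```cpp
// Host-side C++ emulation of a two-level warp reduction; NOT CUDA code.
// The warp width is a runtime parameter instead of a hard-coded 32.
// All function names here are hypothetical, for illustration only.
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates one __shfl_down_sync(full_mask, v, delta) across a warp:
// lane i receives the value of lane i + delta; the top `delta` lanes keep
// their own value because the source lane index does not wrap around.
template <typename T>
std::vector<T> shfl_down(const std::vector<T>& lanes, std::size_t delta) {
  std::vector<T> out(lanes);
  for (std::size_t i = 0; i + delta < lanes.size(); ++i) {
    out[i] = lanes[i + delta];
  }
  return out;
}

// Tree reduction inside one warp: log2(width) shuffle-down steps.
// Assumes lanes.size() is a power of two; afterwards only lane 0 is
// guaranteed to hold the full warp sum.
template <typename T>
T warp_reduce_sum(std::vector<T> lanes) {
  for (std::size_t delta = lanes.size() / 2; delta > 0; delta /= 2) {
    const std::vector<T> shifted = shfl_down(lanes, delta);
    for (std::size_t i = 0; i < lanes.size(); ++i) lanes[i] += shifted[i];
  }
  return lanes.empty() ? T(0) : lanes[0];
}

// Block-level sum of `vals` (one value per emulated thread):
// 1) each warp reduces its own lanes,
// 2) lane 0 of each warp stores its partial in a per-warp slot
//    (the role a __shared__ array plays on the GPU),
// 3) a single warp reduces the partials.
template <typename T>
T block_reduce_sum(const std::vector<T>& vals, std::size_t warp_size) {
  const std::size_t len = vals.size();
  const std::size_t num_warps = (len + warp_size - 1) / warp_size;
  // The partials must fit into one warp for the second pass.
  assert(num_warps <= warp_size);

  std::vector<T> partials(num_warps, T(0));
  for (std::size_t w = 0; w < num_warps; ++w) {
    std::vector<T> lanes(warp_size, T(0));  // lanes with tid >= len add 0
    for (std::size_t lane = 0; lane < warp_size; ++lane) {
      const std::size_t tid = w * warp_size + lane;
      if (tid < len) lanes[lane] = vals[tid];
    }
    partials[w] = warp_reduce_sum(lanes);
  }

  std::vector<T> lanes(warp_size, T(0));
  for (std::size_t w = 0; w < num_warps; ++w) lanes[w] = partials[w];
  return warp_reduce_sum(lanes);
}
```

Summing the values 1 through 1000 with `block_reduce_sum` gives 500500 whether the warp width is 32 or 64. The width is a parameter, so the result does not depend on guessing it correctly. In real CUDA device code the width is available through the built-in `warpSize` variable. On the host it is the `warpSize` field of `cudaDeviceProp`, which `cudaGetDeviceProperties` fills in. `warpSize` is not a compile-time constant, so a `__shared__` array such as `shm[32]` still needs a fixed upper bound.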
