Commit af7b538

okakarpa authored and tvukovic-amd committed
[AUTOGENERATED] [release/2.8] [ROCm] Use opportunistic fastatomics based on heuristics (#2441)
1 parent 0826c75 commit af7b538

1 file changed: +7 -0 lines changed

aten/src/ATen/native/cuda/KernelUtils.cuh

Lines changed: 7 additions & 0 deletions
@@ -281,6 +281,13 @@ __device__ __forceinline__ void opportunistic_fastAtomicAdd(
     }
   }
 
+  if (numel > 16 /*<-heuristic threshold*/ * 64 ) {
+    // well shucks, unlikely to capture same-dest atomics in a wave.
+    // fall back to direct fastAtomic...
+    fastAtomicAdd(self_ptr, index, numel, value, true);
+    return;
+  }
+
   // not coalesced, so now let's try to capture lane-matches...
   // __activemask() -- finds the set of threads in the warp that are about to perform atomicAdd
   // __match_any_sync() -- returns bit mask of the threads that have same dest addr
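
The added early exit and the lane-matching step it skips can be modeled on the host. The following is a minimal sketch, not the code in KernelUtils.cuh: the names `takes_direct_path`, `match_any_host`, `kAssumedWaveSize`, and `kThresholdFactor` are hypothetical, and reading the literal 64 as the wavefront width is an assumption the commit does not state.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants mirroring the literal `16 * 64` in the diff.
// Treating 64 as the wavefront width is an assumption, not a stated fact.
constexpr std::int64_t kThresholdFactor = 16;
constexpr std::int64_t kAssumedWaveSize = 64;

// True when the added check would return early through fastAtomicAdd,
// i.e. when the destination tensor has more than 16 * 64 elements.
bool takes_direct_path(std::int64_t numel) {
  return numel > kThresholdFactor * kAssumedWaveSize;
}

// Host-side model of what __match_any_sync computes on the device:
// for each lane, a bitmask of the lanes whose destination index equals
// its own. Supports at most 64 lanes (one bit per lane).
std::vector<std::uint64_t> match_any_host(const std::vector<std::int64_t>& dest) {
  std::vector<std::uint64_t> masks(dest.size(), 0);
  for (std::size_t i = 0; i < dest.size(); ++i) {
    for (std::size_t j = 0; j < dest.size(); ++j) {
      if (dest[i] == dest[j]) {
        masks[i] |= (std::uint64_t{1} << j);
      }
    }
  }
  return masks;
}
```

In this model, lanes 0 and 2 writing to the same destination would both receive a mask with bits 0 and 2 set, which is the grouping the slower path uses to combine same-destination adds. When `takes_direct_path` is true, the commit's comment suggests such groups are expected to be rare, so the matching work is skipped.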