
Commit 0def0b8

[AUTOGENERATED] [release/2.8] [ROCm] Improve reduction sum performance (#2505)
Cherry-pick of #2492
Co-authored-by: Jerry Mannil <[email protected]>
1 parent ab27a01 commit 0def0b8

1 file changed (+1, -1 lines)


aten/src/ATen/native/cuda/Reduce.cuh

Lines changed: 1 addition & 1 deletion
@@ -1062,7 +1062,7 @@ ReduceConfig setReduceConfig(const TensorIterator& iter){
 // In such case, values in each loaded vector always correspond to different outputs.
 if (fastest_moving_stride == sizeof(scalar_t)) {
 #ifdef USE_ROCM
-if (reduction_on_fastest_striding_dimension && dim0 > 128 && iter.num_reduce_dims() == 1) {
+if (reduction_on_fastest_striding_dimension && dim0 >= 128 && iter.num_reduce_dims() == 1) {
 #else
 if (reduction_on_fastest_striding_dimension && dim0 > 128 && iter.num_reduce_dims() == 1 && vt0 >= input_vec_size) {
 #endif
