
Commit 1cc51c6

pytorchbot and eqy authored
[CUDA][avgpool2d] Fix backward launch bounds again for sm100, sm120 (pytorch#150676)

[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm120` (pytorch#150640)

`__CUDA_ARCH__` is not visible in host code, which causes incorrect launch bounds and `too many resources requested for launch` on Blackwell.

Pull Request resolved: pytorch#150640
Approved by: https://github.com/malfet, https://github.com/drisspg, https://github.com/atalman

(cherry picked from commit 09c4da9)

Co-authored-by: Eddie Yan <[email protected]>
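For context, here is a minimal sketch of the failure mode the commit message describes (the kernel and helper names are hypothetical, not the PyTorch source). `__CUDA_ARCH__` is only defined during nvcc's device compilation passes, so a host-side `#if` on it always takes the `#else` branch; the host then picks a 1024-thread block even on `sm100`/`sm120`, exceeding the kernel's `__launch_bounds__` and failing with `too many resources requested for launch`. Querying device properties at runtime, as the fix does, keeps the host-side choice in sync:

#include <cuda_runtime.h>

// Device compilation passes define __CUDA_ARCH__, so a per-arch
// launch bound works correctly here.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 1000
#define MAX_BLOCK_THREADS 768   // Blackwell (sm100/sm120)
#else
#define MAX_BLOCK_THREADS 1024
#endif

__global__ void __launch_bounds__(MAX_BLOCK_THREADS)
example_kernel(float* out) {
  out[blockIdx.x * blockDim.x + threadIdx.x] = 1.0f;
}

// Host pass: __CUDA_ARCH__ is never defined here, so an #if on it
// silently falls through to 1024. Query the device at runtime instead.
int pick_block_threads() {
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, /*device=*/0);
  return (prop.major >= 10) ? 768 : 1024;  // must match the kernel's bounds
}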
1 parent 28ca4dd

File tree

1 file changed: +6 -5 lines changed

aten/src/ATen/native/cuda/AveragePool2d.cu

@@ -402,11 +402,12 @@ TORCH_IMPL_FUNC(avg_pool2d_backward_out_cuda) (
   bool use_divisor = divisor_override.has_value();
   const auto divisor_override_value = use_divisor ? divisor_override.value() : 0;

-#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 1000
-  constexpr int double_threads = 768;
-#else
-  constexpr int double_threads = 1024;
-#endif
+  cudaDeviceProp* properties = at::cuda::getCurrentDeviceProperties();
+  const bool gesm10x = properties->major >= 10;
+  int double_threads = 1024;
+  if (gesm10x) {
+    double_threads = 768;
+  }

   AT_DISPATCH_FLOATING_TYPES_AND2(kHalf, kBFloat16, input.scalar_type(),
     "avg_pool2d_backward_out_cuda_frame",
