Commit 944be5a
[rocm7.0_internal_testing] Prevent static initialization of at::cuda::warp_size() (#2293)
Fixes SWDEV-540240, SWDEV-540309, SWDEV-539989
### Error
```
#24 437.7 what(): HIP error: no ROCm-capable device is detected
#24 437.7 HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
#24 437.7 For debugging consider passing AMD_SERIALIZE_KERNEL=3
#24 437.7 Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
#24 437.7 Exception raised from c10_hip_check_implementation at /pytorch/c10/hip/HIPException.cpp:44 (most recent call first):
#24 437.7 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x88 (0x7f272de18738 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
#24 437.7 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x55 (0x7f272ddb42ed in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
...
#24 437.7 frame #7: at::cuda::getCurrentDeviceProperties() + 0x9 (0x7f270b5874e9 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_hip.so)
#24 437.7 frame #8: at::cuda::warp_size() + 0x9 (0x7f270b587509 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_hip.so)
#24 437.7 frame #9: <unknown function> + 0x81ac8b (0x7f2709c27c8b in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_hip.so)
```
### Explanation
Commit 80cca70 introduced a static global variable whose initializer calls `at::cuda::warp_size()`. That call queries device properties, which requires a visible GPU, so the initializer runs (and fails) as soon as the library is loaded on CPU-only build systems, where no GPU is present.
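A minimal standalone sketch of the failure mode (`query_warp_size` and `kWarpSize` are hypothetical stand-ins; in the real code the initializer reaches `at::cuda::getCurrentDeviceProperties()` through `at::cuda::warp_size()`, as in frames #7 and #8 above):
```
#include <iostream>
#include <stdexcept>

// Stand-in for at::cuda::warp_size(): querying device properties
// throws when no ROCm-capable device is visible.
int query_warp_size() {
  throw std::runtime_error("HIP error: no ROCm-capable device is detected");
}

// Problematic pattern: a namespace-scope static runs its initializer at
// library load time, before main(), so merely loading the library on a
// CPU-only machine terminates with the error above.
static const int kWarpSize = query_warp_size();

int main() {
  std::cout << kWarpSize << "\n";  // never reached without a GPU
  return 0;
}
```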
### Solution
Convert the static variable into a static function, so the device query happens lazily on first call instead of during static initialization.
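A sketch of the corresponding fix under the same assumptions: moving the value into a function defers the query until the first call, and C++11 guarantees the function-local static is initialized exactly once, thread-safely.
```
// Fixed pattern: the device query now runs lazily, on first use, so
// processes that never call warp_size() (e.g. CPU-only builders that
// merely load the library) are unaffected.
static int warp_size() {
  static const int size = query_warp_size();  // initialized on first call
  return size;
}
```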
### Validation
http://rocm-ci.amd.com/job/pyt_whl_docker_mainline/1461/artifact/build_artifacts.txt/*view*/
Ran a microbenchmark to confirm basic functionality:
```
root@ubb4-rack-22:/var/lib/jenkins/pytorch-micro-benchmarking# python3 micro_benchmarking_pytorch.py --network resnet50
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : resnet50
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.10158218145370483
Throughput [img/sec] : 630.0317544289736
```
### Files changed
4 files changed under `aten/src/ATen/native/cuda` (+9 lines, -5 lines).