Commit 6f12be2
CUDA 13.0 builds fix on Amazon Linux 2023 (pytorch#164893)
CUDA 13.0 builds fix on Amazon Linux 2023 (pytorch#164870)
During 2.9 RC testing, I am seeing an issue on Amazon Linux 2023 with CUDA 13.0 builds.
This is related to:
pytorch#152756
Workflow: https://github.com/pytorch/test-infra/actions/runs/18324074610/job/52184079262
Error:
```
WARNING: There was an error checking the latest version of pip.
+ python3.11 .ci/pytorch/smoke_test/smoke_test.py --package torchonly
Traceback (most recent call last):
File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 333, in _load_global_deps
ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
File "/usr/lib64/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcudart.so.13: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/pytorch/pytorch/.ci/pytorch/smoke_test/smoke_test.py", line 12, in <module>
import torch
File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 425, in <module>
_load_global_deps()
File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 383, in _load_global_deps
_preload_cuda_deps(lib_folder, lib_name)
File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 317, in _preload_cuda_deps
raise ValueError(f"{lib_name} not found in the system path {sys.path}")
Traceback (most recent call last):
ValueError: libnvToolsExt.so.*[0-9] not found in the system path ['/pytorch/pytorch/.ci/pytorch/smoke_test', '/usr/lib64/python311.zip', '/usr/lib64/python3.11', '/usr/lib64/python3.11/lib-dynload', '/usr/local/lib64/python3.11/site-packages', '/usr/local/lib/python3.11/site-packages', '/usr/lib64/python3.11/site-packages', '/usr/lib/python3.11/site-packages']
File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 102, in <module>
main()
File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 98, in main
run_cmd_or_die(f"docker exec -t {container_name} /exec")
File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 39, in run_cmd_or_die
raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
RuntimeError: Command docker exec -t 7d9c5bd403cac9a9ee824d63a1d6f6057ecce89a7daa94a81617dbf8eff0ff2e /exec failed with exit code 1
```
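The traceback above shows two failures in sequence: `ctypes.CDLL` cannot resolve `libcudart.so.13`, and the fallback then finds no match for the pattern `libnvToolsExt.so.*[0-9]` on any `sys.path` entry. As a rough diagnostic sketch (not the actual `torch/__init__.py` code), the same kind of glob search can be reproduced by hand to see which CUDA libraries are visible in a given environment. The helper name `find_shared_libs` and the `nvidia/<component>/lib` wheel layout are assumptions made for illustration.

```python
import glob
import os
import sys


def find_shared_libs(lib_pattern, search_paths=None):
    """Return files matching lib_pattern under <path>/nvidia/*/lib.

    Hypothetical helper for debugging; the directory layout is an
    assumption about how the NVIDIA wheels are installed.
    """
    if search_paths is None:
        search_paths = sys.path
    matches = []
    for path in search_paths:
        # Assumed layout: <site-packages>/nvidia/<component>/lib/<library>
        candidate = os.path.join(path, "nvidia", "*", "lib", lib_pattern)
        matches.extend(sorted(glob.glob(candidate)))
    return matches


if __name__ == "__main__":
    # Patterns follow the glob style seen in the ValueError message.
    for pattern in ("libcudart.so.*[0-9]", "libnvToolsExt.so.*[0-9]"):
        found = find_shared_libs(pattern)
        print(pattern, "->", found if found else "not found")
```

An empty result for a pattern means no `sys.path` entry contains a matching file under the assumed layout, which is consistent with the `ValueError` shown in the log.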
Pull Request resolved: pytorch#164870
Approved by: https://github.com/Camyll
(cherry picked from commit 483f4e0)
Co-authored-by: atalman <[email protected]>
Co-authored-by: Eli Uriegas <[email protected]>
1 parent 42f0c2c commit 6f12be2
1 file changed: 7 additions, 6 deletions. Only the line numbers of the changed lines survive:

| Removed (original file line number) | Added (diff line number) |
|---|---|
| 305 | 305 |
| 316 | 316 |
| 318 | 318-319 |
| 357-358 | |
| 372 | |
| | 382-384 |