Commit 1b44228
Clean up CUDA state between tests (#2335)
This PR fixes the unit test,
test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction FAILED
[0.1163s]
```
Traceback (most recent call last):
File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]
```
This error occurs only on gfx1101 arch.
This error is coming from an integer overflow when another unit test,
test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel
creates a tensor with a huge numel, which overflows into a higher
torch.cuda.max_memory_reserved() when you call
test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction
afterward. To avoid this we introduced torch.cuda.empty_cache() and
torch.cuda.reset_peak_memory_stats() to clean up CUDA states.
JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295
(cherry picked from commit f86d184)1 parent ecc209d commit 1b44228
1 file changed
+3
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
467 | 467 | | |
468 | 468 | | |
469 | 469 | | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
470 | 473 | | |
471 | 474 | | |
472 | 475 | | |
| |||
0 commit comments