Commit 333bfd1
Clean up CUDA state between tests (#2296)
This PR fixes the following unit test failure:
```
test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction FAILED [0.1163s]
Traceback (most recent call last):
File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]
```
This error is specific to the gfx1101 arch.
The failure comes from an integer overflow caused by test pollution: another unit test,
`test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel`,
creates a tensor with a huge numel, which inflates the peak reported by
`torch.cuda.max_memory_reserved()`. When
`test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction`
runs afterward, the stale peak pushes its computed allocation size negative.
To avoid this, we introduced calls to `torch.cuda.empty_cache()` and
`torch.cuda.reset_peak_memory_stats()` to clean up CUDA state between tests.
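As an illustration, here is a minimal sketch of how a stale peak-memory statistic drives the requested tensor size negative. The device size and sizing formula below are assumptions for demonstration, not values from the failing machine; the test's real computation differs in detail but subtracts `max_memory_reserved()` in the same way.

```python
# Illustration (assumed numbers): the test sizes its allocation roughly as
#     application = fraction_of_total_memory - max_memory_reserved()
# so a huge leftover peak from a previous test makes `application` negative.

total_memory = 16 * 1024**3  # hypothetical 16 GiB device
fraction = 0.5

# Clean state: peak reserved memory is small, so the size is positive.
clean_peak = 1 * 1024**2
application = int(total_memory * fraction) - clean_peak
assert application > 0

# Stale state: a prior large-numel test left a huge peak behind
# (chosen here to mirror the -5681285432 from the traceback).
stale_peak = int(total_memory * fraction) + 5_681_285_432
application = int(total_memory * fraction) - stale_peak
print(application)  # → -5681285432, rejected as a negative tensor dimension
```

Resetting the peak statistic before the test runs (via `torch.cuda.reset_peak_memory_stats()`) restores the clean-state arithmetic.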
JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295
(cherry picked from commit fc804c3)
Parent: 2634c62
1 file changed: +3 −0
[Diff: 3 lines added to `test/test_cuda.py` at lines 459–461.]