Commit 333bfd1

rraminenpragupta authored and committed
Clean up CUDA state between tests (#2296)
This PR fixes the unit test:

```
test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction FAILED [0.1163s]
Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
    tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]
```

This error is specific to the gfx1101 arch and is caused by an integer overflow: the unit test `test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel` creates a tensor with a huge numel, which inflates `torch.cuda.max_memory_reserved()` and makes the size computation go negative when `test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction` runs afterward. To avoid this, we introduced `torch.cuda.empty_cache()` and `torch.cuda.reset_peak_memory_stats()` calls to clean up the CUDA state between tests.

JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295

(cherry picked from commit fc804c3)
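The failure mode described above can be sketched in plain arithmetic: the test sizes its probe tensor as a fraction of total device memory minus the peak reserved memory, so a stale, huge `max_memory_reserved()` value left over from an earlier test drives the computed size negative. The helper below is a hedged illustration of that mechanism, not the actual PyTorch test code; the function name and all numbers are hypothetical.

```python
# Hypothetical sketch of the size computation behind the failure
# (illustrative only; not the real test's implementation).
def application_size(total_memory: int, fraction: float, max_memory_reserved: int) -> int:
    # The probe tensor is sized as a share of total memory minus
    # whatever the allocator has already reserved at peak.
    return int(total_memory * fraction) - max_memory_reserved

GiB = 1024**3

# Normal case: modest peak-reserved value, positive tensor size.
ok = application_size(16 * GiB, 0.5, 512 * 1024**2)
assert ok > 0

# After a prior test allocated a huge tensor, the stale peak-reserved
# statistic can exceed the budget, yielding a negative "dimension" --
# the same symptom as the RuntimeError in the traceback above.
stale = application_size(16 * GiB, 0.5, 14 * GiB)
assert stale < 0
```

Calling `torch.cuda.reset_peak_memory_stats()` before the computation zeroes the stale statistic, which is exactly what the fix does for the affected arch.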
1 parent 2634c62 commit 333bfd1

File tree

1 file changed (+3 −0 lines changed)


test/test_cuda.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -456,6 +456,9 @@ def test_out_of_memory_retry(self):
         IS_JETSON, "oom reporting has issues on jetson igx due to partial nvml support"
     )
     def test_set_per_process_memory_fraction(self):
+        if torch.version.hip and ('gfx1101' in torch.cuda.get_device_properties(0).gcnArchName):
+            torch.cuda.empty_cache()
+            torch.cuda.reset_peak_memory_stats()
         orig = torch.cuda.get_per_process_memory_fraction(0)
         torch.cuda.reset_peak_memory_stats(0)
         try:
```
