Commit 95ec669

Catch ROCM/HIP/AMD oom in should_reduce_batch_size (#812)
* Catch ROCM/HIP oom in should_reduce_batch_size
* fix formatting

---------

Co-authored-by: Clémentine Fourrier <[email protected]>
1 parent dd1af5a commit 95ec669

1 file changed: +1 -0 lines changed

src/lighteval/utils/parallelism.py

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ def should_reduce_batch_size(exception: Exception) -> bool:
         "CUDA out of memory.",  # CUDA OOM
         "cuDNN error: CUDNN_STATUS_NOT_SUPPORTED.",  # CUDNN SNAFU
         "DefaultCPUAllocator: can't allocate memory",  # CPU OOM
+        "HIP out of memory",  # ROCM OOM
     ]
     if isinstance(exception, RuntimeError) and len(exception.args) == 1:
         return any(err in exception.args[0] for err in _statements)
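
To illustrate the effect of the added string, here is a minimal standalone sketch of the check. Only the lines visible in the hunk come from the commit. The final `return False` and the sample error messages are assumptions made for illustration, not text copied from the repository or from real ROCm logs.

```python
def should_reduce_batch_size(exception: Exception) -> bool:
    """Return True if the exception looks like an out-of-memory error."""
    _statements = [
        "CUDA out of memory.",  # CUDA OOM
        "cuDNN error: CUDNN_STATUS_NOT_SUPPORTED.",  # CUDNN SNAFU
        "DefaultCPUAllocator: can't allocate memory",  # CPU OOM
        "HIP out of memory",  # ROCM OOM (the line this commit adds)
    ]
    # Only single-argument RuntimeErrors are inspected; the message is
    # matched by substring, so trailing details do not matter.
    if isinstance(exception, RuntimeError) and len(exception.args) == 1:
        return any(err in exception.args[0] for err in _statements)
    # Assumed fallback: the hunk does not show what follows the `if`.
    return False


# Hypothetical messages, shaped like typical allocator errors.
hip_oom = RuntimeError("HIP out of memory. Tried to allocate 2.00 GiB")
cuda_oom = RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")
unrelated = RuntimeError("shape mismatch")

print(should_reduce_batch_size(hip_oom))
print(should_reduce_batch_size(cuda_oom))
print(should_reduce_batch_size(unrelated))
```

Because the match is a substring test on `exception.args[0]`, the new entry omits the trailing period, so it matches whether or not the runtime message places punctuation directly after "memory".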