Commit 21d5daa

Add warning on CUDA graph memory usage (#2182)

1 parent: 290e015
1 file changed: +3 −0 lines

vllm/worker/model_runner.py

@@ -395,6 +395,9 @@ def capture_model(self, kv_caches: List[KVCache]) -> None:
             "unexpected consequences if the model is not static. To "
             "run the model in eager mode, set 'enforce_eager=True' or "
             "use '--enforce-eager' in the CLI.")
+        logger.info("CUDA graphs can take additional 1~3 GiB memory per GPU. "
+                    "If you are running out of memory, consider decreasing "
+                    "`gpu_memory_utilization` or enforcing eager mode.")
         start_time = time.perf_counter()

         # Prepare dummy inputs. These will be reused for all batch sizes.
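The warning added by this commit points at two mitigations for running out of GPU memory during CUDA graph capture: disable graph capture with `--enforce-eager`, or lower `gpu_memory_utilization` so less memory is reserved up front. A hedged sketch of both, assuming the OpenAI-style API server entrypoint and a placeholder model name:

```shell
# Option 1: skip CUDA graph capture entirely by forcing eager-mode
# execution ('--enforce-eager' is the CLI flag named in the warning).
python -m vllm.entrypoints.api_server \
    --model facebook/opt-125m \
    --enforce-eager

# Option 2: keep CUDA graphs, but lower the fraction of GPU memory
# vLLM reserves (default 0.90) to leave headroom for graph capture.
python -m vllm.entrypoints.api_server \
    --model facebook/opt-125m \
    --gpu-memory-utilization 0.80
```

Eager mode trades the latency benefit of captured graphs for the 1~3 GiB of memory they would otherwise consume; lowering `gpu_memory_utilization` instead shrinks the KV-cache allocation, which can reduce the maximum batchable sequence count.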

0 commit comments