Commit 8dda7d9

add logger.warning to avoid possible oom conditions (#2489)
### What this PR does / why we need it?
After graph capture, the memory still available on each NPU can be scarce, so this PR adds a warning log to alert users.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.

---------

Signed-off-by: zouyida <[email protected]>
Co-authored-by: zouyida <[email protected]>
1 parent 81f3b9c commit 8dda7d9

1 file changed: +10 -0 lines changed
vllm_ascend/worker/model_runner_v1.py

Lines changed: 10 additions & 0 deletions
@@ -2189,6 +2189,16 @@ def capture_model(self) -> None:
         elapsed_time = end_time - start_time
         npu_graph_size = start_free_npu_memory - end_free_npu_memory
         # This usually takes 5~20 seconds.
+        # TODO: remove when aclgraph is ready for deepseek and uses more reasonable memory handling.
+        LOW_MEMORY_THRESHOLD_GB = 3
+        end_free_npu_memory_GB = end_free_npu_memory / (1 << 30)
+        if end_free_npu_memory_GB < LOW_MEMORY_THRESHOLD_GB:
+            logger.warning(
+                f"Post graph compilation, the available memory on each NPU is reduced to merely {end_free_npu_memory_GB:.2f} GB. "
+                "During inference operations, subsequent memory allocations may potentially trigger Out-of-Memory(OOM) conditions. "
+                "Should you encounter this warning followed by service termination, "
+                "consider decreasing gpu-memory-utilization configuration parameter."
+            )
         logger.info("Graph capturing finished in %.0f secs, took %.2f GiB",
                     elapsed_time, npu_graph_size / (1 << 30))
 
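The check added in this hunk can be sketched as a standalone function, which makes the threshold behaviour easy to exercise without an NPU. This is only an illustration under assumptions: `warn_if_low_free_memory`, `BYTES_PER_GIB`, and the logger name are hypothetical and not part of the commit, and the free-byte count is passed in as an argument rather than read from the device. The sketch reports GiB because dividing by `1 << 30` yields gibibytes, whereas the commit's message labels the value "GB".

```python
import logging

# Hypothetical logger name; the commit uses the module-level `logger` of model_runner_v1.py.
logger = logging.getLogger("low_memory_sketch")

BYTES_PER_GIB = 1 << 30
LOW_MEMORY_THRESHOLD_GIB = 3  # same numeric threshold as LOW_MEMORY_THRESHOLD_GB in the diff


def warn_if_low_free_memory(free_bytes: int,
                            threshold_gib: float = LOW_MEMORY_THRESHOLD_GIB) -> bool:
    """Log a warning and return True when free memory is below the threshold."""
    free_gib = free_bytes / BYTES_PER_GIB
    if free_gib < threshold_gib:
        logger.warning(
            "After graph capture, only %.2f GiB of device memory is free; "
            "later allocations may trigger OOM. "
            "Consider lowering gpu-memory-utilization.",
            free_gib,
        )
        return True
    return False
```

As in the diff, the comparison is strict, so a device with exactly 3 GiB free does not trigger the warning. For example, `warn_if_low_free_memory(2 * BYTES_PER_GIB)` logs the warning, while `warn_if_low_free_memory(3 * BYTES_PER_GIB)` stays silent.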
