Commit 15b8aff
[CI] Add max_split_size_mb for e2e test to avoid oom (#3252)
### What this PR does / why we need it?
We added a patch for the model weight loader to avoid using vLLM weight loader v2, since v2 leads to unknown issues for torchair. However, this patch introduced an unexpected memory usage problem. As a quick fix, let's raise `max_split_size_mb` to a larger value to avoid the weight-load OOM issue. The longer-term solution is to remove the patch and adopt weight loader v2 from vLLM.

Closes: #3251

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@releases/v0.11.0

Signed-off-by: wangxiyuan <[email protected]>
1 parent 050d202 commit 15b8aff

File tree

1 file changed, +1 −0 lines changed


tests/e2e/singlecard/spec_decode_v1/test_v1_spec_decode.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -11,6 +11,7 @@
 from tests.e2e.conftest import VllmRunner

 os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
+os.environ["PYTORCH_NPU_ALLOC_CONF"] = "max_split_size_mb:256"


 @pytest.fixture
```
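The environment variable is set at module import time because the allocator config must be exported before the NPU caching allocator is initialized. A minimal sketch of what the config string looks like, with a hypothetical `parse_alloc_conf` helper (not part of this repo) to illustrate its key:value format:

```python
import os

# Set before any torch/torch_npu allocation happens, mirroring the
# module-level placement in test_v1_spec_decode.py.
os.environ["PYTORCH_NPU_ALLOC_CONF"] = "max_split_size_mb:256"

def parse_alloc_conf(conf: str) -> dict:
    """Parse a comma-separated 'key:value' allocator config string.

    Hypothetical helper for illustration; the real parsing happens
    inside the allocator.
    """
    options = {}
    for entry in conf.split(","):
        key, _, value = entry.partition(":")
        options[key.strip()] = value.strip()
    return options

opts = parse_alloc_conf(os.environ["PYTORCH_NPU_ALLOC_CONF"])
# With max_split_size_mb:256, cached blocks larger than 256 MB are not
# split to serve smaller requests, which reduces fragmentation during
# large contiguous allocations such as model weight loading.
print(opts["max_split_size_mb"])
```

This mirrors the `PYTORCH_CUDA_ALLOC_CONF` convention from PyTorch's CUDA caching allocator, applied here to the Ascend NPU backend.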
