Running vllm_infer.py fails with torch.AcceleratorError: CUDA error: CUDA-capable device(s) is/are busy or unavailable #10240

@skfeng36

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.5.dev0
  • Platform: Linux-5.4.0-42-generic-x86_64-with-glibc2.31
  • Python version: 3.11.14
  • PyTorch version: 2.9.1+cu129 (GPU)
  • Transformers version: 4.57.6
  • Datasets version: 4.0.0
  • Accelerate version: 1.11.0
  • PEFT version: 0.18.1
  • GPU type: NVIDIA A100-SXM4-80GB
  • GPU number: 2
  • GPU memory: 79.25GB
  • TRL version: 0.24.0
  • DeepSpeed version: 0.18.4
  • vLLM version: 0.15.1
  • Git commit: c0245c4
  • Default data directory: detected

Reproduction

Run: CUDA_VISIBLE_DEVICES=6,7 python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
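For context on the command above: setting CUDA_VISIBLE_DEVICES=6,7 masks the process to physical GPUs 6 and 7, which CUDA then renumbers as logical devices 0 and 1 (so the vLLM workers call torch.cuda.set_device(0) and set_device(1), not 6 and 7). A minimal sketch of that remapping (the helper name is mine, for illustration only):

```python
def visible_to_physical(visible_devices: str) -> dict[int, int]:
    """Map the logical CUDA device index seen inside the process
    to the physical GPU index on the host, given the value of
    CUDA_VISIBLE_DEVICES (e.g. "6,7")."""
    physical = [int(x) for x in visible_devices.split(",") if x.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}

mapping = visible_to_physical("6,7")
# Logical device 0 -> physical GPU 6, logical device 1 -> physical GPU 7.
```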

Error output:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 02-25 14:51:38 [system_utils.py:140] We must use the `spawn` multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/usage/troubleshooting.html#python-multiprocessing for more information. Reasons: CUDA is initialized
/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
(EngineCore_DP0 pid=4097130) INFO 02-25 14:51:44 [core.py:96] Initializing a V1 LLM engine (v0.15.1) with config: model='/home/amax/sk/model_data/Qwen3-32B', speculative_config=None, tokenizer='/home/amax/sk/model_data/Qwen3-32B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=3072, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/home/amax/sk/model_data/Qwen3-32B, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 
'vllm::unified_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}
(EngineCore_DP0 pid=4097130) WARNING 02-25 14:51:44 [multiproc_executor.py:910] Reducing Torch parallelism from 64 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
ERROR 02-25 14:51:52 [multiproc_executor.py:772] WorkerProc failed to start.
ERROR 02-25 14:51:52 [multiproc_executor.py:772] Traceback (most recent call last):
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 743, in worker_main
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     worker = WorkerProc(*args, **kwargs)
ERROR 02-25 14:51:52 [multiproc_executor.py:772]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 569, in __init__
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     self.worker.init_device()
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/worker/worker_base.py", line 326, in init_device
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     self.worker.init_device()  # type: ignore
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 210, in init_device
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     current_platform.set_device(self.device)
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/platforms/cuda.py", line 123, in set_device
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     torch.cuda.set_device(device)
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/cuda/__init__.py", line 567, in set_device
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     torch._C._cuda_setDevice(device)
ERROR 02-25 14:51:52 [multiproc_executor.py:772] torch.AcceleratorError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
ERROR 02-25 14:51:52 [multiproc_executor.py:772] Search for `cudaErrorDevicesUnavailable' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
ERROR 02-25 14:51:52 [multiproc_executor.py:772] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 02-25 14:51:52 [multiproc_executor.py:772] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 02-25 14:51:52 [multiproc_executor.py:772] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 02-25 14:51:52 [multiproc_executor.py:772]
INFO 02-25 14:51:52 [multiproc_executor.py:730] Parent process exited, terminating worker
INFO 02-25 14:51:52 [multiproc_executor.py:730] Parent process exited, terminating worker
INFO 02-25 14:51:53 [parallel_state.py:1212] world_size=2 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:38723 backend=nccl
[W225 14:51:56.592294033 TCPStore.cpp:340] [c10d] TCP client failed to connect/validate to host 127.0.0.1:38723 - retrying (try=0, timeout=600000ms, delay=2716ms): Interrupted system call
Exception raised from delay at /pytorch/torch/csrc/distributed/c10d/socket.cpp:115 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x7f401375fb80 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5ffc5d1 (0x7f4070afd5d1 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x14a15e8 (0x7f406bfa25e8 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x607828b (0x7f4070b7928b in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x6078624 (0x7f4070b79624 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x5ff4ea3 (0x7f4070af5ea3 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #6: c10d::TCPStore::TCPStore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10d::TCPStoreOptions const&) + 0x41d (0x7f4070afc8cd in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xd71465 (0x7f40801c8465 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0xdadda6 (0x7f4080204da6 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3cc7ad (0x7f407f8237ad in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x1fd496 (0x55ba06b99496 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #11: _PyObject_MakeTpCall + 0x24b (0x55ba06b7584b in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #12: <unknown function> + 0x22d1de (0x55ba06bc91de in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #13: _PyObject_Call + 0x12b (0x55ba06bb3dab in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #14: <unknown function> + 0x2155df (0x55ba06bb15df in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #15: <unknown function> + 0x1d9b63 (0x55ba06b75b63 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #16: <unknown function> + 0x4c9bb (0x7f40969ff9bb in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/scipy/spatial/_distance_pybind.cpython-311-x86_64-linux-gnu.so)
frame #17: _PyObject_MakeTpCall + 0x24b (0x55ba06b7584b in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x665 (0x55ba06b83b75 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #19: <unknown function> + 0x27b5c0 (0x55ba06c175c0 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #20: <unknown function> + 0x204083 (0x55ba06ba0083 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #21: PyObject_Vectorcall + 0x2c (0x55ba06b8facc in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x665 (0x55ba06b83b75 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #23: _PyFunction_Vectorcall + 0x165 (0x55ba06ba97e5 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #24: PyObject_Call + 0x136 (0x55ba06bb3b46 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #25: _PyEval_EvalFrameDefault + 0x43bb (0x55ba06b878cb in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #26: _PyFunction_Vectorcall + 0x165 (0x55ba06ba97e5 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #27: PyObject_Call + 0x136 (0x55ba06bb3b46 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x43bb (0x55ba06b878cb in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #29: _PyFunction_Vectorcall + 0x165 (0x55ba06ba97e5 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #30: <unknown function> + 0x2152b3 (0x55ba06bb12b3 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #31: <unknown function> + 0x1d9b63 (0x55ba06b75b63 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #32: PyObject_Call + 0xbe (0x55ba06bb3ace in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x43bb (0x55ba06b878cb in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #34: _PyFunction_Vectorcall + 0x165 (0x55ba06ba97e5 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #35: PyObject_Call + 0x136 (0x55ba06bb3b46 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x43bb (0x55ba06b878cb in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #37: <unknown function> + 0x2a4435 (0x55ba06c40435 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #38: PyEval_EvalCode + 0x9d (0x55ba06c3fb7d in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #39: <unknown function> + 0x2c164a (0x55ba06c5d64a in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #40: <unknown function> + 0x2bd343 (0x55ba06c59343 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #41: PyRun_StringFlags + 0x62 (0x55ba06c4ebb2 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #42: PyRun_SimpleStringFlags + 0x3c (0x55ba06c4e96c in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #43: Py_RunMain + 0x30f (0x55ba06c6873f in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #44: Py_BytesMain + 0x37 (0x55ba06c2fa67 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #45: __libc_start_main + 0xf3 (0x7f40f929d083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #46: <unknown function> + 0x2938d9 (0x55ba06c2f8d9 in /home/amax/miniconda/envs/llama_factory/bin/python)

(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     super().__init__(
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     super().__init__(vllm_config)
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     self._init_executor()
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     raise e from None
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=4097130) Process EngineCore_DP0:
(EngineCore_DP0 pid=4097130) Traceback (most recent call last):
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=4097130)     self.run()
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=4097130)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 950, in run_engine_core
(EngineCore_DP0 pid=4097130)     raise e
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=4097130)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=4097130)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=4097130)     super().__init__(
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=4097130)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=4097130)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=4097130)     super().__init__(vllm_config)
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=4097130)     self._init_executor()
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=4097130)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=4097130)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=4097130)     raise e from None
(EngineCore_DP0 pid=4097130) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Traceback (most recent call last):
  File "/home/amax/sk/LlamaFactory/scripts/vllm_infer.py", line 282, in <module>
    fire.Fire(vllm_infer)
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/sk/LlamaFactory/scripts/vllm_infer.py", line 125, in vllm_infer
    llm = LLM(**engine_args)
          ^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 334, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 172, in from_engine_args
    return cls(
           ^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 106, in __init__
    self.engine_core = EngineCoreClient.make_client(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 94, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 647, in __init__
    super().__init__(
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 479, in __init__
    with launch_core_engines(vllm_config, executor_class, log_stats) as (
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 933, in launch_core_engines
    wait_for_engine_startup(
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 992, in wait_for_engine_startup
    raise RuntimeError(
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
Meanwhile, running the same workload with `vllm serve` on GPUs 4 and 5 works fine...
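One common cause of `cudaErrorDevicesUnavailable` is that the target GPU is in Exclusive_Process compute mode while another process already holds a context on it (which would also explain why GPUs 4 and 5 behave differently from 6 and 7). This can be checked with `nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader`; the sketch below (my own helper, shown with hard-coded sample output since I cannot run nvidia-smi here) parses that output to flag non-Default GPUs:

```python
def non_default_gpus(csv_text: str) -> list[int]:
    """Return indices of GPUs whose compute mode is not 'Default',
    given the output of:
      nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader
    A GPU in Exclusive_Process mode rejects a second CUDA context."""
    flagged = []
    for line in csv_text.strip().splitlines():
        idx, mode = (field.strip() for field in line.split(","))
        if mode != "Default":
            flagged.append(int(idx))
    return flagged

sample = "6, Exclusive_Process\n7, Default"
# GPU 6 would refuse a new context here; GPU 7 would not.
```

If a GPU shows up as Exclusive_Process, either free it (kill the process holding it) or reset the mode with `sudo nvidia-smi -i 6 -c DEFAULT`.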

Others

No response
