Running vllm_infer.py fails with torch.AcceleratorError: CUDA error: CUDA-capable device(s) is/are busy or unavailable #10240

@skfeng36

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.5.dev0
  • Platform: Linux-5.4.0-42-generic-x86_64-with-glibc2.31
  • Python version: 3.11.14
  • PyTorch version: 2.9.1+cu129 (GPU)
  • Transformers version: 4.57.6
  • Datasets version: 4.0.0
  • Accelerate version: 1.11.0
  • PEFT version: 0.18.1
  • GPU type: NVIDIA A100-SXM4-80GB
  • GPU number: 2
  • GPU memory: 79.25GB
  • TRL version: 0.24.0
  • DeepSpeed version: 0.18.4
  • vLLM version: 0.15.1
  • Git commit: c0245c4
  • Default data directory: detected

Reproduction

Run: CUDA_VISIBLE_DEVICES=6,7 python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
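For context on the command above: setting CUDA_VISIBLE_DEVICES=6,7 masks the process to physical GPUs 6 and 7, which CUDA then renumbers as logical devices 0 and 1 (so the vLLM workers call torch.cuda.set_device(0) and set_device(1), not 6 and 7). A minimal sketch of that remapping (the helper name is mine, for illustration only):

```python
def visible_to_physical(visible_devices: str) -> dict[int, int]:
    """Map the logical CUDA device index seen inside the process
    to the physical GPU index on the host, given the value of
    CUDA_VISIBLE_DEVICES (e.g. "6,7")."""
    physical = [int(x) for x in visible_devices.split(",") if x.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}

mapping = visible_to_physical("6,7")
# Logical device 0 -> physical GPU 6, logical device 1 -> physical GPU 7.
```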

Error output:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 02-25 14:51:38 [system_utils.py:140] We must use the `spawn` multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/usage/troubleshooting.html#python-multiprocessing for more information. Reasons: CUDA is initialized
/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
(EngineCore_DP0 pid=4097130) INFO 02-25 14:51:44 [core.py:96] Initializing a V1 LLM engine (v0.15.1) with config: model='/home/amax/sk/model_data/Qwen3-32B', speculative_config=None, tokenizer='/home/amax/sk/model_data/Qwen3-32B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=3072, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/home/amax/sk/model_data/Qwen3-32B, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 
'vllm::unified_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}
(EngineCore_DP0 pid=4097130) WARNING 02-25 14:51:44 [multiproc_executor.py:910] Reducing Torch parallelism from 64 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
ERROR 02-25 14:51:52 [multiproc_executor.py:772] WorkerProc failed to start.
ERROR 02-25 14:51:52 [multiproc_executor.py:772] Traceback (most recent call last):
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 743, in worker_main
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     worker = WorkerProc(*args, **kwargs)
ERROR 02-25 14:51:52 [multiproc_executor.py:772]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 569, in __init__
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     self.worker.init_device()
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/worker/worker_base.py", line 326, in init_device
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     self.worker.init_device()  # type: ignore
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 210, in init_device
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     current_platform.set_device(self.device)
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/platforms/cuda.py", line 123, in set_device
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     torch.cuda.set_device(device)
ERROR 02-25 14:51:52 [multiproc_executor.py:772]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/cuda/__init__.py", line 567, in set_device
ERROR 02-25 14:51:52 [multiproc_executor.py:772]     torch._C._cuda_setDevice(device)
ERROR 02-25 14:51:52 [multiproc_executor.py:772] torch.AcceleratorError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
ERROR 02-25 14:51:52 [multiproc_executor.py:772] Search for `cudaErrorDevicesUnavailable' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
ERROR 02-25 14:51:52 [multiproc_executor.py:772] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 02-25 14:51:52 [multiproc_executor.py:772] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 02-25 14:51:52 [multiproc_executor.py:772] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 02-25 14:51:52 [multiproc_executor.py:772]
INFO 02-25 14:51:52 [multiproc_executor.py:730] Parent process exited, terminating worker
INFO 02-25 14:51:52 [multiproc_executor.py:730] Parent process exited, terminating worker
INFO 02-25 14:51:53 [parallel_state.py:1212] world_size=2 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:38723 backend=nccl
[W225 14:51:56.592294033 TCPStore.cpp:340] [c10d] TCP client failed to connect/validate to host 127.0.0.1:38723 - retrying (try=0, timeout=600000ms, delay=2716ms): Interrupted system call
Exception raised from delay at /pytorch/torch/csrc/distributed/c10d/socket.cpp:115 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x7f401375fb80 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5ffc5d1 (0x7f4070afd5d1 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x14a15e8 (0x7f406bfa25e8 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x607828b (0x7f4070b7928b in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x6078624 (0x7f4070b79624 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x5ff4ea3 (0x7f4070af5ea3 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #6: c10d::TCPStore::TCPStore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10d::TCPStoreOptions const&) + 0x41d (0x7f4070afc8cd in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xd71465 (0x7f40801c8465 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0xdadda6 (0x7f4080204da6 in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3cc7ad (0x7f407f8237ad in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x1fd496 (0x55ba06b99496 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #11: _PyObject_MakeTpCall + 0x24b (0x55ba06b7584b in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #12: <unknown function> + 0x22d1de (0x55ba06bc91de in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #13: _PyObject_Call + 0x12b (0x55ba06bb3dab in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #14: <unknown function> + 0x2155df (0x55ba06bb15df in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #15: <unknown function> + 0x1d9b63 (0x55ba06b75b63 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #16: <unknown function> + 0x4c9bb (0x7f40969ff9bb in /home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/scipy/spatial/_distance_pybind.cpython-311-x86_64-linux-gnu.so)
frame #17: _PyObject_MakeTpCall + 0x24b (0x55ba06b7584b in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x665 (0x55ba06b83b75 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #19: <unknown function> + 0x27b5c0 (0x55ba06c175c0 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #20: <unknown function> + 0x204083 (0x55ba06ba0083 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #21: PyObject_Vectorcall + 0x2c (0x55ba06b8facc in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x665 (0x55ba06b83b75 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #23: _PyFunction_Vectorcall + 0x165 (0x55ba06ba97e5 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #24: PyObject_Call + 0x136 (0x55ba06bb3b46 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #25: _PyEval_EvalFrameDefault + 0x43bb (0x55ba06b878cb in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #26: _PyFunction_Vectorcall + 0x165 (0x55ba06ba97e5 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #27: PyObject_Call + 0x136 (0x55ba06bb3b46 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x43bb (0x55ba06b878cb in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #29: _PyFunction_Vectorcall + 0x165 (0x55ba06ba97e5 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #30: <unknown function> + 0x2152b3 (0x55ba06bb12b3 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #31: <unknown function> + 0x1d9b63 (0x55ba06b75b63 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #32: PyObject_Call + 0xbe (0x55ba06bb3ace in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x43bb (0x55ba06b878cb in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #34: _PyFunction_Vectorcall + 0x165 (0x55ba06ba97e5 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #35: PyObject_Call + 0x136 (0x55ba06bb3b46 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x43bb (0x55ba06b878cb in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #37: <unknown function> + 0x2a4435 (0x55ba06c40435 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #38: PyEval_EvalCode + 0x9d (0x55ba06c3fb7d in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #39: <unknown function> + 0x2c164a (0x55ba06c5d64a in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #40: <unknown function> + 0x2bd343 (0x55ba06c59343 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #41: PyRun_StringFlags + 0x62 (0x55ba06c4ebb2 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #42: PyRun_SimpleStringFlags + 0x3c (0x55ba06c4e96c in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #43: Py_RunMain + 0x30f (0x55ba06c6873f in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #44: Py_BytesMain + 0x37 (0x55ba06c2fa67 in /home/amax/miniconda/envs/llama_factory/bin/python)
frame #45: __libc_start_main + 0xf3 (0x7f40f929d083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #46: <unknown function> + 0x2938d9 (0x55ba06c2f8d9 in /home/amax/miniconda/envs/llama_factory/bin/python)

(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     super().__init__(
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     super().__init__(vllm_config)
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     self._init_executor()
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946]     raise e from None
(EngineCore_DP0 pid=4097130) ERROR 02-25 14:52:00 [core.py:946] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=4097130) Process EngineCore_DP0:
(EngineCore_DP0 pid=4097130) Traceback (most recent call last):
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=4097130)     self.run()
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=4097130)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 950, in run_engine_core
(EngineCore_DP0 pid=4097130)     raise e
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=4097130)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=4097130)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=4097130)     super().__init__(
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=4097130)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=4097130)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=4097130)     super().__init__(vllm_config)
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=4097130)     self._init_executor()
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=4097130)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=4097130)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=4097130)   File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=4097130)     raise e from None
(EngineCore_DP0 pid=4097130) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Traceback (most recent call last):
  File "/home/amax/sk/LlamaFactory/scripts/vllm_infer.py", line 282, in <module>
    fire.Fire(vllm_infer)
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/sk/LlamaFactory/scripts/vllm_infer.py", line 125, in vllm_infer
    llm = LLM(**engine_args)
          ^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 334, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 172, in from_engine_args
    return cls(
           ^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 106, in __init__
    self.engine_core = EngineCoreClient.make_client(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 94, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 647, in __init__
    super().__init__(
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 479, in __init__
    with launch_core_engines(vllm_config, executor_class, log_stats) as (
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 933, in launch_core_engines
    wait_for_engine_startup(
  File "/home/amax/miniconda/envs/llama_factory/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 992, in wait_for_engine_startup
    raise RuntimeError(
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
Meanwhile, running the same workload with `vllm serve` on GPUs 4 and 5 works fine...
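One common cause of `cudaErrorDevicesUnavailable` is that the target GPU is in Exclusive_Process compute mode while another process already holds a context on it (which would also explain why GPUs 4 and 5 behave differently from 6 and 7). This can be checked with `nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader`; the sketch below (my own helper, shown with hard-coded sample output since I cannot run nvidia-smi here) parses that output to flag non-Default GPUs:

```python
def non_default_gpus(csv_text: str) -> list[int]:
    """Return indices of GPUs whose compute mode is not 'Default',
    given the output of:
      nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader
    A GPU in Exclusive_Process mode rejects a second CUDA context."""
    flagged = []
    for line in csv_text.strip().splitlines():
        idx, mode = (field.strip() for field in line.split(","))
        if mode != "Default":
            flagged.append(int(idx))
    return flagged

sample = "6, Exclusive_Process\n7, Default"
# GPU 6 would refuse a new context here; GPU 7 would not.
```

If a GPU shows up as Exclusive_Process, either free it (kill the process holding it) or reset the mode with `sudo nvidia-smi -i 6 -c DEFAULT`.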

Others

No response
