Description
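🚀 The feature, motivation and pitch
Serving Qwen3-Omni-30B-A3B-Thinking with vLLM 0.11.0 (vllm serve with trust_remote_code=True and tensor_parallel_size=2) fails at startup. vLLM resolves the model to TransformersForMultimodalLM, which has no native vLLM implementation, so it falls back to the Transformers backend. While computing the multimodal token budget, that backend calls _get_num_multimodal_tokens on the Hugging Face processor, and both tensor-parallel workers crash with AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute '_get_num_multimodal_tokens'. Full log: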
INFO 11-02 00:22:03 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=1735408) INFO 11-02 00:22:07 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=1735408) INFO 11-02 00:22:07 [utils.py:233] non-default args: {'model_tag': '/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', 'model': '/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', 'trust_remote_code': True, 'tensor_parallel_size': 2}
(APIServer pid=1735408) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1735408) Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_interleaved', 'interleaved', 'mrope_section'}
(APIServer pid=1735408) INFO 11-02 00:22:07 [model.py:547] Resolved architecture: TransformersForMultimodalLM
(APIServer pid=1735408) torch_dtype is deprecated! Use dtype instead!
(APIServer pid=1735408) INFO 11-02 00:22:07 [model.py:1510] Using max model len 65536
(APIServer pid=1735408) INFO 11-02 00:22:07 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
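With chunked prefill enabled, a long prompt's prefill is split across scheduler steps, each processing at most max_num_batched_tokens tokens. As a rough worked example of what the two scheduler lines above imply, the sketch below counts how many steps a maximum-length prompt would need. It assumes, for simplicity, that one request gets the whole 8192-token budget on every step, which will not hold when other requests are running at the same time.

```python
# Values taken from the two log lines above.
MAX_MODEL_LEN = 65536           # "Using max model len 65536"
MAX_NUM_BATCHED_TOKENS = 8192   # "max_num_batched_tokens=8192"


def prefill_steps(prompt_len: int, token_budget: int) -> int:
    """Scheduler steps needed to prefill one prompt with chunked prefill.

    Simplifying assumption: this request receives the entire per-step
    token budget on every step (no decodes or other prefills share it).
    """
    if prompt_len <= 0:
        return 0
    # Integer ceiling division, avoiding floating point.
    return (prompt_len + token_budget - 1) // token_budget


print(prefill_steps(MAX_MODEL_LEN, MAX_NUM_BATCHED_TOKENS))  # 8
print(prefill_steps(10_000, MAX_NUM_BATCHED_TOKENS))         # 2
```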
(APIServer pid=1735408) The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set TRANSFORMERS_VERBOSITY=info for more details.
(APIServer pid=1735408) WARNING 11-02 00:22:08 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 11-02 00:22:16 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', speculative_config=None, tokenizer='/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
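The cudagraph_capture_sizes list in the engine config line above appears to follow a simple pattern: 1, 2 and 4, then every multiple of 8 up to max_capture_size (512), listed in descending order. The sketch below regenerates the list under that assumption; it reproduces the logged values but is not taken from vLLM's source, and capture_sizes is a hypothetical helper name.

```python
def capture_sizes(max_capture_size: int = 512) -> list[int]:
    """Batch sizes for CUDA graph capture, assuming the pattern seen in
    the log: [1, 2, 4] plus multiples of 8 up to max_capture_size,
    returned in descending order."""
    sizes = [s for s in (1, 2, 4) if s <= max_capture_size]
    sizes += range(8, max_capture_size + 1, 8)  # 8, 16, ..., max_capture_size
    return sorted(sizes, reverse=True)


sizes = capture_sizes(512)
print(len(sizes), sizes[:3], sizes[-4:])  # 67 [512, 504, 496] [8, 4, 2, 1]
```

This gives 64 multiples of 8 plus the three small sizes, i.e. 67 capture sizes, matching the length of the logged list.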
(EngineCore_DP0 pid=1735949) WARNING 11-02 00:22:20 [multiproc_executor.py:720] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1], buffer_handle=(2, 16777216, 10, 'psm_e923dcbf'), local_subscribe_addr='ipc:///tmp/abf03e1a-0ebb-4bb9-ba1a-0c685940f095', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 11-02 00:22:29 [__init__.py:216] Automatically detected platform cuda.
INFO 11-02 00:22:29 [__init__.py:216] Automatically detected platform cuda.
INFO 11-02 00:22:35 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_47719870'), local_subscribe_addr='ipc:///tmp/9cb4d87d-88d2-46ce-b3d7-c2bde827b010', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 11-02 00:22:35 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 11-02 00:22:35 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_c0cb3024'), local_subscribe_addr='ipc:///tmp/95e5f52b-ac27-4b29-8db0-7d352b9fb26d', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 11-02 00:22:35 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 11-02 00:22:36 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:36 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:36 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:36 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 11-02 00:22:39 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 11-02 00:22:39 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_7fbd03b8'), local_subscribe_addr='ipc:///tmp/7b767dee-ba19-45b1-bafa-aa91df94f399', remote_subscribe_addr=None, remote_addr_ipv6=False)
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 11-02 00:22:39 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:39 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:39 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [parallel_state.py:1208] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 11-02 00:22:39 [parallel_state.py:1208] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
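The two parallel_state lines above show how the two global ranks map onto parallel groups. The sketch below reproduces that mapping under stated assumptions: tensor parallelism varies fastest, then pipeline, then data parallelism, and the expert-parallel rank is taken as dp_rank * tp_size + tp_rank. These assumptions are consistent with the two logged lines but are not a statement of vLLM's internals; decompose_rank and ParallelRanks are hypothetical names.

```python
from typing import NamedTuple


class ParallelRanks(NamedTuple):
    dp: int
    pp: int
    tp: int
    ep: int


def decompose_rank(rank: int, tp_size: int, pp_size: int = 1,
                   dp_size: int = 1) -> ParallelRanks:
    """Map a global rank to (DP, PP, TP, EP) ranks.

    Assumed layout: TP varies fastest, then PP, then DP; the EP group is
    assumed to span DP x TP within one pipeline stage.
    """
    world_size = tp_size * pp_size * dp_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world of size {world_size}")
    tp = rank % tp_size
    pp = (rank // tp_size) % pp_size
    dp = rank // (tp_size * pp_size)
    ep = dp * tp_size + tp
    return ParallelRanks(dp=dp, pp=pp, tp=tp, ep=ep)


# tensor_parallel_size=2, pipeline and data parallel size 1, as in the log
for r in range(2):
    print(r, decompose_rank(r, tp_size=2))
# 0 ParallelRanks(dp=0, pp=0, tp=0, ep=0)
# 1 ParallelRanks(dp=0, pp=0, tp=1, ep=1)
```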
WARNING 11-02 00:22:39 [topk_topp_sampler.py:66] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
WARNING 11-02 00:22:39 [topk_topp_sampler.py:66] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
The image processor of type Qwen2VLImageProcessor is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with use_fast=False. Note that this behavior will be extended to all models in a future release.
The image processor of type Qwen2VLImageProcessor is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with use_fast=False. Note that this behavior will be extended to all models in a future release.
ERROR 11-02 00:22:42 [multiproc_executor.py:597] WorkerProc failed to start.
ERROR 11-02 00:22:42 [multiproc_executor.py:597] Traceback (most recent call last):
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
ERROR 11-02 00:22:42 [multiproc_executor.py:597] worker = WorkerProc(*args, **kwargs)
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 430, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.worker.init_device()
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 259, in init_device
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.worker.init_device() # type: ignore
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 201, in init_device
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.model_runner: GPUModelRunner = GPUModelRunner(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 421, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.mm_budget = MultiModalBudget(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/utils.py", line 47, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_by_modality = mm_registry
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 167, in get_max_tokens_per_item_by_nonzero_modality
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_per_item = self.get_max_tokens_per_item_by_modality(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 143, in get_max_tokens_per_item_by_modality
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return profiler.get_mm_max_contiguous_tokens(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 282, in get_mm_max_contiguous_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return self._get_mm_max_tokens(seq_len,
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 255, in _get_mm_max_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_per_item = self.processing_info.get_mm_max_tokens_per_item(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 226, in get_mm_max_tokens_per_item
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return {"image": self.get_max_image_tokens()}
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 233, in get_max_image_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] mm_tokens = processor._get_num_multimodal_tokens(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute '_get_num_multimodal_tokens'
ERROR 11-02 00:22:42 [multiproc_executor.py:597] WorkerProc failed to start.
ERROR 11-02 00:22:42 [multiproc_executor.py:597] Traceback (most recent call last):
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
ERROR 11-02 00:22:42 [multiproc_executor.py:597] worker = WorkerProc(*args, **kwargs)
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 430, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.worker.init_device()
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 259, in init_device
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.worker.init_device() # type: ignore
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 201, in init_device
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.model_runner: GPUModelRunner = GPUModelRunner(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 421, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.mm_budget = MultiModalBudget(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/utils.py", line 47, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_by_modality = mm_registry
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 167, in get_max_tokens_per_item_by_nonzero_modality
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_per_item = self.get_max_tokens_per_item_by_modality(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 143, in get_max_tokens_per_item_by_modality
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return profiler.get_mm_max_contiguous_tokens(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 282, in get_mm_max_contiguous_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return self._get_mm_max_tokens(seq_len,
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 255, in _get_mm_max_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_per_item = self.processing_info.get_mm_max_tokens_per_item(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 226, in get_mm_max_tokens_per_item
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return {"image": self.get_max_image_tokens()}
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 233, in get_max_image_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] mm_tokens = processor._get_num_multimodal_tokens(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute '_get_num_multimodal_tokens'
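Both worker tracebacks above end in the same place: the Transformers fallback backend (vllm/model_executor/models/transformers.py) sizes the image budget by calling processor._get_num_multimodal_tokens(...), and the installed Qwen3OmniMoeProcessor does not define that method. One way to confirm this outside vLLM is to load the processor on its own and check for the attribute. The helper below is a hypothetical pre-flight check (supports_mm_token_count is an illustrative name, not part of vLLM or Transformers), and the dummy classes stand in for real processors so the sketch runs without downloading a model.

```python
from typing import Any


def supports_mm_token_count(processor: Any) -> bool:
    """True if `processor` has a callable `_get_num_multimodal_tokens`,
    the method the traceback shows vLLM's Transformers backend calling."""
    return callable(getattr(processor, "_get_num_multimodal_tokens", None))


# Hypothetical stand-ins so the sketch runs without downloading a model.
class ProcessorWithHook:
    def _get_num_multimodal_tokens(self, *args, **kwargs):
        return None  # body is irrelevant to the check


class ProcessorWithoutHook:
    pass


print(supports_mm_token_count(ProcessorWithHook()))     # True
print(supports_mm_token_count(ProcessorWithoutHook()))  # False

# Checking the real processor (needs a transformers version that ships
# Qwen3OmniMoeProcessor); expected to print False given the error above:
#   from transformers import AutoProcessor
#   proc = AutoProcessor.from_pretrained(
#       "/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking",
#       trust_remote_code=True,
#   )
#   print(type(proc).__name__, supports_mm_token_count(proc))
```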
INFO 11-02 00:22:42 [multiproc_executor.py:558] Parent process exited, terminating worker
INFO 11-02 00:22:42 [multiproc_executor.py:558] Parent process exited, terminating worker
[rank0]:[W1102 00:22:43.661670779 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] raise e from None
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=1735949) Process EngineCore_DP0:
(EngineCore_DP0 pid=1735949) Traceback (most recent call last):
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
(EngineCore_DP0 pid=1735949) self.run()
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=1735949) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=1735949) raise e
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1735949) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1735949) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=1735949) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=1735949) self._init_executor()
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=1735949) self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=1735949) raise e from None
(EngineCore_DP0 pid=1735949) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
(APIServer pid=1735408) Traceback (most recent call last):
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/bin/vllm", line 7, in <module>
(APIServer pid=1735408) sys.exit(main())
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=1735408) args.dispatch_function(args)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=1735408) uvloop.run(run_server(args))
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=1735408) return loop.run_until_complete(wrapper())
(APIServer pid=1735408) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1735408) return await main
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=1735408) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=1735408) async with build_async_engine_client(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1735408) return await anext(self.gen)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=1735408) async with build_async_engine_client_from_engine_args(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1735408) return await anext(self.gen)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=1735408) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=1735408) return fn(*args, **kwargs)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=1735408) return cls(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=1735408) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=1735408) return AsyncMPClient(*client_args)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=1735408) super().__init__(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=1735408) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=1735408) next(self.gen)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=1735408) wait_for_engine_startup(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=1735408) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=1735408) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
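In a multi-process log like the one above, the last error printed (RuntimeError: Engine core initialization failed...) is only a wrapper; the root cause is the first exception raised in a worker. The small triage helper below is written for this log's format and is not part of vLLM (exception_lines is a hypothetical name). It strips ANSI color codes, which show up in raw pastes of this log as stray characters, then lists the distinct exception lines in the order they first appear.

```python
import re

# ANSI color codes such as ESC[1;36m and ESC[0;0m. When a log is pasted
# through some tools the ESC byte turns into U+FFFD, so strip both forms.
ANSI_RE = re.compile(r"(?:\x1b|\ufffd)\[[0-9;]*m")
# "SomeError: message", "pkg.SomeException: message" or a bare "Exception: message".
EXC_RE = re.compile(r"\b((?:[A-Za-z_][\w.]*)?(?:Error|Exception)): (.*)$")


def exception_lines(log_text: str) -> list[str]:
    """Return distinct 'ExcType: message' lines in order of first appearance."""
    seen: dict[str, None] = {}  # dicts preserve insertion order (Python 3.7+)
    for raw in log_text.splitlines():
        line = ANSI_RE.sub("", raw)
        match = EXC_RE.search(line)
        if match:
            seen.setdefault(f"{match.group(1)}: {match.group(2)}", None)
    return list(seen)


sample = "\n".join([
    "\x1b[1;36m(APIServer pid=1)\x1b[0;0m INFO starting",
    "ERROR [multiproc_executor.py:597] AttributeError: 'P' object has no attribute 'x'",
    "ERROR [core.py:708] Exception: WorkerProc initialization failed.",
    "Exception: WorkerProc initialization failed.",
    "\ufffd[1;36m(APIServer pid=1)\ufffd[0;0m RuntimeError: Engine core initialization failed.",
])
for line in exception_lines(sample):
    print(line)
# The first line printed (the AttributeError) is the root cause.
```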
Alternatives
No response
Additional context
No response