
[Feature]: Qwen3 Omni AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute '_get_num_multimodal_tokens' #27932

@rongchenggang

Description


🚀 The feature, motivation and pitch

Serving Qwen/Qwen3-Omni-30B-A3B-Thinking with vLLM 0.11.0 and tensor_parallel_size=2 fails during engine startup. vLLM resolves the architecture to TransformersForMultimodalLM and falls back to the Transformers backend; during multimodal memory profiling that backend calls processor._get_num_multimodal_tokens(), which Qwen3OmniMoeProcessor does not implement, so every worker process dies with an AttributeError.

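The failure also reproduces without the API server. Below is a minimal reproduction sketch using vLLM's Python entrypoint with the same non-default arguments shown in the log (the checkpoint path is the local one from this environment; adjust it for yours):

```python
# Minimal reproduction sketch (vLLM 0.11.0); this triggers the bug, it is not a fix.
# The model path below is the local checkpoint from the log -- adjust as needed.
from vllm import LLM

llm = LLM(
    model="/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking",
    trust_remote_code=True,
    tensor_parallel_size=2,
)
# Worker initialization raises:
# AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute
# '_get_num_multimodal_tokens'
```

Full server log: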
INFO 11-02 00:22:03 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=1735408) INFO 11-02 00:22:07 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=1735408) INFO 11-02 00:22:07 [utils.py:233] non-default args: {'model_tag': '/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', 'model': '/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', 'trust_remote_code': True, 'tensor_parallel_size': 2}
(APIServer pid=1735408) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1735408) Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_interleaved', 'interleaved', 'mrope_section'}
(APIServer pid=1735408) INFO 11-02 00:22:07 [model.py:547] Resolved architecture: TransformersForMultimodalLM
(APIServer pid=1735408) torch_dtype is deprecated! Use dtype instead!
(APIServer pid=1735408) INFO 11-02 00:22:07 [model.py:1510] Using max model len 65536
(APIServer pid=1735408) INFO 11-02 00:22:07 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1735408) The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set TRANSFORMERS_VERBOSITY=info for more details.
(APIServer pid=1735408) WARNING 11-02 00:22:08 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 11-02 00:22:16 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', speculative_config=None, tokenizer='/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=1735949) WARNING 11-02 00:22:20 [multiproc_executor.py:720] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1], buffer_handle=(2, 16777216, 10, 'psm_e923dcbf'), local_subscribe_addr='ipc:///tmp/abf03e1a-0ebb-4bb9-ba1a-0c685940f095', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 11-02 00:22:29 [__init__.py:216] Automatically detected platform cuda.
INFO 11-02 00:22:29 [__init__.py:216] Automatically detected platform cuda.
INFO 11-02 00:22:35 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_47719870'), local_subscribe_addr='ipc:///tmp/9cb4d87d-88d2-46ce-b3d7-c2bde827b010', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 11-02 00:22:35 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 11-02 00:22:35 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_c0cb3024'), local_subscribe_addr='ipc:///tmp/95e5f52b-ac27-4b29-8db0-7d352b9fb26d', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 11-02 00:22:35 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 11-02 00:22:36 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:36 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:36 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:36 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 11-02 00:22:39 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 11-02 00:22:39 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_7fbd03b8'), local_subscribe_addr='ipc:///tmp/7b767dee-ba19-45b1-bafa-aa91df94f399', remote_subscribe_addr=None, remote_addr_ipv6=False)
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 11-02 00:22:39 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:39 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:39 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [parallel_state.py:1208] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 11-02 00:22:39 [parallel_state.py:1208] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 11-02 00:22:39 [topk_topp_sampler.py:66] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
WARNING 11-02 00:22:39 [topk_topp_sampler.py:66] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
The image processor of type Qwen2VLImageProcessor is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with use_fast=False. Note that this behavior will be extended to all models in a future release.
The image processor of type Qwen2VLImageProcessor is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with use_fast=False. Note that this behavior will be extended to all models in a future release.
ERROR 11-02 00:22:42 [multiproc_executor.py:597] WorkerProc failed to start.
ERROR 11-02 00:22:42 [multiproc_executor.py:597] Traceback (most recent call last):
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
ERROR 11-02 00:22:42 [multiproc_executor.py:597] worker = WorkerProc(*args, **kwargs)
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 430, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.worker.init_device()
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 259, in init_device
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.worker.init_device() # type: ignore
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 201, in init_device
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.model_runner: GPUModelRunner = GPUModelRunner(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 421, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] self.mm_budget = MultiModalBudget(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/utils.py", line 47, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_by_modality = mm_registry
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 167, in get_max_tokens_per_item_by_nonzero_modality
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_per_item = self.get_max_tokens_per_item_by_modality(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 143, in get_max_tokens_per_item_by_modality
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return profiler.get_mm_max_contiguous_tokens(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 282, in get_mm_max_contiguous_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return self._get_mm_max_tokens(seq_len,
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 255, in _get_mm_max_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] max_tokens_per_item = self.processing_info.get_mm_max_tokens_per_item(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 226, in get_mm_max_tokens_per_item
ERROR 11-02 00:22:42 [multiproc_executor.py:597] return {"image": self.get_max_image_tokens()}
ERROR 11-02 00:22:42 [multiproc_executor.py:597] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 233, in get_max_image_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597] mm_tokens = processor._get_num_multimodal_tokens(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute '_get_num_multimodal_tokens'
[The second tensor-parallel worker fails at the same time with an identical traceback, ending in the same AttributeError.]
INFO 11-02 00:22:42 [multiproc_executor.py:558] Parent process exited, terminating worker
INFO 11-02 00:22:42 [multiproc_executor.py:558] Parent process exited, terminating worker
[rank0]:[W1102 00:22:43.661670779 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] raise e from None
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=1735949) Process EngineCore_DP0:
(EngineCore_DP0 pid=1735949) Traceback (most recent call last):
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
(EngineCore_DP0 pid=1735949) self.run()
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=1735949) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=1735949) raise e
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1735949) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1735949) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=1735949) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=1735949) self._init_executor()
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=1735949) self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=1735949) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=1735949) raise e from None
(EngineCore_DP0 pid=1735949) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
(APIServer pid=1735408) Traceback (most recent call last):
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/bin/vllm", line 7, in <module>
(APIServer pid=1735408) sys.exit(main())
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=1735408) args.dispatch_function(args)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=1735408) uvloop.run(run_server(args))
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=1735408) return loop.run_until_complete(wrapper())
(APIServer pid=1735408) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1735408) return await main
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=1735408) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=1735408) async with build_async_engine_client(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1735408) return await anext(self.gen)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=1735408) async with build_async_engine_client_from_engine_args(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1735408) return await anext(self.gen)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=1735408) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=1735408) return fn(*args, **kwargs)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=1735408) return cls(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=1735408) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=1735408) return AsyncMPClient(*client_args)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=1735408) super().__init__(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=1735408) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=1735408) next(self.gen)
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=1735408) wait_for_engine_startup(
(APIServer pid=1735408) File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=1735408) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=1735408) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
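
The root cause can be confirmed directly against the Hugging Face processor, independent of vLLM. A small diagnostic sketch, assuming transformers is installed and the same local checkpoint path:

```python
# Sketch: check whether the processor exposes the private hook that
# vLLM's Transformers multimodal backend calls during profiling.
from transformers import AutoProcessor

path = "/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking"
processor = AutoProcessor.from_pretrained(path, trust_remote_code=True)

print(type(processor).__name__)                          # Qwen3OmniMoeProcessor
print(hasattr(processor, "_get_num_multimodal_tokens"))  # expected: False
```

Until Qwen3OmniMoeProcessor implements _get_num_multimodal_tokens (or vLLM's Transformers fallback stops requiring it), the multimodal token budget for this model cannot be profiled and the server cannot start.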

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
