[Feature]: Qwen3 Omni AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute '_get_num_multimodal_tokens'

### 🚀 The feature, motivation and pitch

Serving Qwen3-Omni-30B-A3B-Thinking with vLLM 0.11.0 (`tensor_parallel_size=2`, `trust_remote_code=True`) fails at startup. The architecture resolves to `TransformersForMultimodalLM`, vLLM falls back to the Transformers implementation, and both workers crash in `get_max_image_tokens` because `Qwen3OmniMoeProcessor` has no `_get_num_multimodal_tokens` attribute. Full log:

INFO 11-02 00:22:03 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=1735408) INFO 11-02 00:22:07 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=1735408) INFO 11-02 00:22:07 [utils.py:233] non-default args: {'model_tag': '/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', 'model': '/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', 'trust_remote_code': True, 'tensor_parallel_size': 2}
(APIServer pid=1735408) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1735408) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_interleaved', 'interleaved', 'mrope_section'}
(APIServer pid=1735408) INFO 11-02 00:22:07 [model.py:547] Resolved architecture: TransformersForMultimodalLM
(APIServer pid=1735408) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=1735408) INFO 11-02 00:22:07 [model.py:1510] Using max model len 65536
(APIServer pid=1735408) INFO 11-02 00:22:07 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1735408) The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
(APIServer pid=1735408) WARNING 11-02 00:22:08 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 11-02 00:22:16 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', speculative_config=None, tokenizer='/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/gemini/space/yifq/zhaozy/models/Qwen/Qwen3-Omni-30B-A3B-Thinking, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, 
compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=1735949) WARNING 11-02 00:22:20 [multiproc_executor.py:720] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=1735949) INFO 11-02 00:22:20 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1], buffer_handle=(2, 16777216, 10, 'psm_e923dcbf'), local_subscribe_addr='ipc:///tmp/abf03e1a-0ebb-4bb9-ba1a-0c685940f095', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 11-02 00:22:29 [__init__.py:216] Automatically detected platform cuda.
INFO 11-02 00:22:29 [__init__.py:216] Automatically detected platform cuda.
INFO 11-02 00:22:35 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_47719870'), local_subscribe_addr='ipc:///tmp/9cb4d87d-88d2-46ce-b3d7-c2bde827b010', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 11-02 00:22:35 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 11-02 00:22:35 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_c0cb3024'), local_subscribe_addr='ipc:///tmp/95e5f52b-ac27-4b29-8db0-7d352b9fb26d', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 11-02 00:22:35 [utils.py:184] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 11-02 00:22:36 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:36 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:36 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:36 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 11-02 00:22:39 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 11-02 00:22:39 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_7fbd03b8'), local_subscribe_addr='ipc:///tmp/7b767dee-ba19-45b1-bafa-aa91df94f399', remote_subscribe_addr=None, remote_addr_ipv6=False)
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 11-02 00:22:39 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:39 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [__init__.py:1384] Found nccl from library libnccl.so.2
INFO 11-02 00:22:39 [pynccl.py:103] vLLM is using nccl==2.23.4
INFO 11-02 00:22:39 [parallel_state.py:1208] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 11-02 00:22:39 [parallel_state.py:1208] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 11-02 00:22:39 [topk_topp_sampler.py:66] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
WARNING 11-02 00:22:39 [topk_topp_sampler.py:66] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
ERROR 11-02 00:22:42 [multiproc_executor.py:597] WorkerProc failed to start.
ERROR 11-02 00:22:42 [multiproc_executor.py:597] Traceback (most recent call last):
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 430, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     self.worker.init_device()
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 259, in init_device
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     self.worker.init_device()  # type: ignore
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 201, in init_device
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     self.model_runner: GPUModelRunner = GPUModelRunner(
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 421, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     self.mm_budget = MultiModalBudget(
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/worker/utils.py", line 47, in __init__
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     max_tokens_by_modality = mm_registry \
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 167, in get_max_tokens_per_item_by_nonzero_modality
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     max_tokens_per_item = self.get_max_tokens_per_item_by_modality(
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 143, in get_max_tokens_per_item_by_modality
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     return profiler.get_mm_max_contiguous_tokens(
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 282, in get_mm_max_contiguous_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     return self._get_mm_max_tokens(seq_len,
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 255, in _get_mm_max_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     max_tokens_per_item = self.processing_info.get_mm_max_tokens_per_item(
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 226, in get_mm_max_tokens_per_item
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     return {"image": self.get_max_image_tokens()}
ERROR 11-02 00:22:42 [multiproc_executor.py:597]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 233, in get_max_image_tokens
ERROR 11-02 00:22:42 [multiproc_executor.py:597]     mm_tokens = processor._get_num_multimodal_tokens(
ERROR 11-02 00:22:42 [multiproc_executor.py:597] AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute '_get_num_multimodal_tokens'
INFO 11-02 00:22:42 [multiproc_executor.py:558] Parent process exited, terminating worker
INFO 11-02 00:22:42 [multiproc_executor.py:558] Parent process exited, terminating worker
[rank0]:[W1102 00:22:43.661670779 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708]     raise e from None
(EngineCore_DP0 pid=1735949) ERROR 11-02 00:22:43 [core.py:708] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=1735949) Process EngineCore_DP0:
(EngineCore_DP0 pid=1735949) Traceback (most recent call last):
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
(EngineCore_DP0 pid=1735949)     self.run()
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=1735949)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=1735949)     raise e
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1735949)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1735949)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=1735949)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=1735949)     self._init_executor()
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=1735949)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=1735949)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=1735949)     raise e from None
(EngineCore_DP0 pid=1735949) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
(APIServer pid=1735408) Traceback (most recent call last):
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/bin/vllm", line 7, in <module>
(APIServer pid=1735408)     sys.exit(main())
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=1735408)     args.dispatch_function(args)
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=1735408)     uvloop.run(run_server(args))
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=1735408)     return loop.run_until_complete(wrapper())
(APIServer pid=1735408)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1735408)     return await main
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=1735408)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=1735408)     async with build_async_engine_client(
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1735408)     return await anext(self.gen)
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=1735408)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1735408)     return await anext(self.gen)
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=1735408)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=1735408)     return fn(*args, **kwargs)
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=1735408)     return cls(
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=1735408)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=1735408)     return AsyncMPClient(*client_args)
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=1735408)     super().__init__(
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=1735408)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=1735408)     next(self.gen)
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=1735408)     wait_for_engine_startup(
(APIServer pid=1735408)   File "/opt/conda/envs/minicpmo/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=1735408)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=1735408) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[NOTICE] The application is pending for GPU resource in asynchronous queue. The longest waiting time in queue is 1800 seconds.
/opt/conda/envs/minicpmo/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
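
The traceback above ends in `get_max_image_tokens`, which calls `processor._get_num_multimodal_tokens(...)` on the loaded processor. A minimal sketch of how one might check for that hook before starting the server is below. The helper name `supports_mm_token_counting` and the two stand-in classes are hypothetical and exist only for illustration; they are not part of vLLM or Transformers, and the stand-in's return value does not claim to match the real method's return shape.

```python
def supports_mm_token_counting(processor) -> bool:
    """Return True if `processor` exposes a callable `_get_num_multimodal_tokens`.

    Hypothetical helper: it only mirrors the attribute lookup that fails
    in the traceback, nothing more.
    """
    return callable(getattr(processor, "_get_num_multimodal_tokens", None))


class ProcessorWithHook:
    """Hypothetical stand-in for a processor that implements the hook."""

    def _get_num_multimodal_tokens(self, image_sizes=None, **kwargs):
        # Placeholder return value, for illustration only.
        return {"num_image_tokens": [0 for _ in (image_sizes or [])]}


class ProcessorWithoutHook:
    """Hypothetical stand-in for a processor that lacks the hook."""


print(supports_mm_token_counting(ProcessorWithHook()))
print(supports_mm_token_counting(ProcessorWithoutHook()))
```

To run the same check against the real model, one could presumably load the processor with `transformers.AutoProcessor.from_pretrained(<model path>)` and pass it to the helper. Given the `AttributeError` in the log, the check would be expected to fail for `Qwen3OmniMoeProcessor` in this environment.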


### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature]: Qwen3 Omni AttributeError: 'Qwen3OmniMoeProcessor' object has no attribute '_get_num_multimodal_tokens' #27932
