[Bug]: qwen3-omni online server for the same video (use_audio_in_video) #1476

@kevin236-max

Description

Your current environment

[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] Failed on request chatcmpl-af3280326f70e3de:
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] Traceback (most recent call last):
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 554, in generate
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     q = await self.add_request(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]         ^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 355, in add_request
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     request = self.input_processor.process_inputs(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../vllm-omni/vllm_omni/engine/input_processor.py", line 180, in process_inputs
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     processed_inputs: ProcessorInputs = self.input_preprocessor.preprocess(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/inputs/preprocess.py", line 691, in preprocess
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     res = self._preprocess(prompt, tokenization_kwargs, mm_uuids=mm_uuids)
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/inputs/preprocess.py", line 677, in _preprocess
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     return self._process_decoder_only_prompt(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/inputs/preprocess.py", line 646, in _process_decoder_only_prompt
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     prompt_comps = self._prompt_to_llm_inputs(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../vllm-omni/vllm_omni/inputs/preprocess.py", line 127, in _prompt_to_llm_inputs
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     return self._process_tokens(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../vllm-omni/vllm_omni/inputs/preprocess.py", line 81, in _process_tokens
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     inputs = self._process_multimodal(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]              ^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/inputs/preprocess.py", line 261, in _process_multimodal
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     mm_input = mm_processor.apply(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]                ^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/multimodal/processing/processor.py", line 1844, in apply
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     ) = self._cached_apply_hf_processor(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/multimodal/processing/processor.py", line 1621, in _cached_apply_hf_processor
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     ) = self._apply_hf_processor_main(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 694, in _apply_hf_processor_main
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     mm_processed_data = self._apply_hf_processor_mm_only(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 715, in _apply_hf_processor_mm_only
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     assert "audio" in mm_counts
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]            ^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] AssertionError
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] The above exception was the direct cause of the following exception:
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] Traceback (most recent call last):
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 1402, in generation_single_request
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     async for res in cast(AsyncLLM, stage_engine).generate(ein, llm_sampling_params, rid):
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 625, in generate
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     raise EngineGenerateError() from e
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] vllm.v1.engine.exceptions.EngineGenerateError
(APIServer pid=115189) ERROR 02-25 19:37:34 [async_omni.py:492] [AsyncOrchestrator] Stage 0 error on request chatcmpl-af3280326f70e3de: 
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725] Chat completion failed: {'request_id': 'chatcmpl-af3280326f70e3de', 'stage_id': 0, 'error': ''}
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725] Traceback (most recent call last):
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/openai/api_server.py", line 723, in create_chat_completion
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     generator = await handler.create_chat_completion(request, raw_request)
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/openai/serving_chat.py", line 317, in create_chat_completion
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     return await self.chat_completion_full_generator(
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/openai/serving_chat.py", line 1331, in chat_completion_full_generator
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     async for res in result_generator:
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/async_omni.py", line 336, in generate
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     async for output in self._process_sequential_results(
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/async_omni.py", line 425, in _process_sequential_results
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     engine_outputs, finished, output_to_yield = self._process_single_result(
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/async_omni.py", line 495, in _process_single_result
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     raise RuntimeError(result)
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725] RuntimeError: {'request_id': 'chatcmpl-af3280326f70e3de', 'stage_id': 0, 'error': ''}
(APIServer pid=115189) INFO:     127.0.0.1:43720 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
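For reference, the failing path is exercised by a request shaped roughly like the sketch below. This is a hypothetical reconstruction, not the exact request from the report: the model name and video path are placeholders, and whether per-request `mm_processor_kwargs` is honored (as opposed to setting `use_audio_in_video` at server launch via `--mm-processor-kwargs`) depends on the server version.

```python
# Hypothetical reproduction sketch: model name, video path, and server
# address are placeholders, not taken from the report.
import json

payload = {
    "model": "qwen3-omni",  # placeholder served-model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "video_url",
                 "video_url": {"url": "file:///path/to/clip.mp4"}},
                {"type": "text",
                 "text": "Describe the video, including what is said in it."},
            ],
        }
    ],
    # Ask the processor to extract the video's audio track as a separate
    # audio input -- the path that ends in `assert "audio" in mm_counts`.
    "mm_processor_kwargs": {"use_audio_in_video": True},
    "max_tokens": 256,
}

body = json.dumps(payload)
# To send against a running server (placeholder address):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://127.0.0.1:8000/v1/chat/completions",
#       data=body.encode(), headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```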
The stage config used to launch the server:

async_chunk: false  # added
stage_args:
  - stage_id: 0
    stage_type: llm  # Use llm stage type to launch OmniLLM
    runtime:
      devices: "2,3"
      max_batch_size: 16
      batch_timeout: 0.05
      batch_poll_interval: 0.002
    engine_args:
      model_stage: thinker
      model_arch: Qwen3OmniMoeForConditionalGeneration
      worker_type: ar
      scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
      gpu_memory_utilization: 0.9
      enforce_eager: true
      trust_remote_code: true
      # engine_output_type: latent  # Output hidden states for talker
      engine_output_type: text
      distributed_executor_backend: "mp"
      enable_prefix_caching: true
      max_num_batched_tokens: 32768
      hf_config_name: thinker_config
      tensor_parallel_size: 2
      mm_processor_cache_gb: 1
    final_output: true
    final_output_type: text
    is_comprehension: true
    default_sampling_params:
      temperature: 0.4
      top_p: 0.9
      top_k: 1
      max_tokens: 2048
      seed: 42
      detokenize: true
      repetition_penalty: 1.05
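The assertion that fails is `assert "audio" in mm_counts`, i.e. the audio modality is absent from the per-prompt multimodal limits at the point where `use_audio_in_video` makes the processor treat the video's audio track as a separate audio input. One thing worth checking is whether the thinker stage explicitly allows the audio modality. This is a sketch, not a confirmed fix: `limit_mm_per_prompt` is a standard vLLM engine argument, but whether this stage config forwards it to the engine is an assumption.

```yaml
    engine_args:
      # ...existing thinker-stage args as above...
      # Hypothetical workaround: explicitly include "audio" in the
      # per-prompt multimodal limits, so that mm_counts contains the
      # key that qwen2_5_omni_thinker.py asserts on.
      limit_mm_per_prompt:
        video: 1
        audio: 1
```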

Your code version

vllm 0.15.0

๐Ÿ› Describe the bug

None

