[Bug]: qwen3-omni online server for the same video (use_audio_in_video) #1476

@kevin236-max

Description

Your current environment

[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] Failed on request chatcmpl-af3280326f70e3de:
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] Traceback (most recent call last):
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 554, in generate
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     q = await self.add_request(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]         ^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 355, in add_request
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     request = self.input_processor.process_inputs(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../vllm-omni/vllm_omni/engine/input_processor.py", line 180, in process_inputs
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     processed_inputs: ProcessorInputs = self.input_preprocessor.preprocess(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/inputs/preprocess.py", line 691, in preprocess
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     res = self._preprocess(prompt, tokenization_kwargs, mm_uuids=mm_uuids)
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/inputs/preprocess.py", line 677, in _preprocess
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     return self._process_decoder_only_prompt(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/inputs/preprocess.py", line 646, in _process_decoder_only_prompt
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     prompt_comps = self._prompt_to_llm_inputs(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../vllm-omni/vllm_omni/inputs/preprocess.py", line 127, in _prompt_to_llm_inputs
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     return self._process_tokens(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../vllm-omni/vllm_omni/inputs/preprocess.py", line 81, in _process_tokens
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     inputs = self._process_multimodal(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]              ^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/inputs/preprocess.py", line 261, in _process_multimodal
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     mm_input = mm_processor.apply(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]                ^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/multimodal/processing/processor.py", line 1844, in apply
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     ) = self._cached_apply_hf_processor(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/multimodal/processing/processor.py", line 1621, in _cached_apply_hf_processor
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     ) = self._apply_hf_processor_main(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 694, in _apply_hf_processor_main
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     mm_processed_data = self._apply_hf_processor_mm_only(
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 715, in _apply_hf_processor_mm_only
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     assert "audio" in mm_counts
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]            ^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] AssertionError
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] The above exception was the direct cause of the following exception:
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] Traceback (most recent call last):
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 1402, in generation_single_request
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     async for res in cast(AsyncLLM, stage_engine).generate(ein, llm_sampling_params, rid):
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]   File ".../miniconda3/envs/omniagent/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 625, in generate
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409]     raise EngineGenerateError() from e
[Stage-0] ERROR 02-25 19:37:34 [omni_stage.py:1409] vllm.v1.engine.exceptions.EngineGenerateError
(APIServer pid=115189) ERROR 02-25 19:37:34 [async_omni.py:492] [AsyncOrchestrator] Stage 0 error on request chatcmpl-af3280326f70e3de: 
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725] Chat completion failed: {'request_id': 'chatcmpl-af3280326f70e3de', 'stage_id': 0, 'error': ''}
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725] Traceback (most recent call last):
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/openai/api_server.py", line 723, in create_chat_completion
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     generator = await handler.create_chat_completion(request, raw_request)
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/openai/serving_chat.py", line 317, in create_chat_completion
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     return await self.chat_completion_full_generator(
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/openai/serving_chat.py", line 1331, in chat_completion_full_generator
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     async for res in result_generator:
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/async_omni.py", line 336, in generate
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     async for output in self._process_sequential_results(
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/async_omni.py", line 425, in _process_sequential_results
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     engine_outputs, finished, output_to_yield = self._process_single_result(
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]   File ".../vllm-omni/vllm_omni/entrypoints/async_omni.py", line 495, in _process_single_result
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725]     raise RuntimeError(result)
(APIServer pid=115189) ERROR 02-25 19:37:34 [api_server.py:725] RuntimeError: {'request_id': 'chatcmpl-af3280326f70e3de', 'stage_id': 0, 'error': ''}
(APIServer pid=115189) INFO:     127.0.0.1:43720 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
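For reference, the failing path is exercised by a request shaped roughly like the sketch below. This is a hypothetical reconstruction, not the exact request from the report: the model name and video path are placeholders, and whether per-request `mm_processor_kwargs` is honored (as opposed to setting `use_audio_in_video` at server launch via `--mm-processor-kwargs`) depends on the server version.

```python
# Hypothetical reproduction sketch: model name, video path, and server
# address are placeholders, not taken from the report.
import json

payload = {
    "model": "qwen3-omni",  # placeholder served-model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "video_url",
                 "video_url": {"url": "file:///path/to/clip.mp4"}},
                {"type": "text",
                 "text": "Describe the video, including what is said in it."},
            ],
        }
    ],
    # Ask the processor to extract the video's audio track as a separate
    # audio input -- the path that ends in `assert "audio" in mm_counts`.
    "mm_processor_kwargs": {"use_audio_in_video": True},
    "max_tokens": 256,
}

body = json.dumps(payload)
# To send against a running server (placeholder address):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://127.0.0.1:8000/v1/chat/completions",
#       data=body.encode(), headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```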
The stage config used to launch the server:

async_chunk: false  # added
stage_args:
  - stage_id: 0
    stage_type: llm  # Use llm stage type to launch OmniLLM
    runtime:
      devices: "2,3"
      max_batch_size: 16
      batch_timeout: 0.05
      batch_poll_interval: 0.002
    engine_args:
      model_stage: thinker
      model_arch: Qwen3OmniMoeForConditionalGeneration
      worker_type: ar
      scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
      gpu_memory_utilization: 0.9
      enforce_eager: true
      trust_remote_code: true
      # engine_output_type: latent  # Output hidden states for talker
      engine_output_type: text
      distributed_executor_backend: "mp"
      enable_prefix_caching: true
      max_num_batched_tokens: 32768
      hf_config_name: thinker_config
      tensor_parallel_size: 2
      mm_processor_cache_gb: 1
    final_output: true
    final_output_type: text
    is_comprehension: true
    default_sampling_params:
      temperature: 0.4
      top_p: 0.9
      top_k: 1
      max_tokens: 2048
      seed: 42
      detokenize: true
      repetition_penalty: 1.05
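The assertion that fails is `assert "audio" in mm_counts`, i.e. the audio modality is absent from the per-prompt multimodal limits at the point where `use_audio_in_video` makes the processor treat the video's audio track as a separate audio input. One thing worth checking is whether the thinker stage explicitly allows the audio modality. This is a sketch, not a confirmed fix: `limit_mm_per_prompt` is a standard vLLM engine argument, but whether this stage config forwards it to the engine is an assumption.

```yaml
    engine_args:
      # ...existing thinker-stage args as above...
      # Hypothetical workaround: explicitly include "audio" in the
      # per-prompt multimodal limits, so that mm_counts contains the
      # key that qwen2_5_omni_thinker.py asserts on.
      limit_mm_per_prompt:
        video: 1
        audio: 1
```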

Your code version

vllm 0.15.0

๐Ÿ› Describe the bug

None

