[Bugfix] fix kernel error for qwen3-omni #1602
hsliuustc0106 merged 1 commit into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e71b4753fe
hsliuustc0106
left a comment
Is this the root cause of the problem, or just a workaround?
  runtime:
    devices: "1"
-   max_batch_size: 64
+   max_batch_size: 32
Why do we need to change the config?
We only use two cards now; a batch size of 64 for code2wav will OOM during the Qwen3-Omni convolution computation.
root cause
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Signed-off-by: jader <yjader@foxmail.com>
I still cannot run this benchmark on 2× H100. My test command is:
env is:
error is:
Hi, I tested again and did not hit the OOM problem on the latest main with 2× H100. Could you please check whether any other residual processes are running in your environment?
Purpose
When vLLM groups requests into a batch, it builds sampling metadata in which prompt_token_ids is a tensor of shape [num_reqs, max_prompt_len]. Requests shorter than max_prompt_len are padded up to the batch's maximum prompt length, using the model's vocab_size as the padding value. For multi-stage models, each stage has a different vocab_size. In Qwen3-Omni, the talker incorrectly used the thinker's vocab_size during the sampling phase, causing an out-of-bounds computation error. I clamped the padding value of prompt_token_ids to match the correct vocab size for each stage.
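A minimal sketch of the failure mode and the fix (illustrative only; the variable names and vocab sizes below are made up for the example, not the actual vLLM code):

```python
import torch

# Illustrative numbers only: the thinker's text vocab is much larger than the
# talker's codec vocab, so a pad value of thinker_vocab_size is out of range
# for the talker stage.
thinker_vocab_size = 152064
talker_vocab_size = 4096

# prompt_token_ids padded to [num_reqs, max_prompt_len]; the shorter request
# was padded with the thinker's vocab_size.
prompt_token_ids = torch.tensor([
    [11, 12, 13, thinker_vocab_size, thinker_vocab_size],
    [21, 22, 23, 24, 25],
])

# Fix sketch: clamp the padding to the current stage's vocab size so the
# talker's sampling kernels never index past its own vocabulary.
prompt_token_ids.clamp_(max=talker_vocab_size)
```

The idea is that each stage pads prompt_token_ids with a value that is valid for its own vocabulary, instead of inheriting the thinker's padding value.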
This also helps resolve #1520 and #1532.
Test Plan
Test Result