[Bugfix] fix kernel error for qwen3-omni #1602
hsliuustc0106 merged 1 commit into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e71b4753fe
hsliuustc0106 left a comment
Is this the root cause for this problem, or just a workaround?
```diff
 runtime:
   devices: "1"
-  max_batch_size: 64
+  max_batch_size: 32
```
Why do we need to change the config?
We only use two cards now; batch size = 64 for code2wav will OOM during the qwen3-omni convolution computation.
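The memory argument above follows from activation memory scaling linearly with batch size. A rough, illustrative estimate (all tensor dimensions below are hypothetical, not taken from the qwen3-omni code2wav model):

```python
# Activation memory of a conv layer scales linearly with batch size, so
# halving max_batch_size from 64 to 32 roughly halves peak activation memory.
# channels/length/dtype here are made-up numbers for illustration only.
def conv_activation_bytes(batch: int, channels: int, length: int, dtype_bytes: int = 2) -> int:
    # One activation tensor of shape [batch, channels, length] in fp16.
    return batch * channels * length * dtype_bytes

before = conv_activation_bytes(64, 1024, 48000)
after = conv_activation_bytes(32, 1024, 48000)
assert after * 2 == before  # linear in batch size
```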
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
root cause
Purpose
When vLLM groups requests into a batch, it builds sampling metadata where prompt_token_ids is a tensor with shape [num_reqs, max_prompt_len]. When request lengths are shorter than max_prompt_len, they are padded to the batch's maximum number of prompt tokens using the model's vocab_size. For multi-stage models, each stage has a different vocab_size. In Qwen3-Omni, the talker incorrectly uses the thinker's vocab_size during the sampling phase, causing an out-of-bounds computation error. I clamped the padding value of prompt_token_ids to match the correct vocab size for each stage.
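The fix can be sketched as follows. This is a minimal illustration with made-up vocab sizes and a hand-built batch, not vLLM's actual sampling-metadata code; the idea is simply that the pad value must not exceed the vocab size of the stage currently sampling:

```python
import torch

# Hypothetical vocab sizes for the two stages (illustrative values only).
thinker_vocab_size = 152064
talker_vocab_size = 8192

# vLLM pads prompt_token_ids to [num_reqs, max_prompt_len], using vocab_size
# as the pad value. Here the batch was padded with the thinker's vocab_size.
prompt_token_ids = torch.tensor([
    [101, 202, 303, thinker_vocab_size, thinker_vocab_size],  # short request, padded
    [11, 22, 33, 44, 55],                                     # full-length request
])

# Fix: clamp so the pad value matches the talker's vocab size, keeping every
# id within range for the talker stage's kernels. Real token ids (< talker
# vocab size) are unaffected.
clamped = prompt_token_ids.clamp(max=talker_vocab_size)

assert clamped.max().item() <= talker_vocab_size
```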
This can help solve #1520 and #1532.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.