[NPU][Bugfix] Align GPU side and recover qwen3-tts #1564
hsliuustc0106 merged 2 commits into vllm-project:main from
Conversation
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8be2f6e3b7
```python
# FIXME(gcanlin): Refactor build_attn_metadata to avoid special-casing NPU backends here.
if current_omni_platform.is_npu():
    # NPU requires AscendCommonAttentionMetadata with extra attributes
    from vllm_ascend.worker.v2 import attn_utils
```
Avoid shadowing attn_utils in build_attn_metadata
Importing attn_utils inside the NPU-only branch makes attn_utils a local variable for the whole function. When current_omni_platform.is_npu() is false (e.g., GPU/CPU execution), the else branch calls attn_utils.build_attn_metadata(...) before that local is assigned, which raises UnboundLocalError and breaks the Qwen3-TTS code predictor path on non-NPU backends.
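The scoping pitfall Codex describes can be reproduced in isolation. The sketch below is illustrative, not the actual vllm-omni code: `json` stands in for `vllm_ascend.worker.v2.attn_utils`, and the function names are hypothetical. Because a function-local `import X` statement binds `X` as a local variable for the *entire* function body, the non-NPU branch hits an unassigned local rather than any module-level `attn_utils`:

```python
def build_attn_metadata_buggy(is_npu: bool):
    if is_npu:
        # This import makes `attn_utils` a LOCAL name for the whole
        # function, shadowing any module-level `attn_utils` import.
        import json as attn_utils  # stand-in for the vllm_ascend import
        return attn_utils
    else:
        # Non-NPU path: the local `attn_utils` was never assigned,
        # so this raises UnboundLocalError.
        return attn_utils


try:
    build_attn_metadata_buggy(False)
except UnboundLocalError as e:
    print(f"non-NPU path failed: {e!r}")
```

A common fix is to alias the branch-local import to a distinct name (e.g. `import ... as npu_attn_utils`) so it no longer shadows the module-level binding used on the GPU/CPU path.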
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
cc @hsliuustc0106 @Gaohan123 PTAL, thanks!
lishunyang12 left a comment
Left a couple of minor comments. The TTS/Omni unification in the model runner looks solid overall — the getattr-based dispatch and the cudagraph guard make sense.
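For readers unfamiliar with the pattern the reviewer is endorsing, here is a minimal, hypothetical sketch of getattr-based dispatch (the class and method names are illustrative and do not reflect the actual vllm-omni model runner API):

```python
class ModelRunner:
    """Toy runner dispatching per-model-type hooks by name."""

    def run_tts(self, batch: str) -> str:
        return f"tts:{batch}"

    def run_omni(self, batch: str) -> str:
        return f"omni:{batch}"

    def execute(self, task: str, batch: str) -> str:
        # Look up the optional hook by name; getattr's default lets a
        # missing hook be handled explicitly instead of via AttributeError.
        handler = getattr(self, f"run_{task}", None)
        if handler is None:
            raise NotImplementedError(f"no handler for task {task!r}")
        return handler(batch)


runner = ModelRunner()
print(runner.execute("tts", "b0"))  # tts:b0
```

The appeal of this shape is that adding a new model type only requires defining the correspondingly named method, with unsupported types failing loudly at the dispatch site.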
Purpose
FIX #1508.
As a workaround, we have to add hardware-specific code in the modeling file. I will remove it in the next release.
This PR also updates the docs to prepare the coming release v0.16.0.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.