[Bugfix] Fix transformers 5.x compat issues in online TTS serving#1536
linyueqian wants to merge 4 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 339b3ddb2b
vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py
hsliuustc0106 left a comment
Summary
This PR fixes three compatibility issues with transformers 5.x that were breaking online TTS serving. The changes are minimal, focused, and address real breaking changes in the transformers library.
Pros:
- Addresses actual breaking changes in transformers 5.x
- Small, focused fixes (21 additions, 4 deletions)
- Good inline documentation explaining the 'default' rope_type fallback
- Defensive programming with the `max(0, ...)` clamp
- Clear error message for unsupported rope types
Cons:
- No test coverage for the new fallback logic
- The `num_cached_tokens` negative value issue suggests a deeper problem upstream
Recommendation: Approve with suggestions for follow-up investigation.
def _default_rope_init(config, device=None, seq_len=None, layer_type=None):
    head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
    inv_freq = 1.0 / (
        config.rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32, device=device) / head_dim)
Good: Well-documented fallback
The inline implementation of 'default' RoPE is well-documented and correct. The comment clearly explains why this is needed (transformers 5.x removed 'default' from ROPE_INIT_FUNCTIONS).
Suggestion: Consider adding a reference to the transformers version where this changed:
# transformers>=5.0 removed 'default' from ROPE_INIT_FUNCTIONS (see transformers PR #xxxxx)

    f"Unsupported rope_type '{self.rope_type}'. Expected one of {list(ROPE_INIT_FUNCTIONS)} or 'default'."
)

inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
Good: Clear error message
The error message provides helpful context about what rope types are supported. This will make debugging easier if an unsupported type is encountered.
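The fallback and the dispatch around it can be sketched as follows. This is a minimal, pure-Python illustration: `ROPE_INIT_FUNCTIONS` here is a stand-in registry, and `DummyConfig` and `resolve_rope_init` are hypothetical names, not the PR's actual code.

```python
# Hypothetical sketch of the 'default' rope_type fallback discussed above.
ROPE_INIT_FUNCTIONS = {"linear": None, "dynamic": None}  # stand-in registry

class DummyConfig:
    hidden_size = 64
    num_attention_heads = 8
    rope_theta = 10000.0

def _default_rope_init(config, device=None, seq_len=None, layer_type=None):
    # Standard sinusoidal RoPE inverse frequencies: 1 / theta^(2i / head_dim).
    head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
    inv_freq = [1.0 / (config.rope_theta ** (i / head_dim)) for i in range(0, head_dim, 2)]
    return inv_freq, 1.0  # (inv_freq, attention_scaling)

def resolve_rope_init(rope_type):
    # transformers 5.x dropped 'default' from the registry, so inline it here.
    if rope_type in ROPE_INIT_FUNCTIONS:
        return ROPE_INIT_FUNCTIONS[rope_type]
    if rope_type == "default":
        return _default_rope_init
    raise ValueError(
        f"Unsupported rope_type '{rope_type}'. Expected one of {list(ROPE_INIT_FUNCTIONS)} or 'default'."
    )

inv_freq, scaling = resolve_rope_init("default")(DummyConfig())
# head_dim = 8, so inv_freq has 4 entries and inv_freq[0] == 1.0
```

The key property the PR preserves is that `'default'` stays a valid rope_type even though the registry no longer contains it, while anything else unknown still fails loudly.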
events=request.take_events(),
kv_transfer_params=kv_transfer_params,
trace_headers=request.trace_headers,
num_cached_tokens=request.num_cached_tokens,
Issue: Symptom fix, not root cause
Clamping num_cached_tokens to max(0, ...) prevents the crash, but it treats the symptom rather than the root cause. A negative num_cached_tokens suggests either:
- a bug upstream where request.num_cached_tokens is being set to a negative value, or
- a logic error in how cached tokens are being counted.
Recommendation:
- Add a warning log when clamping occurs to help track down the root cause:
  num_cached = request.num_cached_tokens
  if num_cached < 0:
      logger.warning(f"Negative num_cached_tokens ({num_cached}) detected for request {request.request_id}, clamping to 0")
      num_cached = 0
  num_cached_tokens=num_cached,
- File a follow-up issue to investigate why num_cached_tokens can be negative.
This defensive fix is fine for now, but understanding the root cause would prevent potential issues elsewhere.
num_cached_tokens=max(0, request.num_cached_tokens),
num_external_computed_tokens=request.num_external_computed_tokens,
routed_experts=routed_experts,
num_nans_in_logits=request.num_nans_in_logits,
Same issue here
Same recommendation as above - consider adding logging to track when this clamping occurs.
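The warn-and-clamp pattern suggested in both threads could look like this. `clamp_cached_tokens` is a hypothetical helper for illustration, not code from the PR:

```python
import logging

logger = logging.getLogger(__name__)

def clamp_cached_tokens(num_cached_tokens: int, request_id: str) -> int:
    """Clamp negative cached-token counts to 0, logging so the upstream bug stays visible."""
    if num_cached_tokens < 0:
        logger.warning(
            "Negative num_cached_tokens (%d) for request %s; clamping to 0",
            num_cached_tokens, request_id,
        )
        return 0
    return num_cached_tokens
```

A shared helper also keeps the two call sites from drifting apart if the clamping logic ever changes.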
    config.rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32, device=device) / head_dim)
)
return inv_freq, 1.0
Nit: _default_rope_init doesn't close over anything — pull it out to module level so you're not creating a new function object per instance.
events=request.take_events(),
kv_transfer_params=kv_transfer_params,
trace_headers=request.trace_headers,
num_cached_tokens=request.num_cached_tokens,
+1 to adding a logger.warning when clamping fires. Silent clamps on negative values will mask whatever upstream bug is producing them.
lishunyang12 left a comment
Left a couple minor comments. The fixes look correct overall.
Signed-off-by: linyueqian <linyueqian@outlook.com>
…known types Signed-off-by: linyueqian <linyueqian@outlook.com>
…hed_tokens Signed-off-by: linyueqian <linyueqian@outlook.com>
Force-pushed 50daf92 to d911ac2
@hsliuustc0106 check this again?
Summary
- Remove fix_mistral_regex=True from AutoTokenizer.from_pretrained (parameter removed in transformers 5.x)
- Handle 'default' rope_type missing from ROPE_INIT_FUNCTIONS in transformers 5.x (inline standard sinusoidal RoPE)
- Clamp num_cached_tokens to max(0, ...) in OmniGenerationScheduler to prevent negative value crash

These fixes are required for online TTS serving to work with the current environment (transformers 5.2.0, pinned via uv.lock).
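The tokenizer fix could alternatively be version-gated instead of removed outright. A hedged sketch, where `tokenizer_kwargs` is a hypothetical helper and the version-5 cutoff comes from the PR description:

```python
def tokenizer_kwargs(transformers_version: str) -> dict:
    """Extra kwargs for AutoTokenizer.from_pretrained, keyed on transformers version.

    fix_mistral_regex was removed in transformers 5.x, so only pass it on
    older versions (illustrative; the PR simply drops the parameter).
    """
    major = int(transformers_version.split(".")[0])
    return {} if major >= 5 else {"fix_mistral_regex": True}

tokenizer_kwargs("5.2.0")   # {} on transformers 5.x
tokenizer_kwargs("4.57.1")  # {'fix_mistral_regex': True} on 4.x
```

Dropping the parameter unconditionally, as the PR does, is simpler when the environment pins transformers to 5.2.0 anyway; the gate is only worth it if 4.x support must be kept.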