[Bugfix] Fix transformers 5.x compat issues in online TTS serving#1536
linyueqian wants to merge 4 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 339b3ddb2b
vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py
hsliuustc0106 left a comment
Summary
This PR fixes three compatibility issues with transformers 5.x that were breaking online TTS serving. The changes are minimal, focused, and address real breaking changes in the transformers library.
Pros:
- Addresses actual breaking changes in transformers 5.x
- Small, focused fixes (21 additions, 4 deletions)
- Good inline documentation explaining the 'default' rope_type fallback
- Defensive programming with the `max(0, ...)` clamp
- Clear error message for unsupported rope types
Cons:
- No test coverage for the new fallback logic
- The `num_cached_tokens` negative value issue suggests a deeper problem upstream
Recommendation: Approve with suggestions for follow-up investigation.
def _default_rope_init(config, device=None, seq_len=None, layer_type=None):
    head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
    inv_freq = 1.0 / (
        config.rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32, device=device) / head_dim)
Good: Well-documented fallback
The inline implementation of 'default' RoPE is well-documented and correct. The comment clearly explains why this is needed (transformers 5.x removed 'default' from ROPE_INIT_FUNCTIONS).
Suggestion: Consider adding a reference to the transformers version where this changed:
# transformers>=5.0 removed 'default' from ROPE_INIT_FUNCTIONS (see transformers PR #xxxxx)

    f"Unsupported rope_type '{self.rope_type}'. Expected one of {list(ROPE_INIT_FUNCTIONS)} or 'default'."
)

inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
Good: Clear error message
The error message provides helpful context about what rope types are supported. This will make debugging easier if an unsupported type is encountered.
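The fallback and the dispatch around it can be sketched as follows. This is a minimal, pure-Python illustration: `ROPE_INIT_FUNCTIONS` here is a stand-in registry, and `DummyConfig` and `resolve_rope_init` are hypothetical names, not the PR's actual code.

```python
# Hypothetical sketch of the 'default' rope_type fallback discussed above.
ROPE_INIT_FUNCTIONS = {"linear": None, "dynamic": None}  # stand-in registry

class DummyConfig:
    hidden_size = 64
    num_attention_heads = 8
    rope_theta = 10000.0

def _default_rope_init(config, device=None, seq_len=None, layer_type=None):
    # Standard sinusoidal RoPE inverse frequencies: 1 / theta^(2i / head_dim).
    head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
    inv_freq = [1.0 / (config.rope_theta ** (i / head_dim)) for i in range(0, head_dim, 2)]
    return inv_freq, 1.0  # (inv_freq, attention_scaling)

def resolve_rope_init(rope_type):
    # transformers 5.x dropped 'default' from the registry, so inline it here.
    if rope_type in ROPE_INIT_FUNCTIONS:
        return ROPE_INIT_FUNCTIONS[rope_type]
    if rope_type == "default":
        return _default_rope_init
    raise ValueError(
        f"Unsupported rope_type '{rope_type}'. Expected one of {list(ROPE_INIT_FUNCTIONS)} or 'default'."
    )

inv_freq, scaling = resolve_rope_init("default")(DummyConfig())
# head_dim = 8, so inv_freq has 4 entries and inv_freq[0] == 1.0
```

The key property the PR preserves is that `'default'` stays a valid rope_type even though the registry no longer contains it, while anything else unknown still fails loudly.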
events=request.take_events(),
kv_transfer_params=kv_transfer_params,
trace_headers=request.trace_headers,
num_cached_tokens=request.num_cached_tokens,
Issue: Symptom fix, not root cause
Clamping num_cached_tokens to max(0, ...) prevents the crash, but it treats the symptom rather than the root cause. A negative num_cached_tokens suggests either:
- a bug upstream where request.num_cached_tokens is being set to a negative value, or
- a logic error in how cached tokens are being counted.
Recommendation:
- Add a warning log when clamping occurs to help track down the root cause:
  num_cached = request.num_cached_tokens
  if num_cached < 0:
      logger.warning(f"Negative num_cached_tokens ({num_cached}) detected for request {request.request_id}, clamping to 0")
      num_cached = 0
  num_cached_tokens=num_cached,
- File a follow-up issue to investigate why num_cached_tokens can be negative.
This defensive fix is fine for now, but understanding the root cause would prevent potential issues elsewhere.
num_cached_tokens=max(0, request.num_cached_tokens),
num_external_computed_tokens=request.num_external_computed_tokens,
routed_experts=routed_experts,
num_nans_in_logits=request.num_nans_in_logits,
Same issue here
Same recommendation as above - consider adding logging to track when this clamping occurs.
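The warn-and-clamp pattern suggested in both threads could look like this. `clamp_cached_tokens` is a hypothetical helper for illustration, not code from the PR:

```python
import logging

logger = logging.getLogger(__name__)

def clamp_cached_tokens(num_cached_tokens: int, request_id: str) -> int:
    """Clamp negative cached-token counts to 0, logging so the upstream bug stays visible."""
    if num_cached_tokens < 0:
        logger.warning(
            "Negative num_cached_tokens (%d) for request %s; clamping to 0",
            num_cached_tokens, request_id,
        )
        return 0
    return num_cached_tokens
```

A shared helper also keeps the two call sites from drifting apart if the clamping logic ever changes.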
    config.rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32, device=device) / head_dim)
)
return inv_freq, 1.0
Nit: _default_rope_init doesn't close over anything — pull it out to module level so you're not creating a new function object per instance.
events=request.take_events(),
kv_transfer_params=kv_transfer_params,
trace_headers=request.trace_headers,
num_cached_tokens=request.num_cached_tokens,
+1 to adding a logger.warning when clamping fires. Silent clamps on negative values will mask whatever upstream bug is producing them.
lishunyang12 left a comment
Left a couple minor comments. The fixes look correct overall.
Signed-off-by: linyueqian <linyueqian@outlook.com>
…known types Signed-off-by: linyueqian <linyueqian@outlook.com>
…hed_tokens Signed-off-by: linyueqian <linyueqian@outlook.com>
Force-pushed 50daf92 to d911ac2
@hsliuustc0106 check this again?
Summary
- Remove fix_mistral_regex=True from AutoTokenizer.from_pretrained (parameter removed in transformers 5.x)
- Handle 'default' rope_type missing from ROPE_INIT_FUNCTIONS in transformers 5.x (inline standard sinusoidal RoPE)
- Clamp num_cached_tokens to max(0, ...) in OmniGenerationScheduler to prevent negative value crash

These fixes are required for online TTS serving to work with the current environment (transformers 5.2.0, pinned via uv.lock).
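The tokenizer fix could alternatively be version-gated instead of removed outright. A hedged sketch, where `tokenizer_kwargs` is a hypothetical helper and the version-5 cutoff comes from the PR description:

```python
def tokenizer_kwargs(transformers_version: str) -> dict:
    """Extra kwargs for AutoTokenizer.from_pretrained, keyed on transformers version.

    fix_mistral_regex was removed in transformers 5.x, so only pass it on
    older versions (illustrative; the PR simply drops the parameter).
    """
    major = int(transformers_version.split(".")[0])
    return {} if major >= 5 else {"fix_mistral_regex": True}

tokenizer_kwargs("5.2.0")   # {} on transformers 5.x
tokenizer_kwargs("4.57.1")  # {'fix_mistral_regex': True} on 4.x
```

Dropping the parameter unconditionally, as the PR does, is simpler when the environment pins transformers to 5.2.0 anyway; the gate is only worth it if 4.x support must be kept.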