
Conversation


@Jzz1943 Jzz1943 commented Nov 10, 2025

Support running CosyVoice2 inference with vLLM 0.11.0 (V1 engine only) for better performance.
[Image: first-chunk latency comparison between vLLM 0.9.0 (V0 engine) and vLLM 0.11.0 (V1 engine)]
Under the same conditions, first-chunk inference latency with vLLM 0.11.0 (V1 engine) is roughly 15 ms or more lower than with vLLM 0.9.0 (V0 engine). The first-chunk latency is also more stable, with much smaller fluctuations than under the V0 engine.
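Since the claim is about streaming first-chunk latency, here is a minimal sketch of how that latency can be measured with the CosyVoice2 zero-shot streaming API. The `load_vllm` flag is an assumption about how the vLLM backend added by this PR is enabled; the model path, prompt audio, and texts are placeholders.

```python
import time

from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# Assumption: load_vllm toggles the vLLM backend introduced by this PR;
# the exact flag name may differ in the final code.
cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_vllm=True)
prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)

start = time.perf_counter()
for i, out in enumerate(cosyvoice.inference_zero_shot(
        'Text to synthesize.', 'Transcript of the prompt audio.',
        prompt_speech_16k, stream=True)):
    if i == 0:
        # First-chunk latency: time from the request until the streaming
        # generator yields its first audio chunk.
        print(f'first-chunk latency: {(time.perf_counter() - start) * 1000:.1f} ms')
```

Running the same script against vLLM 0.9.0 (V0 engine) and vLLM 0.11.0 (V1 engine) under identical conditions is how the latency comparison above can be reproduced.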

@Jzz1943 Jzz1943 changed the title support vLLM >=0.11.0 (V1 engine only) support vLLM >=0.11.0 (V1 engine) for better performance Nov 13, 2025
