
Conversation


@Jzz1943 Jzz1943 commented Nov 10, 2025

Support running CosyVoice2 inference with vLLM 0.11.0 (V1 engine only) for better performance.
[Image: first-chunk latency comparison between vLLM 0.9.0 (V0 engine) and vLLM 0.11.0 (V1 engine)]
Under the same conditions, first-chunk inference latency with vLLM 0.11.0 (V1 engine) is roughly 15 ms or more lower than with vLLM 0.9.0 (V0 engine). The first-chunk latency is also more stable, with much smaller fluctuations than under the V0 engine.
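Since the claim is about streaming first-chunk latency, here is a minimal sketch of how that latency can be measured with the CosyVoice2 zero-shot streaming API. The `load_vllm` flag is an assumption about how the vLLM backend added by this PR is enabled; the model path, prompt audio, and texts are placeholders.

```python
import time

from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# Assumption: load_vllm toggles the vLLM backend introduced by this PR;
# the exact flag name may differ in the final code.
cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_vllm=True)
prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)

start = time.perf_counter()
for i, out in enumerate(cosyvoice.inference_zero_shot(
        'Text to synthesize.', 'Transcript of the prompt audio.',
        prompt_speech_16k, stream=True)):
    if i == 0:
        # First-chunk latency: time from the request until the streaming
        # generator yields its first audio chunk.
        print(f'first-chunk latency: {(time.perf_counter() - start) * 1000:.1f} ms')
```

Running the same script against vLLM 0.9.0 (V0 engine) and vLLM 0.11.0 (V1 engine) under identical conditions is how the latency comparison above can be reproduced.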

@Jzz1943 Jzz1943 changed the title support vLLM >=0.11.0 (V1 engine only) support vLLM >=0.11.0 (V1 engine) for better performance Nov 13, 2025
