[Model] Fun-CosyVoice3-0.5B-2512 #498
divyanshsinghvi wants to merge 119 commits into vllm-project:main from Fun-CosyVoice3-0.5B-2512
Conversation
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
examples/offline_inference/text_to_speech/verify_e2e_cosyvoice.py
vllm_omni/model_executor/models/cosyvoice3/cosyvoice3_talker.py
@lishunyang12 I have a few comments to address since it was recently reviewed. Should be ready to merge in the next few days.
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
@linyueqian All the comments are addressed. cc: @hsliuustc0106
Great work! A few things worth tracking as follow-ups:
Could you share a quick benchmark with just these metrics? Please report TTFA and RTF; the same prompt/audio setup for both is enough.

Don't the ones in the description (#498 (comment)) suffice? (See the Performance Benchmarks section.)
Thanks, I saw the E2E and stage-time benchmarks in the PR description. Could you also share TTFA and RTF (same setup) for completeness?

Will update.
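For reference, TTFA (time to first audio) and RTF (real-time factor: wall-clock synthesis time divided by the duration of the generated audio) can be measured with a sketch like the one below. `synthesize_stream` here is a hypothetical stand-in for the actual CosyVoice3 streaming call, not a vllm-omni API.

```python
import time

def measure_ttfa_and_rtf(synthesize_stream, text, sample_rate=24000):
    """Measure TTFA and RTF for a streaming TTS call.

    `synthesize_stream` is assumed to be a generator yielding audio
    chunks (sequences of samples); swap in the real streaming API.
    """
    start = time.perf_counter()
    ttfa = None
    total_samples = 0
    for chunk in synthesize_stream(text):
        if ttfa is None:
            # Latency until the first audio chunk arrives.
            ttfa = time.perf_counter() - start
        total_samples += len(chunk)
    wall = time.perf_counter() - start
    audio_seconds = total_samples / sample_rate
    # RTF < 1.0 means synthesis is faster than real time.
    rtf = wall / audio_seconds
    return ttfa, rtf
```

Reporting both numbers from the same run (same prompt and reference audio) keeps them directly comparable across commits.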
Purpose
Resolves #315
This PR integrates the CosyVoice3 text-to-speech model into vllm-omni, implementing both the "Talker" (LLM) and "Code2Wav" (Flow Matching + HiFiGAN) stages. It includes critical
architectural enhancements to ensure stability and correctness within the vLLM execution engine.
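Conceptually, the two stages chain like this (a minimal sketch; `Talker` and `Code2Wav` below are illustrative stand-ins, not the actual vllm-omni classes or their signatures):

```python
class Talker:
    """Stage 0 (illustrative): an LLM maps text to discrete speech codes."""
    def generate_codes(self, text: str) -> list[int]:
        # A real talker runs autoregressive decoding; fake token ids here.
        return [hash(ch) % 4096 for ch in text]

class Code2Wav:
    """Stage 1 (illustrative): flow matching + HiFi-GAN map codes to audio."""
    def __init__(self, sample_rate: int = 24000):
        self.sample_rate = sample_rate

    def codes_to_audio(self, codes: list[int]) -> list[float]:
        # Real pipeline: flow-matching model -> mel spectrogram -> vocoder.
        # Here each code just expands to a fixed number of silent samples.
        return [0.0] * (len(codes) * 480)

def tts(text: str) -> list[float]:
    """End-to-end: text -> speech codes -> waveform samples."""
    codes = Talker().generate_codes(text)
    return Code2Wav().codes_to_audio(codes)
```

Keeping the stages separable like this is what lets Stage 0 run inside the vLLM execution engine while Stage 1 remains an ordinary PyTorch decoder.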
Model Implementation
Current Limitations:
good first issue with minimal changes required.

Test Plan
```shell
python examples/offline_inference/text_to_speech/verify_e2e_cosyvoice.py \
    --model pretrained_models/Fun-CosyVoice3-0.5B \
    --tokenizer pretrained_models/Fun-CosyVoice3-0.5B/CosyVoice-BlankEN
```

Test Result
Input: prompt.wav
Output: output_0.wav
Performance Benchmarks:
Run: NVIDIA GeForce RTX 3070
CUDA Version: 13.0
Driver Version: 580.95.05
Stats [latest to earlier]:
After fixing the code to allow `enforce_eager=False`
Integration of the vLLM `Qwen2Model` for Stage 0.
Stage 0: memory spiked after switching to the vLLM implementation (`from vllm.model_executor.models.qwen2 import Qwen2Model`) compared to using `from transformers import Qwen2ForCausalLM`. Unsure why; there was no effect on runtime.
Before integration of the vLLM `Qwen2Model`
For E2E time metrics, memory profiling was off.
Memory:
Memory Profiling:
Stage 0:
Stage 1:

Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)