VITA-MLLM/VITA-QinYu

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

✨ Highlights

VITA-QinYu is the first end-to-end spoken language model that supports role-playing and singing. It combines a hybrid text–speech modeling framework with a large-scale data synthesis pipeline, while achieving state-of-the-art performance in natural conversational speech.

  • Expressive Speech Generation. VITA-QinYu supports singing and role-playing within a single unified end-to-end model.
  • Hybrid Speech–Text Modeling. VITA-QinYu adopts an interleaved speech–text modeling paradigm with parallel multi-codebook audio tokens to enable richer paralinguistic representation.
  • Large-Scale Synthetic Data. VITA-QinYu employs a comprehensive pipeline to generate large-scale, high-quality expressive speech data for training.
  • Strong Performance. VITA-QinYu achieves state-of-the-art conversational accuracy while improving expressive speech generation.
  • Open-Source Deployment. VITA-QinYu provides open-source models, training code, and a streaming full-duplex web demonstration.

🔥 RoadMap

  • 2026.04.x 🌟 We release VITA-QinYu with model weights, inference and training code, and a web demo.

🔔 Models

| Model | LLM Size | Hugging Face Weights |
| --- | --- | --- |
| VITA-QinYu-8B | 8B | https://huggingface.co/VITA-MLLM/VITA-QinYu-8B |
| VITA-QinYu-4B | 4B | https://huggingface.co/VITA-MLLM/VITA-QinYu-4B |

Getting Started

Prepare Environment

docker pull mikexu/vita-qinyu:base

Get the Code

git clone https://github.com/VITA-MLLM/VITA-QinYu.git
cd VITA-QinYu
git submodule update --init --recursive
pip install -r requirements.txt
pip install -e .

Download the required models

# -p makes the command safe to re-run if the directory already exists
mkdir -p /vita-qinyu-models

# VITA-QinYu
hf download VITA-MLLM/VITA-QinYu-Models --local-dir /vita-qinyu-models/VITA-QinYu-Models
hf download VITA-MLLM/VITA-QinYu-4B --local-dir /vita-qinyu-models/VITA-QinYu-4B
hf download VITA-MLLM/VITA-QinYu-8B --local-dir /vita-qinyu-models/VITA-QinYu-8B

# FunAudioLLM/SenseVoiceSmall
hf download FunAudioLLM/SenseVoiceSmall --local-dir /vita-qinyu-models/FunAudioLLM/SenseVoiceSmall

# openai/whisper-large-v3
hf download openai/whisper-large-v3 --local-dir /vita-qinyu-models/openai/whisper-large-v3

# TEN-framework/TEN_Turn_Detection
hf download TEN-framework/TEN_Turn_Detection --local-dir /vita-qinyu-models/TEN-framework/TEN_Turn_Detection
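
After the downloads finish, it can help to confirm that every target directory was actually populated before moving on to inference. The following is a minimal sketch, not part of the repository: `check_models` is a hypothetical helper, and the directory names are simply the `--local-dir` values from the commands above. Remove any entry you chose not to download (for example one of the two model sizes).

```shell
# Hypothetical helper: report which expected model directories are missing or empty.
# The sub-directory names mirror the --local-dir values used in the download commands.
check_models() {
    root="${1:-/vita-qinyu-models}"
    missing=0
    for sub in \
        VITA-QinYu-Models \
        VITA-QinYu-4B \
        VITA-QinYu-8B \
        FunAudioLLM/SenseVoiceSmall \
        openai/whisper-large-v3 \
        TEN-framework/TEN_Turn_Detection
    do
        # A directory that is absent or empty is treated as "not downloaded".
        if [ -d "$root/$sub" ] && [ -n "$(ls -A "$root/$sub" 2>/dev/null)" ]; then
            echo "ok       $sub"
        else
            echo "MISSING  $sub"
            missing=$((missing + 1))
        fi
    done
    # Exit status 0 only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

Call it as `check_models /vita-qinyu-models`. It only checks that each directory exists and is non-empty; it does not verify file integrity.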

Inference

Offline Inference

We provide a simple inference script that covers speech-to-speech, ASR and TTS examples.

CUDA_VISIBLE_DEVICES=0 python tools/inference_sts.py --model /vita-qinyu-models/VITA-QinYu-8B --output_dir /output
CUDA_VISIBLE_DEVICES=0 python tools/inference_sts.py --model /vita-qinyu-models/VITA-QinYu-4B --output_dir /output
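
Both commands above write to the same `/output` directory. If you want to compare the two model sizes side by side, one option is to give each run its own sub-directory. The sketch below assumes it is run from the repository root; `run_offline`, `MODEL_ROOT`, `OUT_ROOT` and `PYTHON` are illustrative names, not part of the repository, and only the `--model` and `--output_dir` flags shown above are used.

```shell
# Sketch: run the offline script once per model size, each into its own directory.
# MODEL_ROOT, OUT_ROOT and PYTHON are illustrative variables with overridable defaults.
MODEL_ROOT="${MODEL_ROOT:-/vita-qinyu-models}"
OUT_ROOT="${OUT_ROOT:-/output}"
PYTHON="${PYTHON:-python}"

run_offline() {
    for size in "$@"; do
        out_dir="$OUT_ROOT/VITA-QinYu-$size"
        # Creating the directory up front is harmless if the script also creates it.
        mkdir -p "$out_dir" || return 1
        CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0}" "$PYTHON" tools/inference_sts.py \
            --model "$MODEL_ROOT/VITA-QinYu-$size" \
            --output_dir "$out_dir" || return 1
    done
}
```

For example, `run_offline 4B 8B` runs both models in sequence and stops at the first failure.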

Online Inference

# Natural
CUDA_VISIBLE_DEVICES=0 python web_demo_stream.py --port 8080 --model /vita-qinyu-models/VITA-QinYu-4B
# RolePlay
CUDA_VISIBLE_DEVICES=0 python web_demo_stream.py --port 8080 --model /vita-qinyu-models/VITA-QinYu-4B --mode roleplay --role_description "该角色是一个幼儿女性,身份是世家千金,性格活泼机敏、爱撒娇,气质天真灵动,音色甜润,语速较快"
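  • --mode: conversation mode. default gives natural conversation; roleplay enables role-playing.
  • --model: path to the VITA-QinYu model directory.
  • --role_description: description of the role to play, used when --mode is roleplay. The example above describes a young girl from a prominent family who is lively, quick-witted, and fond of acting cute, with an innocent, spirited air, a sweet voice, and a fairly fast speaking pace.
  • --port: port for the web demo (default: 8080).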

Then visit localhost:8080 in your browser (or the port you passed to --port).
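
Because `--role_description` only applies in roleplay mode, a small wrapper can catch inconsistent combinations before the model is loaded. This is a sketch under stated assumptions: `launch_demo` and the `PYTHON` variable are hypothetical, the wrapper is meant to be run from the repository root, and it passes only the flags documented above (it omits `--mode` for natural conversation, as the first command does).

```shell
# Sketch: start web_demo_stream.py and refuse inconsistent flag combinations.
# launch_demo is a hypothetical helper; PYTHON is an illustrative override.
PYTHON="${PYTHON:-python}"

launch_demo() {
    mode="$1"; model="$2"; port="${3:-8080}"; role="${4:-}"
    case "$mode" in
        default)
            # Natural conversation: no --mode flag, matching the command above.
            "$PYTHON" web_demo_stream.py --port "$port" --model "$model"
            ;;
        roleplay)
            if [ -z "$role" ]; then
                echo "roleplay mode needs a role description" >&2
                return 1
            fi
            "$PYTHON" web_demo_stream.py --port "$port" --model "$model" \
                --mode roleplay --role_description "$role"
            ;;
        *)
            echo "unknown mode: $mode (expected default or roleplay)" >&2
            return 1
            ;;
    esac
}
```

For example, `CUDA_VISIBLE_DEVICES=0 launch_demo roleplay /vita-qinyu-models/VITA-QinYu-4B 8080 "<role description>"` starts the role-playing demo, while calling it with `roleplay` and no description exits with an error instead of launching.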

Finetune

You can finetune your own model. The example below uses the provided toy sample.

Download the toy sample

hf download VITA-MLLM/VITA-QinYu-ToySample --local-dir /vita-qinyu-models/ToySample

Run the finetuning scripts

bash scripts/deepspeed/vita_qinyu_utu/finetune.sh /vita-qinyu-models/ToySample/toy_sample.yaml /vita-qinyu-models/ToySample /vita-qinyu-models
bash scripts/deepspeed/vita_qinyu_qwen3/finetune.sh /vita-qinyu-models/ToySample/toy_sample.yaml /vita-qinyu-models/ToySample /vita-qinyu-models 

Discussion

Discuss on GitHub Issues.

Scan the QR code to join our official QQ chat group.

Statement

VITA-QinYu is trained on large-scale open-source corpora, and its output is stochastic. Content generated by VITA-QinYu does not represent the views of the model developers. We are not responsible for any problems arising from the use, misuse, or dissemination of VITA-QinYu, including but not limited to public-opinion risks and data-security issues.

Acknowledgements

Citation

Coming soon...
