GPT-SoVITS V2 Pro

Input

The text to synthesize, plus a reference audio clip and its transcript (the reference text) for voice cloning.

Output

The synthesized speech is written as a .wav file to the path defined by SAVE_WAV_PATH in gpt-sovits-v2-pro.py.
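To sanity-check the generated file, a small helper like the one below can report its sample rate, channel count, and duration. This is only a sketch: `describe_wav` is a hypothetical helper, not part of the repository, and it assumes the output is PCM-encoded, which is what Python's standard `wave` module can read.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration is the number of frames divided by frames per second.
        duration = wav.getnframes() / float(rate)
    return rate, channels, duration
```

Call it with whatever path SAVE_WAV_PATH points to in your copy of the script.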

Requirements

This model requires pyopenjtalk for grapheme-to-phoneme (g2p) conversion.

pip3 install -r requirements.txt

Usage

The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that point.

Run with the built-in sample sentence and sample audio:

python3 gpt-sovits-v2-pro.py

Run with your own synthesis text and audio prompt:

python3 gpt-sovits-v2-pro.py -i "ax株式会社ではAIの実用化のための技術を開発しています。" --ref_audio reference_audio_captured_by_ax.wav --ref_text "水をマレーシアから買わなくてはならない。"

Run with English synthesis text:

python3 gpt-sovits-v2-pro.py -i "Hello world. We are testing speech synthesis." --text_language en --ref_audio reference_audio_captured_by_ax.wav --ref_text "水をマレーシアから買わなくてはならない。" --ref_language ja
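When synthesizing many sentences, it can be convenient to assemble the command line from Python. The sketch below uses only the flags that appear in the commands above (`-i`, `--text_language`, `--ref_audio`, `--ref_text`, `--ref_language`); `build_command` is a hypothetical helper, not something the script provides.

```python
import shlex


def build_command(text, ref_audio=None, ref_text=None,
                  text_language=None, ref_language=None,
                  script="gpt-sovits-v2-pro.py"):
    """Build an argv list from the flags shown in the usage examples."""
    argv = ["python3", script, "-i", text]
    optional = [
        ("--text_language", text_language),
        ("--ref_audio", ref_audio),
        ("--ref_text", ref_text),
        ("--ref_language", ref_language),
    ]
    # Only pass flags that were given, so the script's defaults still apply.
    for flag, value in optional:
        if value is not None:
            argv += [flag, value]
    return argv


cmd = build_command(
    "Hello world. We are testing speech synthesis.",
    text_language="en",
    ref_audio="reference_audio_captured_by_ax.wav",
    ref_text="水をマレーシアから買わなくてはならない。",
    ref_language="ja",
)
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` runs the script without any shell quoting issues around the Japanese text.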

Architecture

GPT-SoVITS V2 Pro uses the following ONNX models:

| Model | Description | Input | Output |
| --- | --- | --- | --- |
| cnhubert.onnx | Chinese HuBERT for SSL features | ref_audio_16k (1, N) | ssl_content (1, T, 768) |
| t2s_encoder.onnx | T2S encoder | ref_seq, text_seq, ref_bert, text_bert, ssl_content | x, prompts |
| t2s_fsdec.onnx | T2S first-stage decoder | x, prompts, top_k, top_p, temperature, repetition_penalty | y, k, v, y_emb, x_example |
| t2s_sdec.onnx | T2S stage decoder | iy, ik, iv, iy_emb, ix_example, top_k, top_p, temperature, repetition_penalty | y, k, v, y_emb, logits, samples |
| sv.onnx | Speaker Verification (ERes2NetV2) | fbank_feat (1, T, 80) | sv_emb (1, 20480) |
| vits.onnx | VITS synthesizer with v2Pro weights | text_seq, pred_semantic, ref_audio, sv_emb | audio |
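The input and output names in the table suggest that t2s_fsdec.onnx runs once and t2s_sdec.onnx is then called repeatedly, with its outputs (y, k, v, y_emb) fed back as the next step's inputs (iy, ik, iv, iy_emb). The sketch below illustrates that loop with toy stand-in decoders so it runs without the ONNX files. The stop rule, the `EOS` id, the `max_steps` limit, and the sampling values are all assumptions for illustration; the actual script may differ.

```python
def generate_semantic_tokens(first_stage, stage, x, prompts, sampling,
                             is_finished, max_steps=1500):
    """Run the first-stage decoder once, then the stage decoder until done."""
    # Outputs named as in the table: y, k, v, y_emb, x_example.
    y, k, v, y_emb, x_example = first_stage(x, prompts, *sampling)
    steps = 0
    while steps < max_steps:
        # Feed the previous outputs back in as iy, ik, iv, iy_emb, ix_example.
        y, k, v, y_emb, logits, samples = stage(y, k, v, y_emb, x_example, *sampling)
        steps += 1
        if is_finished(logits, samples):
            break
    return y, steps


EOS = 5  # hypothetical end-of-sequence id used only by the toy decoders


def fake_first_stage(x, prompts, top_k, top_p, temperature, repetition_penalty):
    # Toy stand-in: start the sequence with the prompt tokens plus token 0.
    return list(prompts) + [0], "k-cache", "v-cache", "y_emb", "x_example"


def fake_stage(iy, ik, iv, iy_emb, ix_example,
               top_k, top_p, temperature, repetition_penalty):
    # Toy stand-in: the "sampled" token is simply the previous token plus one.
    nxt = iy[-1] + 1
    return iy + [nxt], ik, iv, iy_emb, [float(nxt)], nxt


sampling = (15, 1.0, 1.0, 1.35)  # placeholder top_k, top_p, temperature, repetition_penalty
tokens, steps = generate_semantic_tokens(
    fake_first_stage, fake_stage, x=None, prompts=[7, 8], sampling=sampling,
    is_finished=lambda logits, samples: samples == EOS,
)
print(tokens, steps)
```

With real models, `first_stage` and `stage` would wrap the corresponding inference sessions, and the generated part of y would be passed to vits.onnx as pred_semantic.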

The v2Pro architecture differs from v2 in the following ways:

- It adds a speaker verification embedding (sv_emb) from the ERes2NetV2 model.
- It uses gin_channels=1024 (vs 512 in v2), with a ge_to512 projection for MRTE.
- It applies a PReLU activation to the combined reference + speaker embeddings.
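As a rough illustration of how these pieces could fit together, the NumPy sketch below combines a 1024-channel reference embedding with a projected sv_emb, applies PReLU, and projects the result to 512 channels. Several things here are assumptions rather than facts from the model: that sv_emb is brought to 1024 channels by a linear layer, that the two embeddings are combined by addition, the PReLU slope of 0.25, and the random weights, which stand in for trained parameters. The names `condition_v2pro`, `w_sv`, and `w_512` are hypothetical.

```python
import numpy as np


def prelu(x, alpha):
    """PReLU: identity for positive values, slope alpha for negative ones."""
    return np.where(x > 0, x, alpha * x)


def condition_v2pro(ge, sv_emb, w_sv, w_512, alpha=0.25):
    """Combine reference and speaker embeddings (assumed: by addition)."""
    # Assumed linear projection of sv_emb from 20480 to gin_channels=1024.
    combined = prelu(ge + sv_emb @ w_sv, alpha)
    # Stand-in for ge_to512: reduce 1024 channels to 512 for MRTE.
    ge_512 = combined @ w_512
    return combined, ge_512


rng = np.random.default_rng(0)
ge = rng.standard_normal((1, 1024), dtype=np.float32)       # reference embedding
sv_emb = rng.standard_normal((1, 20480), dtype=np.float32)  # shape from the table
w_sv = rng.standard_normal((20480, 1024), dtype=np.float32) * 0.01   # random stand-in
w_512 = rng.standard_normal((1024, 512), dtype=np.float32) * 0.01    # random stand-in

combined, ge_512 = condition_v2pro(ge, sv_emb, w_sv, w_512)
print(combined.shape, ge_512.shape)
```

The point of the sketch is the shape bookkeeping: the synthesizer is conditioned on a 1024-channel vector, while MRTE receives a 512-channel projection of it.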

ONNX Export

To export the VITS and SV models:

git clone -b 20250606v2pro https://github.com/RVC-Boss/GPT-SoVITS.git
cd GPT-SoVITS
pip install -r requirements.txt

# Export VITS
python3 /path/to/export/export_vits.py \
    --sovits_path GPT_SoVITS/pretrained_models/v2Pro/s2Gv2Pro.pth \
    --output vits.onnx

# Export SV model
python3 /path/to/export/export_sv.py \
    --sv_path GPT_SoVITS/pretrained_models/sv/pretrained_eres2netv2w24s4ep4.ckpt \
    --output sv.onnx
