BreezyVoiceX

Based on BreezyVoice by MediaTek Labs.

What is BreezyVoiceX?

BreezyVoiceX is a zero-shot voice-cloning text-to-speech (TTS) system for Taiwanese-accented Mandarin. Give it a short audio clip of any speaker and it generates natural speech in that voice, with phonetic control via 注音 (bopomofo).

BreezyVoiceX wraps MediaTek's BreezyVoice with a streamlined two-step workflow (cache speaker → synthesize), Windows support, and performance profiling. No Linux-only dependencies required.

What's Different from BreezyVoice

Faster zero-shot voice synthesis through prompt caching (the speaker prompt is processed once and saved for reuse)
Built-in profiler that times each major inference step
Runs without the Linux-only ttsfrd dependency
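
As a rough illustration of per-step timing, the sketch below wraps each stage in a timer. It is not BreezyVoiceX's actual profiler: `StepProfiler` and the step names are hypothetical, and `time.sleep` stands in for real work.

```python
import time
from contextlib import contextmanager


class StepProfiler:
    """Hypothetical helper: accumulates wall-clock time per named step."""

    def __init__(self):
        self.timings = {}  # step name -> accumulated seconds

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the wrapped step raises.
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed

    def report(self):
        # One line per step, in the order the steps were first recorded.
        return "\n".join(
            f"{name}: {secs:.3f}s" for name, secs in self.timings.items()
        )


profiler = StepProfiler()
with profiler.step("load_prompt_cache"):  # hypothetical step name
    time.sleep(0.01)                       # stand-in for real work
with profiler.step("synthesize"):         # hypothetical step name
    time.sleep(0.02)
print(profiler.report())
```

Timing with `time.perf_counter` measures wall-clock time, so GPU work that runs asynchronously may need an explicit synchronization point before the timer stops.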

Install

Python 3.11 is required. CUDA 12.1 is recommended for GPU users.

Clone the repo

git clone https://github.com/Docat0209/BreezyVoiceX.git
cd BreezyVoiceX

Linux

pip install -r requirements.txt

Windows

pip install -r requirements.txt
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install WeTextProcessing --no-deps
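
After installing, a quick check along these lines can confirm the Python version and whether PyTorch sees a GPU. This is a sketch rather than part of the repository: `check_environment` is a hypothetical helper, and it relies only on `sys.version_info`, `torch.cuda.is_available()`, and `torch.version.cuda`.

```python
import sys


def check_environment():
    """Return a list of human-readable findings about the local setup."""
    findings = []
    major, minor = sys.version_info[:2]
    if (major, minor) == (3, 11):
        findings.append("Python 3.11: OK")
    else:
        findings.append(f"Python {major}.{minor}: this README requires 3.11")

    try:
        import torch  # expected to be installed by the steps above
    except ImportError:
        findings.append("torch: not installed")
        return findings

    if torch.cuda.is_available():
        findings.append(
            f"torch {torch.__version__}: CUDA {torch.version.cuda} available"
        )
    else:
        findings.append(f"torch {torch.__version__}: no CUDA device, using CPU")
    return findings


for line in check_environment():
    print(line)
```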

Inference

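Python must run in UTF-8 mode. Set the variable below before running inference. The command shown is for Linux and macOS shells; on Windows, use `set PYTHONUTF8=1` in cmd or `$env:PYTHONUTF8 = "1"` in PowerShell.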

export PYTHONUTF8=1
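
To confirm the setting took effect, a check like the following can be run in the same shell. It is a minimal sketch: `utf8_mode_enabled` is a hypothetical helper built on the standard `sys.flags.utf8_mode` flag.

```python
import sys


def utf8_mode_enabled():
    # sys.flags.utf8_mode is nonzero when Python's UTF-8 mode is active,
    # for example via PYTHONUTF8=1 or the -X utf8 command-line option.
    return bool(sys.flags.utf8_mode)


if utf8_mode_enabled():
    print("UTF-8 mode is on.")
else:
    print("UTF-8 mode is off; set PYTHONUTF8=1 before running inference.")
```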

This version splits inference into two explicit steps: caching the speaker prompt, then synthesizing speech from that cache.

Run single_inference.py with the following arguments:

`--mode cache` (Generate speaker prompt cache)

| Argument | Description |
| --- | --- |
| `--speaker_prompt_audio_path` | Required. Path to the speaker reference audio. |
| `--speaker_prompt_text_transcription` | Optional. Manual transcription. If not provided, Whisper will be used. |
| `--prompt_feature_path` | Optional. Output cache file path. Default: `cache/prompt.pt`. |
| `--model_path` | Optional. HF model ID or directory. Default: `MediaTek-Research/BreezyVoice-300M`. |

`--mode synthesize` (Generate audio)

| Argument | Description |
| --- | --- |
| `--content_to_synthesize` | Required. The target text for TTS. |
| `--prompt_feature_path` | Required. Path to previously saved speaker cache (`.pt`). |
| `--output_path` | Optional. Output WAV file path. Default: `results/output.wav`. |
| `--model_path` | Optional. HF model ID or directory. Default: `MediaTek-Research/BreezyVoice-300M`. |
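
The required and optional arguments for the two modes can be summarized as an `argparse` sketch. This is not the actual parser in `single_inference.py`; `build_parser` and `parse_args` are hypothetical, and only the flag names and defaults are taken from the argument descriptions above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the two-mode CLI")
    parser.add_argument("--mode", choices=["cache", "synthesize"], required=True)
    # Used in cache mode
    parser.add_argument("--speaker_prompt_audio_path")
    parser.add_argument("--speaker_prompt_text_transcription")
    # Used in both modes (output in cache mode, input in synthesize mode)
    parser.add_argument("--prompt_feature_path")
    parser.add_argument("--model_path",
                        default="MediaTek-Research/BreezyVoice-300M")
    # Used in synthesize mode
    parser.add_argument("--content_to_synthesize")
    parser.add_argument("--output_path", default="results/output.wav")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode == "cache":
        if not args.speaker_prompt_audio_path:
            parser.error("--speaker_prompt_audio_path is required in cache mode")
        # The cache path is optional here and falls back to the default.
        args.prompt_feature_path = args.prompt_feature_path or "cache/prompt.pt"
    else:
        if not args.content_to_synthesize:
            parser.error("--content_to_synthesize is required in synthesize mode")
        if not args.prompt_feature_path:
            parser.error("--prompt_feature_path is required in synthesize mode")
    return args


args = parse_args(["--mode", "cache",
                   "--speaker_prompt_audio_path", "data/example.wav"])
print(args.prompt_feature_path)  # cache/prompt.pt
```

Validating per-mode requirements after parsing keeps a single flat parser; subcommands would be an alternative design, but they would change the `--mode` interface shown here.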

Example Usage:

Step 1: Cache Speaker Prompt

python single_inference.py --mode cache --speaker_prompt_audio_path data/example.wav --prompt_feature_path cache/example.pt

Step 2: Synthesize Voice from Text

python single_inference.py --mode synthesize --content_to_synthesize "您好，這是一段生成測試語音。" --prompt_feature_path cache/example.pt --output_path results/output.wav
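
The two steps can also be driven from a script, caching the speaker once and then synthesizing several sentences from the same cache file. The sketch below is an assumption-laden convenience wrapper, not part of the repository: `cache_command`, `synthesize_command`, `clone_and_speak`, and the `output_{i}.wav` naming are hypothetical, and only the command-line flags come from the usage shown above. It must be run from the repository root so that `single_inference.py` is found.

```python
import os
import subprocess
import sys

SCRIPT = "single_inference.py"


def cache_command(audio_path, cache_path):
    # Mirrors the Step 1 command line.
    return [sys.executable, SCRIPT, "--mode", "cache",
            "--speaker_prompt_audio_path", audio_path,
            "--prompt_feature_path", cache_path]


def synthesize_command(text, cache_path, output_path):
    # Mirrors the Step 2 command line.
    return [sys.executable, SCRIPT, "--mode", "synthesize",
            "--content_to_synthesize", text,
            "--prompt_feature_path", cache_path,
            "--output_path", output_path]


def clone_and_speak(audio_path, texts,
                    cache_path="cache/example.pt", out_dir="results"):
    """Cache the speaker once, then write one WAV file per text."""
    # Child processes inherit the environment plus UTF-8 mode.
    env = {**os.environ, "PYTHONUTF8": "1"}
    subprocess.run(cache_command(audio_path, cache_path), check=True, env=env)
    outputs = []
    for i, text in enumerate(texts):
        out = f"{out_dir}/output_{i}.wav"  # hypothetical naming scheme
        subprocess.run(synthesize_command(text, cache_path, out),
                       check=True, env=env)
        outputs.append(out)
    return outputs
```

Calling `clone_and_speak("data/example.wav", ["您好，這是一段生成測試語音。"])` would run the same two commands as the example above. Each call launches a fresh Python process, so the model is reloaded every time; the sketch only shows how the cache file is reused across sentences.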

Credits & Acknowledgement

This project is based on BreezyVoice by MediaTek Labs, a voice-cloning TTS system tailored for Taiwanese Mandarin with phonetic control via 注音 (bopomofo). The original project was derived in part from CosyVoice and is part of the Breeze2 model family.

We thank the original authors for their work. This repository builds on it by adding deployment-ready infrastructure, Windows compatibility, and modular serving enhancements.

For the official demo, model, and paper, please refer to the original BreezyVoice project.
