Commit 6e5083c

docs(readme): update instructions for two-step CLI and Windows support
1 parent d5f6eb2 commit 6e5083c

1 file changed: +72 -2 lines changed

README.md

Lines changed: 72 additions & 2 deletions
@@ -1,9 +1,79 @@
# BreezyVoiceX

> Based on [BreezyVoice](https://github.com/mtkresearch/BreezyVoice) by MediaTek Labs.
4-
> This repository will be updated gradually with deployment optimizations and feature extensions.
6-
Documentation and acknowledgements will be completed after core functionality is refactored.

BreezyVoiceX is an enhanced version of MediaTek's [BreezyVoice](https://github.com/mtkresearch/BreezyVoice), focused on usability.

## Key Improvements
- Fast zero-shot voice synthesis via prompt caching
- Built-in time profiler for each major inference step
- Runs without the Linux-only `ttsfrd` dependency

## Install

> Python 3.11 is required. CUDA 12.1 is recommended for GPU users.

### Clone the repo
```bash
git clone https://github.com/Docat0209/BreezyVoiceX.git
cd BreezyVoiceX
```

### Linux
```bash
pip install -r requirements.txt
```

### Windows
```bash
pip install -r requirements.txt
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install WeTextProcessing --no-deps
```

## Inference
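
Python must run in UTF-8 mode. The command below is for POSIX shells; on Windows, use `set PYTHONUTF8=1` in cmd or `$env:PYTHONUTF8 = "1"` in PowerShell instead:

```sh
export PYTHONUTF8=1
```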
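
To confirm that the setting took effect, the interpreter's own flag can be inspected. This is a small sketch with a hypothetical helper name (`utf8_mode_enabled`); it relies only on the standard `sys.flags.utf8_mode` attribute.

```python
import sys


def utf8_mode_enabled() -> bool:
    """True when the interpreter runs in UTF-8 mode (PYTHONUTF8=1 or -X utf8)."""
    return bool(sys.flags.utf8_mode)


if __name__ == "__main__":
    state = "on" if utf8_mode_enabled() else "off (set PYTHONUTF8=1 and restart Python)"
    print("UTF-8 mode:", state)
```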
---

> This version separates inference into two explicit steps: caching the speaker prompt, then synthesizing audio from it.

**Run `single_inference.py` with the following arguments:**

### `--mode cache` (Generate speaker prompt cache)
| Argument                              | Description                                                                                            |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `--speaker_prompt_audio_path`         | Required. Path to the speaker reference audio.                                                         |
| `--speaker_prompt_text_transcription` | Optional. Manual transcription of the reference audio. If not provided, Whisper is used to transcribe. |
| `--prompt_feature_path`               | Optional. Output cache file path. Default: `cache/prompt.pt`.                                          |
| `--model_path`                        | Optional. HF model ID or directory. Default: `MediaTek-Research/BreezyVoice-300M`.                     |

### `--mode synthesize` (Generate audio)

| Argument | Description |
|----------|-------------|
| `--content_to_synthesize` | Required. The target text for TTS. |
| `--prompt_feature_path` | Required. Path to previously saved speaker cache (`.pt`). |
| `--output_path` | Optional. Output WAV file path. Default: `results/output.wav`. |
| `--model_path` | Optional. HF model ID or directory. Default: `MediaTek-Research/BreezyVoice-300M`. |

**Example Usage:**

### Step 1: Cache Speaker Prompt
```bash
python single_inference.py --mode cache --speaker_prompt_audio_path data/example.wav --prompt_feature_path cache/example.pt
```

### Step 2: Synthesize Voice from Text
```bash
python single_inference.py --mode synthesize --content_to_synthesize "您好,這是一段生成測試語音。" --prompt_feature_path cache/example.pt --output_path results/output.wav
```
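
Because the speaker cache from Step 1 can be reused, repeated synthesis for the same speaker only needs Step 2. The sketch below wraps both commands in a small driver that skips Step 1 when the cache file already exists. The helper names `build_commands` and `synthesize` are hypothetical, the script is assumed to be run from the repository root, and only the flags shown in the examples above are used.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_commands(text, speaker_wav,
                   cache_path="cache/example.pt",
                   output_path="results/output.wav"):
    """Return the Step 1 and Step 2 command lines as argument lists."""
    base = [sys.executable, "single_inference.py"]
    cache_cmd = base + [
        "--mode", "cache",
        "--speaker_prompt_audio_path", str(speaker_wav),
        "--prompt_feature_path", str(cache_path),
    ]
    synth_cmd = base + [
        "--mode", "synthesize",
        "--content_to_synthesize", text,
        "--prompt_feature_path", str(cache_path),
        "--output_path", str(output_path),
    ]
    return cache_cmd, synth_cmd


def synthesize(text, speaker_wav,
               cache_path="cache/example.pt",
               output_path="results/output.wav"):
    """Run Step 1 only if the cache is missing, then always run Step 2."""
    cache_cmd, synth_cmd = build_commands(text, speaker_wav, cache_path, output_path)
    # The README requires UTF-8 mode, so set it for the child processes.
    env = {**os.environ, "PYTHONUTF8": "1"}
    if not Path(cache_path).exists():
        subprocess.run(cache_cmd, check=True, env=env)
    subprocess.run(synth_cmd, check=True, env=env)
```

Calling `synthesize("您好,這是一段生成測試語音。", "data/example.wav")` twice would run the cache step only on the first call, provided the first run wrote `cache/example.pt`.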

## Credits & Acknowledgement