Commit 0b87a2d

examples : add README.md to tts example [no ci]

1 parent 8eceb88 commit 0b87a2d

1 file changed: +71 −0 lines changed

examples/tts/README.md

## Text To Speech (TTS) example

This example demonstrates the Text To Speech feature. It uses a
[model](https://www.outeai.com/blog/outetts-0.2-500m) from
[OuteAI](https://www.outeai.com/).

### Model conversion

Check out or download the repository that contains the LLM model:
```console
$ pushd models
$ git clone --branch main --single-branch --depth 1 https://huggingface.co/OuteAI/OuteTTS-0.2-500M
$ cd OuteTTS-0.2-500M && git lfs install && git lfs pull
$ popd
```
Convert the model to `.gguf` format:
```console
(venv) python convert_hf_to_gguf.py models/OuteTTS-0.2-500M \
    --outfile models/outetts-0.2-0.5B-f16.gguf --outtype f16
```
The generated model will be `models/outetts-0.2-0.5B-f16.gguf`.
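
Optionally, we can sanity-check the converted file: GGUF files begin with the 4-byte magic `GGUF` followed by a little-endian `uint32` version, so a few lines of Python are enough to confirm the header (the path below assumes the output file from the step above):
```python
# Read the GGUF header of the converted model: 4-byte magic "GGUF",
# then a little-endian uint32 format version.
import struct

with open("models/outetts-0.2-0.5B-f16.gguf", "rb") as f:
    magic = f.read(4)
    version = struct.unpack("<I", f.read(4))[0]

assert magic == b"GGUF", f"not a GGUF file: {magic!r}"
print(f"GGUF version: {version}")
```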

We can optionally quantize this to Q8_0 using the following command:
```console
$ build/bin/llama-quantize models/outetts-0.2-0.5B-f16.gguf \
    models/outetts-0.2-0.5B-q8_0.gguf q8_0
```
The quantized model will be `models/outetts-0.2-0.5B-q8_0.gguf`.
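
Q8_0 stores each block of 32 weights as 8-bit integers plus a single scale factor, so the quantized file should come out at roughly half the size of the f16 one. A quick way to confirm, using the paths from the steps above:
```python
# Compare on-disk sizes of the f16 and Q8_0 models.
import os

for path in ("models/outetts-0.2-0.5B-f16.gguf",
             "models/outetts-0.2-0.5B-q8_0.gguf"):
    print(f"{path}: {os.path.getsize(path) / 2**20:.1f} MiB")
```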

Next we do something similar for the audio decoder. First download or check out
the model for the voice decoder:
```console
$ pushd models
$ git clone --branch main --single-branch --depth 1 https://huggingface.co/novateur/WavTokenizer-large-speech-75token
$ cd WavTokenizer-large-speech-75token && git lfs install && git lfs pull
$ popd
```
This model file is a PyTorch checkpoint (.ckpt) and we first need to convert it
to Hugging Face format:
```console
(venv) python examples/tts/convert_pt_to_hf.py \
    models/WavTokenizer-large-speech-75token/wavtokenizer_large_speech_320_24k.ckpt
...
Model has been successfully converted and saved to models/WavTokenizer-large-speech-75token/model.safetensors
Metadata has been saved to models/WavTokenizer-large-speech-75token/index.json
Config has been saved to models/WavTokenizer-large-speech-75token/config.json
```
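
For reference, the core of that conversion is loading the checkpoint and re-saving its tensors in safetensors format. A minimal sketch, assuming the checkpoint wraps its weights in a `state_dict` entry (the bundled `convert_pt_to_hf.py` also remaps tensor names and writes the `index.json` and `config.json` files):
```python
# Minimal sketch: load a PyTorch .ckpt and re-save its tensors as safetensors.
import torch
from safetensors.torch import save_file

ckpt = torch.load(
    "models/WavTokenizer-large-speech-75token/wavtokenizer_large_speech_320_24k.ckpt",
    map_location="cpu",
)
# Checkpoints often wrap the weights in a "state_dict" key; fall back to
# the top level if this one does not.
state_dict = ckpt.get("state_dict", ckpt)

# safetensors requires plain, contiguous tensors.
tensors = {k: v.contiguous() for k, v in state_dict.items()
           if isinstance(v, torch.Tensor)}
save_file(tensors, "models/WavTokenizer-large-speech-75token/model.safetensors")
```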

Then we can convert the Hugging Face format to GGUF:
```console
(venv) python convert_hf_to_gguf.py models/WavTokenizer-large-speech-75token \
    --outfile models/wavtokenizer-large-75-f16.gguf --outtype f16
...
INFO:hf-to-gguf:Model successfully exported to models/wavtokenizer-large-75-f16.gguf
```

### Running the example

With both models generated, the LLM model and the voice decoder model, we can
run the example:
```console
$ build/bin/llama-tts -m ./models/outetts-0.2-0.5B-q8_0.gguf \
    -mv ./models/wavtokenizer-large-75-f16.gguf \
    -p "Hello world"
...
main: audio written to file 'output.wav'
```
The `output.wav` file will contain the audio of the prompt. This can be heard
by playing the file with a media player. On Linux the following command will
play the audio:
```console
$ aplay output.wav
```
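
If `aplay` is not available, Python's standard `wave` module can be used to verify the generated file, for example its sample rate and duration:
```python
# Inspect the generated WAV file using only the Python standard library.
import wave

with wave.open("output.wav", "rb") as wav:
    frames = wav.getnframes()
    rate = wav.getframerate()
    print(f"channels:    {wav.getnchannels()}")
    print(f"sample rate: {rate} Hz")
    print(f"duration:    {frames / rate:.2f} s")
```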