
Commit 7157f8f

grorge123 authored and hydai committed

[Example] ChatTTS: add advanced option

1 parent a29dedc commit 7157f8f

File tree

9 files changed

+72
-87
lines changed


wasmedge-chatTTS/.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+asset
+config
+*.wav

wasmedge-chatTTS/README.md

Lines changed: 58 additions & 1 deletion
@@ -1,3 +1,60 @@
+# ChatTTS example with WasmEdge WASI-NN ChatTTS plugin
+
+This example demonstrates how to use the WasmEdge WASI-NN ChatTTS plugin to generate speech from text. ChatTTS is a text-to-speech model designed for dialogue scenarios such as LLM assistants. This example uses the WasmEdge WASI-NN ChatTTS plugin to run ChatTTS and generate speech.
+
+## Install WasmEdge with WASI-NN ChatTTS plugin
+
+The ChatTTS backend relies on the ChatTTS Python library; we recommend the following commands to install the dependencies.
+
+``` bash
+sudo apt update
+sudo apt upgrade
+sudo apt install python3-dev
+pip install chattts==0.1.1
+```
+
+Then build and install WasmEdge from source.
+
+``` bash
+cd <path/to/your/wasmedge/source/folder>
+
+cmake -GNinja -Bbuild -DCMAKE_BUILD_TYPE=Release -DWASMEDGE_PLUGIN_WASI_NN_BACKEND="chatTTS"
+cmake --build build
+
+# For the WASI-NN plugin, you should install this project.
+cmake --install build
+```
+
+After installation you will have the `wasmedge` runtime executable under `/usr/local/bin` and the WASI-NN plug-in with the ChatTTS backend under `/usr/local/lib/wasmedge/libwasmedgePluginWasiNN.so`.
+
+## Build wasm
+
+Run the following command to build the wasm; the output WASM file will be at `target/wasm32-wasi/release/`.
+
+```bash
 cargo build --target wasm32-wasi --release
+```
+
+## Execute
+
+Execute the WASM with `wasmedge`.
+
+``` bash
+wasmedge --dir .:. ./target/wasm32-wasi/release/wasmedge-chattts.wasm
+```
+
+This generates the `output1.wav` file, the speech audio for the input text.
+
+## Advanced Options
+
+The `config_data` is used to adjust the configuration of ChatTTS. It supports the following options:
+- `prompt`: Special tokens to inject into the text to synthesize.
+- `spk_emb`: Sampled speaker (use `random` for a random speaker).
+- `temperature`: Custom temperature.
+- `top_k`: Top-K decoding.
+- `top_p`: Top-P decoding.
 
-wasmedge --dir .:. ./target/wasm32-wasi/release/wasmedge-chattts.wasm
+``` rust
+let config_data = serde_json::to_string(&json!({"prompt": "[oral_2][laugh_0][break_6]", "spk_emb": "random", "temperature": 0.5, "top_k": 0, "top_p": 0.9}))
+    .unwrap()
+    .as_bytes()
+    .to_vec();
+```
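As an aside, the option payload is plain JSON bytes, so it can also be assembled without `serde_json`. The sketch below is an illustration only (the example itself uses `serde_json`); `build_config_data` is a hypothetical helper, and the field values mirror the snippet above.

``` rust
// Hand-build the same ChatTTS option payload with only the standard library.
// NOTE: illustration only; `build_config_data` is not part of the example.
fn build_config_data(prompt: &str, spk_emb: &str, temperature: f64, top_k: u32, top_p: f64) -> Vec<u8> {
    format!(
        "{{\"prompt\": \"{}\", \"spk_emb\": \"{}\", \"temperature\": {}, \"top_k\": {}, \"top_p\": {}}}",
        prompt, spk_emb, temperature, top_k, top_p
    )
    .into_bytes() // the bytes later passed as the second WASI-NN input tensor
}

fn main() {
    let config_data = build_config_data("[oral_2][laugh_0][break_6]", "random", 0.5, 0, 0.9);
    println!("{}", String::from_utf8(config_data).unwrap());
}
```

Note that `serde_json::to_string` emits compact JSON without spaces; the backend parses either form, since both are valid JSON.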

wasmedge-chatTTS/config/decoder.yaml

Lines changed: 0 additions & 12 deletions
This file was deleted.

wasmedge-chatTTS/config/dvae.yaml

Lines changed: 0 additions & 14 deletions
This file was deleted.

wasmedge-chatTTS/config/gpt.yaml

Lines changed: 0 additions & 20 deletions
This file was deleted.

wasmedge-chatTTS/config/path.yaml

Lines changed: 0 additions & 11 deletions
This file was deleted.

wasmedge-chatTTS/config/vocos.yaml

Lines changed: 0 additions & 24 deletions
This file was deleted.

wasmedge-chatTTS/src/main.rs

Lines changed: 11 additions & 5 deletions
@@ -1,11 +1,10 @@
+use hound;
+use serde_json::json;
 use wasmedge_wasi_nn::{
-    self, ExecutionTarget, GraphBuilder, GraphEncoding, GraphExecutionContext,
-    TensorType,
+    self, ExecutionTarget, GraphBuilder, GraphEncoding, GraphExecutionContext, TensorType,
 };
-use hound;
 
 fn get_data_from_context(context: &GraphExecutionContext, index: usize, limit: usize) -> Vec<u8> {
-    // Preserve for 4096 tokens with average token length 8
     const MAX_OUTPUT_BUFFER_SIZE: usize = 4096 * 4096;
     let mut output_buffer = vec![0u8; MAX_OUTPUT_BUFFER_SIZE];
     let _ = context
@@ -16,8 +15,12 @@ fn get_data_from_context(context: &GraphExecutionContext, index: usize, limit: u
 }
 
 fn main() {
-    let prompt = "It is a test sentence.";
+    let prompt = "It is [uv_break] test sentence [laugh] for chat T T S";
     let tensor_data = prompt.as_bytes().to_vec();
+    let config_data = serde_json::to_string(&json!({"prompt": "[oral_2][laugh_0][break_6]", "spk_emb": "random", "temperature": 0.5, "top_k": 0, "top_p": 0.9}))
+        .unwrap()
+        .as_bytes()
+        .to_vec();
     let empty_vec: Vec<Vec<u8>> = Vec::new();
     let graph = GraphBuilder::new(GraphEncoding::ChatTTS, ExecutionTarget::CPU)
         .build_from_bytes(empty_vec)
@@ -28,6 +31,9 @@ fn main() {
     context
         .set_input(0, TensorType::U8, &[1], &tensor_data)
         .expect("Failed to set input");
+    context
+        .set_input(1, TensorType::U8, &[1], &config_data)
+        .expect("Failed to set input");
     context.compute().expect("Failed to compute");
     let bytes_written = get_data_from_context(&context, 1, 4);
     let bytes_written = usize::from_le_bytes(bytes_written.as_slice().try_into().unwrap());
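The last two lines above read the output size as a 4-byte little-endian value; on the `wasm32-wasi` target `usize` is 4 bytes wide, so `usize::from_le_bytes` accepts the 4-byte slice there. A minimal host-side sketch of the same decoding, using the width-explicit `u32::from_le_bytes` (`decode_len` is a hypothetical helper, and the byte value is made up):

``` rust
// Sketch of the length decoding in the example: the plugin writes the number
// of output bytes as a little-endian integer into output index 1.
// u32::from_le_bytes matches usize::from_le_bytes on wasm32, where usize is 4 bytes.
fn decode_len(raw: &[u8]) -> usize {
    let bytes: [u8; 4] = raw[..4].try_into().expect("need at least 4 bytes");
    u32::from_le_bytes(bytes) as usize
}

fn main() {
    // 193_984 is an arbitrary, plausible wav payload size for this sketch.
    let raw = 193_984u32.to_le_bytes();
    assert_eq!(decode_len(&raw), 193_984);
    println!("decoded length = {}", decode_len(&raw));
}
```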
Binary file (1.69 MB) not shown.

0 commit comments
