| 1 | +# Text to speech example with WasmEdge WASI-NN Piper plugin |
| 2 | + |
| 3 | +This example demonstrates how to use the WasmEdge WASI-NN Piper plugin to perform text-to-speech (TTS). |
| 4 | + |
| 5 | +## Build WasmEdge with WASI-NN Piper plugin |
| 6 | + |
| 7 | +Overview of WASI-NN Piper plugin dependencies: |
| 8 | + |
| 9 | + |
| 10 | + |
| 11 | +- [piper](https://github.com/rhasspy/piper): A fast, local neural text to speech system. |
| 12 | +- [piper-phonemize](https://github.com/rhasspy/piper-phonemize): C++ library for converting text to phonemes for Piper. |
| 13 | +- [espeak-ng](https://github.com/rhasspy/espeak-ng): An open source speech synthesizer that supports more than a hundred languages and accents. Piper uses it for text-to-phoneme translation. |
| 14 | +- [onnxruntime](https://github.com/microsoft/onnxruntime): A cross-platform inference and training machine-learning accelerator. [ONNX](https://onnx.ai/) is an open format built to represent machine learning models. Piper uses ONNX Runtime as an inference backend for its ONNX models to convert phoneme ids to WAV audio. |
| 15 | + |
| 16 | +The WasmEdge WASI-NN Piper plugin relies on the ONNX Runtime C++ API. For installation instructions, please refer to the installation table on the [official website](https://onnxruntime.ai/getting-started). |
| 17 | + |
| 18 | +Example of installing ONNX Runtime 1.14.1 on Ubuntu (run as root, or prefix the `mv` and `ldconfig` commands with `sudo`): |
| 19 | + |
| 20 | +```bash |
| 21 | +curl -LO https://github.com/microsoft/onnxruntime/releases/download/v1.14.1/onnxruntime-linux-x64-1.14.1.tgz |
| 22 | +tar zxf onnxruntime-linux-x64-1.14.1.tgz |
| 23 | +mv onnxruntime-linux-x64-1.14.1/include/* /usr/local/include/ |
| 24 | +mv onnxruntime-linux-x64-1.14.1/lib/* /usr/local/lib/ |
| 25 | +rm -rf onnxruntime-linux-x64-1.14.1.tgz onnxruntime-linux-x64-1.14.1 |
| 26 | +ldconfig |
| 27 | +``` |
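
After the install step above, it can help to confirm that the dynamic linker actually sees the library. The following is a minimal sketch, not part of the original example: `has_onnxruntime` is a hypothetical helper name, and it assumes the shared library is listed as `libonnxruntime.so` (with or without a version suffix) in the `ldconfig -p` output.

```bash
# Hypothetical helper (not part of WasmEdge or ONNX Runtime):
# reads `ldconfig -p` output on stdin and succeeds if an
# ONNX Runtime shared library appears in the linker cache listing.
has_onnxruntime() {
    grep -q 'libonnxruntime\.so'
}
```

A possible use is `ldconfig -p | has_onnxruntime && echo "ONNX Runtime found"`. Taking the listing on stdin keeps the check independent of where `ldconfig` lives on a given system.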
| 28 | + |
| 29 | +For other dependencies, WasmEdge will download and build them automatically. |
| 30 | + |
| 31 | +Build WasmEdge from source: |
| 32 | + |
| 33 | +```bash |
| 34 | +cd /path/to/wasmedge/source/folder |
| 35 | + |
| 36 | +cmake -GNinja -Bbuild -DCMAKE_BUILD_TYPE=Release -DWASMEDGE_USE_LLVM=OFF -DWASMEDGE_PLUGIN_WASI_NN_BACKEND=Piper |
| 37 | +cmake --build build |
| 38 | +``` |
| 39 | + |
| 40 | +The build produces the `wasmedge` executable at `build/tools/wasmedge/wasmedge` and the WASI-NN plugin with the Piper backend at `build/plugins/wasi_nn/libwasmedgePluginWasiNN.so`. |
| 41 | + |
| 42 | +## Model Download Link |
| 43 | + |
| 44 | +In this example, we will use the [en_US-lessac-medium](https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US/lessac/medium) model. |
| 45 | + |
| 46 | +[MODEL CARD](https://huggingface.co/rhasspy/piper-voices/blob/main/en/en_US/lessac/medium/MODEL_CARD): |
| 47 | + |
| 48 | +``` |
| 49 | +# Model card for lessac (medium) |
| 50 | + |
| 51 | +* Language: en_US (English, United States) |
| 52 | +* Speakers: 1 |
| 53 | +* Quality: medium |
| 54 | +* Samplerate: 22,050Hz |
| 55 | + |
| 56 | +## Dataset |
| 57 | + |
| 58 | +* URL: https://www.cstr.ed.ac.uk/projects/blizzard/2013/lessac_blizzard2013/ |
| 59 | +* License: https://www.cstr.ed.ac.uk/projects/blizzard/2013/lessac_blizzard2013/license.html |
| 60 | + |
| 61 | +## Training |
| 62 | + |
| 63 | +Trained from scratch. |
| 64 | + |
| 65 | +``` |
| 66 | + |
| 67 | +It has a model file [en_US-lessac-medium.onnx](https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx) and a config file [en_US-lessac-medium.onnx.json](https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json). |
| 68 | + |
| 69 | +```bash |
| 70 | +# Download model |
| 71 | +curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx |
| 72 | +# Download config |
| 73 | +curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json |
| 74 | +``` |
| 75 | + |
| 76 | +This model uses [eSpeak NG](https://github.com/rhasspy/espeak-ng) to convert text to phonemes, so we also need to download the required espeak-ng-data. |
| 77 | + |
| 78 | +The following commands download the Piper release archive and extract only the `espeak-ng-data` directory into the current working directory: |
| 79 | + |
| 80 | +```bash |
| 81 | +curl -LO https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_x86_64.tar.gz |
| 82 | +tar -xzf piper_linux_x86_64.tar.gz piper/espeak-ng-data --strip-components=1 |
| 83 | +``` |
| 84 | + |
| 85 | +## Build wasm |
| 86 | + |
| 87 | +Run the following command to build the WASM module. The output file will be in `target/wasm32-wasi/release/`. |
| 88 | + |
| 89 | +```bash |
| 90 | +cargo build --target wasm32-wasi --release |
| 91 | +``` |
| 92 | + |
| 93 | +## Execute |
| 94 | + |
| 95 | +Run the WASM file with `wasmedge`: |
| 96 | + |
| 97 | +```bash |
| 98 | +WASMEDGE_PLUGIN_PATH=/path/to/parent/directory/of/libwasmedgePluginWasiNN.so /path/to/wasmedge --dir .:. /path/to/wasm |
| 99 | +``` |
| 100 | + |
| 101 | +Example layout: |
| 102 | + |
| 103 | +``` |
| 104 | +. |
| 105 | +├── en_US-lessac-medium.onnx |
| 106 | +├── en_US-lessac-medium.onnx.json |
| 107 | +├── espeak-ng-data/ |
| 108 | +├── WasmEdge/build/ |
| 109 | +│ ├── plugins/wasi_nn/libwasmedgePluginWasiNN.so |
| 110 | +│ └── tools/wasmedge/wasmedge |
| 111 | +└── WasmEdge-WASINN-examples/wasmedge-piper/target/wasm32-wasi/release/wasmedge-piper.wasm |
| 112 | +``` |
| 113 | + |
| 114 | +Then the command will be: |
| 115 | + |
| 116 | +```bash |
| 117 | +WASMEDGE_PLUGIN_PATH=WasmEdge/build/plugins/wasi_nn WasmEdge/build/tools/wasmedge/wasmedge --dir .:. WasmEdge-WASINN-examples/wasmedge-piper/target/wasm32-wasi/release/wasmedge-piper.wasm |
| 118 | +``` |
| 119 | + |
| 120 | +The output `welcome.wav` is the synthesized audio. |
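
If the run fails because the model or phoneme data cannot be found, a quick check of the working directory against the example layout may help. This is a minimal sketch under stated assumptions: `missing_files` is a hypothetical helper, and the three names it checks are taken from the example layout above, so adjust them if a different voice is used.

```bash
# Hypothetical helper: print each entry from the example layout that is
# missing under the given directory (defaults to the current directory).
# Prints nothing when everything is in place.
missing_files() {
    root="${1:-.}"
    for path in \
        en_US-lessac-medium.onnx \
        en_US-lessac-medium.onnx.json \
        espeak-ng-data; do
        # -e is true for regular files and directories alike
        [ -e "$root/$path" ] || printf '%s\n' "$path"
    done
}
```

Calling `missing_files .` from the directory passed to `--dir .:.` lists whatever still needs to be downloaded. An empty result means the model, its config, and `espeak-ng-data` are all present.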
| 121 | + |
| 122 | +## Config options |
| 123 | + |
| 124 | +The JSON config options passed to the WasmEdge WASI-NN Piper plugin via `bytes_array` in `wasmedge_wasi_nn::GraphBuilder::build_from_bytes` are similar to the options of the Piper command-line program. |
| 125 | + |
| 126 | +See [config.schema.json](config.schema.json) for available options and [json_input.schema.json](json_input.schema.json) for JSON input. |