Commit 616562d

[Example] Basic example of WASI-NN whisper backend. (#147)
Signed-off-by: YiYing He <[email protected]>
1 parent da18b35 commit 616562d

5 files changed, +120 -0 lines changed

whisper-basic/Cargo.toml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
[package]
name = "whisper-basic"
version = "0.1.0"
edition = "2021"

[dependencies]
wasmedge-wasi-nn = "0.8.0"

whisper-basic/README.md

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
# Basic Example for WASI-NN with Whisper Backend

This example shows basic audio recognition with the WASI-NN Whisper backend in WasmEdge.
Currently, WasmEdge implements the Whisper backend of WASI-NN for English only. More options will be added in the future.

## Dependencies

This crate depends on `wasmedge-wasi-nn`, as declared in `Cargo.toml`:

```toml
[dependencies]
wasmedge-wasi-nn = "0.8.0"
```

## Build

Compile the application to WebAssembly:

```bash
cargo build --target=wasm32-wasi --release
```

The output WASM file will be at [`target/wasm32-wasi/release/whisper-basic.wasm`](whisper-basic.wasm).
To speed up processing, enable AOT mode in WasmEdge with:

```bash
wasmedge compile target/wasm32-wasi/release/whisper-basic.wasm whisper-basic_aot.wasm
```

## Run

### Test data

The test audio is located at `./test.wav`.

Get the model by following the guide in the [whisper.cpp repository](https://github.com/ggerganov/whisper.cpp/tree/master/models):

```bash
curl -sSf https://raw.githubusercontent.com/ggerganov/whisper.cpp/master/models/download-ggml-model.sh | bash -s -- base.en
```

The model will be stored at `./ggml-base.en.bin`.

### Input Audio

The WASI-NN Whisper backend for WasmEdge currently supports 16 kHz, single-channel audio in `pcm_s16le` format.

Users can convert their input audio with the following `ffmpeg` command:

```bash
ffmpeg -i test.m4a -acodec pcm_s16le -ac 1 -ar 16000 test.wav
```
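
Before passing a file to the backend, it can help to confirm that it really is 16 kHz, mono, 16-bit PCM. The following is a minimal sketch using only the Rust standard library, not part of this example crate. It assumes a standard RIFF/WAVE layout and walks the chunks to find `fmt `, since converted files may carry additional chunks before the audio data. The names `WavFormat`, `read_wav_format`, `is_whisper_ready`, and `minimal_wav` are hypothetical helpers introduced for illustration.

```rust
/// Fields read from the `fmt ` chunk of a RIFF/WAVE file.
#[derive(Debug, PartialEq)]
struct WavFormat {
    audio_format: u16, // 1 means integer PCM
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

fn le_u16(b: &[u8]) -> u16 {
    u16::from_le_bytes([b[0], b[1]])
}

fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Walk the RIFF chunks until the `fmt ` chunk is found.
fn read_wav_format(bytes: &[u8]) -> Result<WavFormat, String> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".to_string());
    }
    let mut pos: usize = 12;
    while bytes.len() >= pos.saturating_add(8) {
        let id = &bytes[pos..pos + 4];
        let size = le_u32(&bytes[pos + 4..pos + 8]) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            if size < 16 || body + 16 > bytes.len() {
                return Err("truncated fmt chunk".to_string());
            }
            let f = &bytes[body..body + 16];
            return Ok(WavFormat {
                audio_format: le_u16(&f[0..2]),
                channels: le_u16(&f[2..4]),
                sample_rate: le_u32(&f[4..8]),
                bits_per_sample: le_u16(&f[14..16]),
            });
        }
        // Chunk bodies are padded to an even number of bytes.
        pos = body.saturating_add(size).saturating_add(size & 1);
    }
    Err("no fmt chunk found".to_string())
}

/// True when the format matches what this README lists:
/// 16 kHz, 1 channel, 16-bit integer PCM (pcm_s16le).
fn is_whisper_ready(f: &WavFormat) -> bool {
    f.audio_format == 1 && f.channels == 1 && f.sample_rate == 16_000 && f.bits_per_sample == 16
}

/// Hypothetical helper for experiments: a canonical 44-byte PCM header
/// followed by `data_len` bytes of silence.
fn minimal_wav(channels: u16, sample_rate: u32, bits: u16, data_len: u32) -> Vec<u8> {
    let block_align = channels * bits / 8;
    let byte_rate = sample_rate * block_align as u32;
    let mut v = Vec::new();
    v.extend_from_slice(b"RIFF");
    v.extend_from_slice(&(36 + data_len).to_le_bytes());
    v.extend_from_slice(b"WAVE");
    v.extend_from_slice(b"fmt ");
    v.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    v.extend_from_slice(&1u16.to_le_bytes()); // integer PCM
    v.extend_from_slice(&channels.to_le_bytes());
    v.extend_from_slice(&sample_rate.to_le_bytes());
    v.extend_from_slice(&byte_rate.to_le_bytes());
    v.extend_from_slice(&block_align.to_le_bytes());
    v.extend_from_slice(&bits.to_le_bytes());
    v.extend_from_slice(b"data");
    v.extend_from_slice(&data_len.to_le_bytes());
    v.resize(v.len() + data_len as usize, 0);
    v
}
```

To use it, read the file with `std::fs::read`, pass the bytes to `read_wav_format`, and re-run the `ffmpeg` conversion if `is_whisper_ready` returns `false`.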

### Execute

> Note: This example is prepared for the `0.14.2` release or later. Until that release is available, please build WasmEdge from source.

Users should [install WasmEdge with the WASI-NN plug-in using the Whisper backend](https://wasmedge.org/docs/start/install/#wasi-nn-plug-ins).

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-whisper
```

Execute the WASM with `wasmedge` and the WASI-NN plug-in:

```bash
wasmedge --dir .:. whisper-basic_aot.wasm ggml-base.en.bin test.wav
```

You will get the recognized string from the audio file in the output:

```bash
Read model, size in bytes: 147964211
Loaded graph into wasi-nn with ID: Graph#0
Read input tensor, size in bytes: 141408
Recognized from audio:
[00:00:00.000 --> 00:00:04.300] This is a test record for whisper.cpp
```
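
If the recognized text needs further processing, each output line can be split into its time range and text. The sketch below is one way to do this, assuming every segment follows the `[HH:MM:SS.mmm --> HH:MM:SS.mmm] text` shape shown in the output above, with exactly three millisecond digits. `Segment`, `parse_timestamp`, and `parse_segment` are hypothetical names, not part of `wasmedge-wasi-nn`.

```rust
/// One transcript segment: start and end in milliseconds, plus the text.
#[derive(Debug, PartialEq)]
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Parse `HH:MM:SS.mmm` into milliseconds.
fn parse_timestamp(s: &str) -> Option<u64> {
    let (hms, millis) = s.trim().split_once('.')?;
    if millis.len() != 3 {
        return None; // assumption: exactly three millisecond digits
    }
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let sec: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    let ms: u64 = millis.parse().ok()?;
    Some(((h * 60 + m) * 60 + sec) * 1000 + ms)
}

/// Parse one `[start --> end] text` line; returns None for any other line.
fn parse_segment(line: &str) -> Option<Segment> {
    let rest = line.trim().strip_prefix('[')?;
    let (range, text) = rest.split_once(']')?;
    let (start, end) = range.split_once("-->")?;
    Some(Segment {
        start_ms: parse_timestamp(start)?,
        end_ms: parse_timestamp(end)?,
        text: text.trim().to_string(),
    })
}
```

Lines that do not match the expected shape, such as `Recognized from audio:`, yield `None`, so the parser can be applied to every line of the output with `filter_map`.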

whisper-basic/src/main.rs

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
use std::env;
use std::fs;
use std::error::Error;
use wasmedge_wasi_nn::{GraphBuilder, GraphEncoding, ExecutionTarget, TensorType};

pub fn main() -> Result<(), Box<dyn Error>> {
    let args: Vec<String> = env::args().collect();
    let model_bin_name: &str = &args[1];
    let wav_name: &str = &args[2];

    let model_bin = fs::read(model_bin_name)?;
    println!("Read model, size in bytes: {}", model_bin.len());

    let graph = GraphBuilder::new(GraphEncoding::Whisper, ExecutionTarget::CPU).build_from_bytes(&[&model_bin])?;
    let mut ctx = graph.init_execution_context()?;
    println!("Loaded graph into wasi-nn with ID: {}", graph);

    // Load the raw pcm tensor.
    let wav_buf = fs::read(wav_name)?;
    println!("Read input tensor, size in bytes: {}", wav_buf.len());

    // Set input.
    ctx.set_input(0, TensorType::F32, &[1, wav_buf.len()], &wav_buf)?;

    // Execute the inference.
    ctx.compute()?;

    // Retrieve the output.
    let mut output_buffer = vec![0u8; 2048];
    let size = ctx.get_output(0, &mut output_buffer)?;

    println!("Recognized from audio: \n{}", String::from_utf8_lossy(&output_buffer[..size.min(output_buffer.len())]));

    Ok(())
}
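
The program above indexes `args[1]` and `args[2]` directly, so it panics when either path is missing. A minimal sketch of a friendlier check follows. It is not part of this commit, and `parse_args` is a hypothetical helper, as is the usage message text.

```rust
use std::error::Error;

/// Return (model path, wav path), or a usage error when the
/// argument count is not exactly: program name + two paths.
fn parse_args(args: &[String]) -> Result<(&str, &str), Box<dyn Error>> {
    match args {
        [_, model, wav] => Ok((model.as_str(), wav.as_str())),
        _ => {
            // Fall back to the crate name if even argv[0] is absent.
            let prog = args.first().map(String::as_str).unwrap_or("whisper-basic");
            Err(format!("usage: {} <model.bin> <audio.wav>", prog).into())
        }
    }
}
```

Inside `main`, the two indexing lines could then be replaced by `let (model_bin_name, wav_name) = parse_args(&args)?;`, which returns the usage message through the existing `Box<dyn Error>` result.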

whisper-basic/test.wav

138 KB
Binary file not shown.

whisper-basic/whisper-basic.wasm

1.66 MB
Binary file not shown.
