
Commit da18b35

Add neural speed example (#135)
* feat: add neural speed example
* feat: change finiSingle to unload
* fix: backend name
1 parent 654ffc5 commit da18b35

4 files changed, +152 -0 lines changed


wasmedge-neuralspeed/.gitignore

Lines changed: 1 addition & 0 deletions
/target

wasmedge-neuralspeed/Cargo.toml

Lines changed: 11 additions & 0 deletions
[package]
name = "wasmedge-neural-speed"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
tokenizers = { version = "0.19.1", features = ["unstable_wasm"], default-features = false }
serde_json = "1.0"
wasmedge-wasi-nn = "0.7.1"

wasmedge-neuralspeed/README.md

Lines changed: 78 additions & 0 deletions
# Neural chat example with WasmEdge WASI-NN Neural Speed plugin

This example demonstrates how to use the WasmEdge WASI-NN Neural Speed plugin to perform an inference task with the Neural Chat model.

## Install WasmEdge with the WASI-NN Neural Speed plugin

The Neural Speed backend relies on Neural Speed; we recommend the following commands to install it.

``` bash
sudo apt update
sudo apt upgrade
sudo apt install python3-dev
wget https://raw.githubusercontent.com/intel/neural-speed/main/requirements.txt
pip install -r requirements.txt
pip install neural-speed
```

Then build and install WasmEdge from source:

``` bash
cd <path/to/your/wasmedge/source/folder>

cmake -GNinja -Bbuild -DCMAKE_BUILD_TYPE=Release -DWASMEDGE_PLUGIN_WASI_NN_BACKEND="neuralspeed"
cmake --build build

# For the WASI-NN plugin, you should install this project.
cmake --install build
```

After installation you will have an executable `wasmedge` runtime under `/usr/local/bin` and the WASI-NN plug-in with the Neural Speed backend at `/usr/local/lib/wasmedge/libwasmedgePluginWasiNN.so`.
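
As a quick sanity check (assuming the default `/usr/local` install prefix shown above), you can confirm the plug-in file is in place and that the runtime is on your `PATH`:

``` bash
ls /usr/local/lib/wasmedge/libwasmedgePluginWasiNN.so
wasmedge --version
```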

## Model Download Link

In this example, we will use the neural-chat-7b-v3-1.Q4_0 model in GGUF format.

``` bash
# Download model weight
wget https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF/resolve/main/neural-chat-7b-v3-1.Q4_0.gguf
# Download tokenizer
wget https://huggingface.co/Intel/neural-chat-7b-v3-1/raw/main/tokenizer.json -O neural-chat-tokenizer.json
```

## Build wasm

Run the following command to build the WASM module; the output file will be at `target/wasm32-wasi/release/`.

```bash
cargo build --target wasm32-wasi --release
```
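
If the build fails because the `wasm32-wasi` target is missing from your Rust toolchain, you can add it first; this is a standard rustup step, not specific to this example:

```bash
rustup target add wasm32-wasi
```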

## Execute

Execute the WASM module with `wasmedge`, using `--nn-preload` to load the model.

```bash
wasmedge --dir .:. \
  --nn-preload default:NeuralSpeed:AUTO:neural-chat-7b-v3-1.Q4_0.gguf \
  ./target/wasm32-wasi/release/wasmedge-neural-speed.wasm default
```
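
The `--nn-preload` value takes the form `NAME:BACKEND:TARGET:MODEL_PATH`. The name (`default` here) is the alias the program receives as its first argument and passes to `build_from_cache`, so the two must match. For example, with a hypothetical alias `chat`:

```bash
wasmedge --dir .:. \
  --nn-preload chat:NeuralSpeed:AUTO:neural-chat-7b-v3-1.Q4_0.gguf \
  ./target/wasm32-wasi/release/wasmedge-neural-speed.wasm chat
```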

## Other

You can change `tokenizer_path` to point to your own tokenizer file.

``` rust
let tokenizer_path = "neural-chat-tokenizer.json";
```

`prompt` is the default model input.

``` rust
let prompt = "Once upon a time, there existed a little girl,";
```

If your model type is not llama, you can set the `model_type` parameter to load a different model.

``` rust
let graph = GraphBuilder::new(GraphEncoding::NeuralSpeed, ExecutionTarget::AUTO)
    .config(serde_json::to_string(&json!({"model_type": "mistral"}))
```
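
For reference, the complete builder call as it appears in `src/main.rs` is:

``` rust
let graph = GraphBuilder::new(GraphEncoding::NeuralSpeed, ExecutionTarget::AUTO)
    .config(serde_json::to_string(&json!({"model_type": "mistral"})).expect("Failed to serialize options"))
    .build_from_cache(model_name)
    .expect("Failed to build graph");
```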

wasmedge-neuralspeed/src/main.rs

Lines changed: 62 additions & 0 deletions
use tokenizers::tokenizer::Tokenizer;
use serde_json::json;
use wasmedge_wasi_nn::{
    self, ExecutionTarget, GraphBuilder, GraphEncoding, GraphExecutionContext,
    TensorType,
};
use std::env;

fn get_data_from_context(context: &GraphExecutionContext, index: usize) -> Vec<u8> {
    // Reserve room for 4096 output tokens, 8 bytes per u64-encoded token id.
    const MAX_OUTPUT_BUFFER_SIZE: usize = 4096 * 8;
    let mut output_buffer = vec![0u8; MAX_OUTPUT_BUFFER_SIZE];
    let output_size = context
        .get_output(index, &mut output_buffer)
        .expect("Failed to get output");
    // Keep only the bytes the backend actually wrote.
    output_buffer.truncate(output_size);
    output_buffer
}

fn get_output_from_context(context: &GraphExecutionContext) -> Vec<u8> {
    get_data_from_context(context, 0)
}

fn main() {
    let tokenizer_path = "neural-chat-tokenizer.json";
    let prompt = "Once upon a time, there existed a little girl,";
    // The first command-line argument is the model name registered with --nn-preload.
    let args: Vec<String> = env::args().collect();
    let model_name: &str = &args[1];

    // Tokenize the prompt into u32 token ids.
    let tokenizer: Tokenizer = Tokenizer::from_file(tokenizer_path).unwrap();
    let encoding = tokenizer.encode(prompt, true).unwrap();
    let inputs = encoding.get_ids();

    // Serialize each token id as the 8 little-endian bytes of a u64.
    let mut tensor_data: Vec<u8> = Vec::with_capacity(inputs.len() * 8);
    for &val in inputs {
        let mut bytes = u64::from(val).to_be_bytes();
        bytes.reverse();
        tensor_data.extend_from_slice(&bytes);
    }

    // Build the graph from the preloaded model using the Neural Speed backend.
    let graph = GraphBuilder::new(GraphEncoding::NeuralSpeed, ExecutionTarget::AUTO)
        .config(serde_json::to_string(&json!({"model_type": "mistral"})).expect("Failed to serialize options"))
        .build_from_cache(model_name)
        .expect("Failed to build graph");
    let mut context = graph
        .init_execution_context()
        .expect("Failed to init context");
    context
        .set_input(0, TensorType::U8, &[1], &tensor_data)
        .expect("Failed to set input");
    context.compute().expect("Failed to compute");

    // Read the output tensor and decode the little-endian u64 token ids back to u32.
    let output_bytes = get_output_from_context(&context);
    let output_id: Vec<u32> = output_bytes
        .chunks(8)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u64, |acc, (i, &byte)| acc + ((byte as u64) << (i * 8))) as u32
        })
        .collect();
    let output = tokenizer.decode(&output_id, true).unwrap();
    println!("{}", output);

    // Release the graph and its resources.
    graph.unload().expect("Failed to free resource");
}
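
The input and output tensors both carry token ids serialized as 8-byte little-endian u64 values. A minimal, standalone sketch of that round trip (illustrative only, separate from the example above):

``` rust
fn main() {
    let ids: Vec<u32> = vec![1, 15043, 3186];

    // Encode: each u32 token id becomes the 8 little-endian bytes of a u64.
    let bytes: Vec<u8> = ids
        .iter()
        .flat_map(|&id| u64::from(id).to_le_bytes())
        .collect();

    // Decode: read each 8-byte chunk back as a little-endian u64, then narrow to u32.
    let decoded: Vec<u32> = bytes
        .chunks(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()) as u32)
        .collect();

    assert_eq!(ids, decoded);
}
```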
