Commit 08bb144: Add Docker + Wasm + GPU section (#222)
Signed-off-by: vincent <[email protected]>

---
sidebar_position: 5
---

# Docker + WASM + GPU

This chapter takes a completely new approach, combining Docker and crun with WasmEdge and [CDI](https://github.com/cncf-tags/container-device-interface) to give Wasm workloads access to host GPU devices. We do not continue with runwasi, the Wasm runtime used within Docker in the previous chapter, because of the current state of its CDI support and its compatibility approach.

## Prerequisites

Before you start, you need:

- A GPU device (we take NVIDIA graphics cards as our example, and have only tested NVIDIA GPUs on Linux so far)
- The NVIDIA GPU driver installed
- Either the NVIDIA Container Toolkit or at least the nvidia-container-toolkit-base package installed
- Docker version > 4.29 (which includes Moby 25)

Regarding the installation of the NVIDIA driver and toolkit, we won't go into detail here, but the following references and checks will help you verify that your environment is ready:

[NVIDIA driver installation on Ubuntu](https://ubuntu.com/server/docs/nvidia-drivers-installation), [Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html), [NVIDIA CDI support reference](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html)

```bash
# Check your driver and device
> nvidia-smi -L

# Check your toolkit
> nvidia-ctk --version
```

Install the latest docker-ce:

```bash
> curl -fsSL https://get.docker.com -o get-docker.sh
> sh get-docker.sh

# Check your docker
> docker --version
```

## CDI setup

[Generate the CDI specification file](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#procedure):

```bash
> sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Check that your CDI config is good
> nvidia-ctk cdi list

# Example output
INFO[0000] Found 2 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=all
```

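For orientation, a generated CDI specification is a YAML file along these lines. This is a heavily trimmed, illustrative sketch of the shape defined by the CDI specification; your real /etc/cdi/nvidia.yaml will contain many more device nodes, mounts, and hooks:

```yaml
cdiVersion: "0.5.0"
kind: nvidia.com/gpu
devices:
  - name: "0"            # addressable as nvidia.com/gpu=0
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0
containerEdits:          # edits applied for any device of this kind
  deviceNodes:
    - path: /dev/nvidiactl
```
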
[Enable CDI in the Docker config](https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices) (/etc/docker/daemon.json):

```json
{
  "features": {
    "cdi": true
  },
  "cdi-spec-dirs": ["/etc/cdi/", "/var/run/cdi"]
}
```

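A malformed daemon.json can keep the Docker daemon from starting, so it is worth validating the file before reloading. A quick check, assuming python3 is available on the host:

```bash
# Exits non-zero (and prints the parse error) if the JSON is invalid
> python3 -m json.tool /etc/docker/daemon.json > /dev/null && echo "daemon.json OK"
```
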
65+
66+
```bash
67+
# Reload docker daemon
68+
> sudo systemctl reload docker
69+
70+
# Test your cdi setup good
71+
> docker run --rm --device nvidia.com/gpu=all ubuntu:22.04 nvidia-smi -L
72+
73+
# Example output
74+
GPU 0: NVIDIA GeForce GTX 1080 (UUID: GPU-********-****-****-****-************)
75+
```
## Set up your container runtime (crun + WasmEdge + plugin system)

Build crun with both WasmEdge and the plugin system enabled:

```bash
> sudo apt install -y make git gcc build-essential pkgconf libtool libsystemd-dev libprotobuf-c-dev libcap-dev libseccomp-dev libyajl-dev go-md2man autoconf python3 automake

> git clone -b enable-wasmedge-plugin https://github.com/second-state/crun
> cd crun
> ./autogen.sh
> ./configure --with-wasmedge
> make

# Check your crun (the feature list it prints should include wasmedge support)
> ./crun --version
```

Replace the container runtime in /etc/docker/daemon.json:

```json
{
  "runtimes": {
    "crun": {
      "path": "<the path of the crun binary you built>"
    }
  },
  "features": {
    "cdi": true
  },
  "cdi-spec-dirs": ["/etc/cdi/", "/var/run/cdi"]
}
```

```bash
# Reload the Docker daemon
> sudo systemctl reload docker
```

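After the reload, you can confirm that the daemon actually registered the new runtime. A verification sketch, assuming the docker CLI is on your PATH:

```bash
# List the runtimes known to the daemon; "crun" should appear
# alongside the default "runc", with the path you configured.
> docker info --format '{{json .Runtimes}}'
```
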
Download the ggml plugin onto the host:

```bash
> curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-ggml

# Make sure all of the plugin's dependencies are resolved
> ldd ~/.wasmedge/plugin/libwasmedgePluginWasiNN.so
```

## Demo: llama with our Wasm application

> The demo image is built from the Wasm application [here](https://github.com/second-state/WasmEdge-WASINN-examples/tree/master/wasmedge-ggml/llama) and pushed to [this container package](https://github.com/captainvincent/runwasi/pkgs/container/runwasi-demo/195178675?tag=wasmedge-ggml-llama).

Download inference model
127+
```bash
128+
> curl -LO https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf
129+
```
130+
131+
Docker run llama2 inference
132+
```bash
133+
docker run -v ~/.wasmedge/plugin/libwasmedgePluginWasiNN.so:/.wasmedge/plugin/libwasmedgePluginWasiNN.so \
134+
-v /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12:/lib/x86_64-linux-gnu/libcudart.so.12 \
135+
-v /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.12:/lib/x86_64-linux-gnu/libcublas.so.12 \
136+
-v /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12:/lib/x86_64-linux-gnu/libcublasLt.so.12 \
137+
-v /lib/x86_64-linux-gnu/libcuda.so.1:/lib/x86_64-linux-gnu/libcuda.so.1 \
138+
-v .:/resource \
139+
--env WASMEDGE_PLUGIN_PATH=/.wasmedge/plugin \
140+
--env WASMEDGE_WASINN_PRELOAD=default:GGML:AUTO:/resource/llama-2-7b-chat.Q5_K_M.gguf \
141+
--env n_gpu_layers=100 \
142+
--rm --device nvidia.com/gpu=all --runtime=crun --annotation=module.wasm.image/variant=compat-smart --platform wasip1/wasm \
143+
ghcr.io/captainvincent/runwasi-demo:wasmedge-ggml-llama default \
144+
$'[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you do not know the answer to a question, please do not share false information.\n<</SYS>>\nWhat is the capital of Japan?[/INST]'
145+
```
Example result:

```
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1080, compute capability 6.1, VMM: yes
Prompt:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you do not know the answer to a question, please do not share false information.
<</SYS>>
What is the capital of Japan?[/INST]
Response:
[INFO] llama_commit: "4ffcdce2"
[INFO] llama_build_number: 2334
[INFO] Number of input tokens: 140
Thank you for asking! The capital of Japan is Tokyo. I'm glad you asked! It's important to be informed and curious about different countries and their capitals. Is there anything else I can help you with?
[INFO] Number of input tokens: 140
[INFO] Number of output tokens: 48
```
