---
description: This wiki provides a step-by-step guide to deploying the DeepSeek model on reComputer Jetson devices using MLC for optimized AI inference at the edge.
title: Deploy DeepSeek on reComputer Jetson with MLC
keywords:
  - reComputer
  - Jetson
  - LLM
  - MLC
  - deepseek
image: https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/deepseek.webp
slug: /deploy_deepseek_on_jetson_with_mlc
last_update:
  date: 02/13/2025
  author: Youjiang
---

# Deploy DeepSeek on reComputer Jetson with MLC

## Introduction

DeepSeek is a cutting-edge AI model suite optimized for efficiency, accuracy, and real-time processing. With advanced optimization for edge computing, DeepSeek enables fast, low-latency AI inference directly on Jetson devices, reducing dependency on cloud computing while maximizing performance.

In a [previous wiki](/deploy_deepseek_on_jetson), we provided a quick guide to deploying DeepSeek on Jetson. However, although the model was deployed successfully, it did not reach its optimal inference speed.

This wiki provides a step-by-step guide to deploying [DeepSeek](https://www.deepseek.com/) on reComputer Jetson devices with [MLC](https://llm.mlc.ai/) for efficient AI inference at the edge.

## Prerequisites

- A Jetson device with more than 8GB of memory.
- The Jetson device needs to be pre-flashed with JetPack [5.1.1](https://wiki.seeedstudio.com/reComputer_Intro/) or later.

:::note
In this wiki, we will accomplish the following tasks using the [reComputer J4012 - Edge AI Computer with NVIDIA® Jetson™ Orin™ NX 16GB](https://www.seeedstudio.com/reComputer-J4012-p-5586.html?qid=eyJjX3NlYXJjaF9xdWVyeSI6InJlQ29tcHV0ZXIgSjQwMTIiLCJjX3NlYXJjaF9yZXN1bHRfcG9zIjo0LCJjX3RvdGFsX3Jlc3VsdHMiOjUyLCJjX3NlYXJjaF9yZXN1bHRfdHlwZSI6IlByb2R1Y3QiLCJjX3NlYXJjaF9maWx0ZXJzIjoic3RvcmVDb2RlOltyZXRhaWxlcl0gJiYgcXVhbnRpdHlfYW5kX3N0b2NrX3N0YXR1czpbMV0ifQ%3D%3D), but you can also try using other Jetson devices.
:::
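
If you are not sure which JetPack release your device is running, you can check it from a terminal. This is a minimal check that assumes a standard JetPack installation (where the `nvidia-jetpack` meta-package and the L4T release file are present):

```bash
# Show the installed JetPack meta-package version
sudo apt-cache show nvidia-jetpack

# Show the underlying L4T (Jetson Linux) release
cat /etc/nv_tegra_release
```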

<div align="center">
<img width={800}
src="https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/j4012.png" />
</div>

<div class="get_one_now_container" style={{textAlign: 'center'}}>
<a class="get_one_now_item" href="https://www.seeedstudio.com/reComputer-J4012-p-5586.html?qid=eyJjX3NlYXJjaF9xdWVyeSI6InJlQ29tcHV0ZXIgSjQwMTIiLCJjX3NlYXJjaF9yZXN1bHRfcG9zIjo0LCJjX3RvdGFsX3Jlc3VsdHMiOjUyLCJjX3NlYXJjaF9yZXN1bHRfdHlwZSI6IlByb2R1Y3QiLCJjX3NlYXJjaF9maWx0ZXJzIjoic3RvcmVDb2RlOltyZXRhaWxlcl0gJiYgcXVhbnRpdHlfYW5kX3N0b2NrX3N0YXR1czpbMV0ifQ%3D%3D"><strong><span><font color={'FFFFFF'} size={"4"}> Get One Now 🖱️</font></span></strong>
</a>
</div>

## Getting Started

### Hardware Connection
- Connect the Jetson device to the network, a mouse, a keyboard, and a monitor.

:::note
Alternatively, you can access the Jetson device remotely via SSH over the local network.
:::
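
For example, from another computer on the same network you can open a remote shell as shown below; the username and address are placeholders for your own device:

```bash
# Replace the username and IP address with those of your Jetson device
ssh username@<ip_of_jetson>
```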

### Install and Configure Docker on Jetson

First, we need to follow the [tutorial](https://www.jetson-ai-lab.com/tips_ssd-docker.html) provided by the Jetson AI Lab to install Docker.

Step 1. Install the `nvidia-container` package.

```bash
sudo apt update
sudo apt install -y nvidia-container
```

:::info
If you flashed **Jetson Linux (L4T) R36.x (JetPack 6.x)** on your Jetson using SDK Manager and installed `nvidia-container` using apt, note that on JetPack 6.x this no longer automatically installs Docker.

Therefore, you need to run the following commands to install Docker manually and set it up.
```bash
sudo apt update
sudo apt install -y nvidia-container curl
curl https://get.docker.com | sh && sudo systemctl --now enable docker
sudo nvidia-ctk runtime configure --runtime=docker
```
:::

Step 2. Restart the Docker service and add your user to the `docker` group.

```bash
sudo systemctl restart docker
sudo usermod -aG docker $USER
newgrp docker
```

Step 3. Add the default runtime in `/etc/docker/daemon.json`.

```bash
sudo apt install -y jq
sudo jq '. + {"default-runtime": "nvidia"}' /etc/docker/daemon.json | \
  sudo tee /etc/docker/daemon.json.tmp && \
  sudo mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json
```

Step 4. Restart Docker.

```bash
sudo systemctl daemon-reload && sudo systemctl restart docker
```
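
Before moving on, you can optionally confirm that Docker has registered the NVIDIA runtime and made it the default. A minimal check (the exact wording of the output varies between Docker versions):

```bash
# "nvidia" should appear in the list of runtimes and as the default runtime
docker info | grep -i runtime
```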

### Load and Run DeepSeek

We can use the Docker container provided by the `Jetson AI Lab` to quickly deploy the MLC-quantized DeepSeek model on Jetson.
Open the [Jetson AI Lab](https://www.jetson-ai-lab.com/index.html) website and find the deployment command.

`Models` --> `Orin NX` --> `docker run` --> `copy`

:::info
Before copying the installation command, we can modify the relevant parameters on the left side of the page.
:::

<div align="center">
<img width={800}
src="https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/mlc/deploy_deepseek.png" />
</div>
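
For reference, the generated command is typically a `docker run` invocation that starts one of the Jetson AI Lab MLC containers and serves the selected DeepSeek model through an OpenAI-compatible API on port 9000. The image tag and model name in the sketch below are illustrative placeholders only; always paste the exact command copied from the Jetson AI Lab page for your device and JetPack version.

```bash
# Illustrative shape only: use the command copied from the Jetson AI Lab page.
# The image tag and the model repository shown here are placeholders.
sudo docker run -it --rm --network host dustynv/mlc:r36.4.0 \
  mlc_llm serve HF://mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC \
  --host 0.0.0.0 --port 9000
```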

Open a terminal window on the Jetson device, paste the installation command we just copied, and press `Enter` to run it.
When we see the following content in the terminal window, it means the DeepSeek model has been successfully loaded on the Jetson device.

<div align="center">
<img width={800}
src="https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/mlc/success_install_deepseek.png" />
</div>

At this point, we can open a new terminal window and enter the following command to test whether the model can perform inference correctly.

:::danger
Please note: do not close the terminal window that is running the DeepSeek model.
:::

```bash
curl http://0.0.0.0:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{
    "model": "*",
    "messages": [{"role":"user","content":"Why did the LLM cross the road?"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "stream": false,
    "max_tokens": 100
  }'
```

<div align="center">
<img width={800}
src="https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/mlc/get_response.png" />
</div>
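
We can also ask the server which model it is exposing, which can be handy later when a client wants an explicit model name instead of the `*` wildcard used above. This assumes the MLC server implements the standard OpenAI-compatible `/v1/models` route:

```bash
# List the model(s) served by the local MLC endpoint
curl http://0.0.0.0:9000/v1/models
```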

### Install Open WebUI

Run the following command to start Open WebUI in a Docker container:

```bash
sudo docker run -d --network=host \
    -v ${HOME}/open-webui:/app/backend/data \
    -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
    --name open-webui \
    --restart always \
    ghcr.io/open-webui/open-webui:main
```
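
The first start can take a little while as the image is pulled and the service initializes. You can confirm the container is running and follow its logs with:

```bash
# Check that the Open WebUI container is up
docker ps --filter name=open-webui

# Follow its startup logs (press Ctrl+C to stop following)
docker logs -f open-webui
```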

After the installation finishes, you can enter `http://<ip_of_jetson>:8080` in a browser to open the Open WebUI interface.

<div align="center">
<img width={800}
src="https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/mlc/install_webui.png" />
</div>

Then, we need to configure the LLM inference backend for Open WebUI.

`User (top right corner)` --> `Settings` --> `Admin Settings` --> `Connections`

Change the OpenAI API URL to the local MLC inference server on which DeepSeek is already loaded.

For example, if the IP address of my Jetson device is `192.168.49.241`, the URL should be `http://192.168.49.241:9000/v1`.

<div align="center">
<img width={800}
src="https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/mlc/cfg_model.png" />
</div>

After saving the configuration, we can create a new chat window to experience the extremely fast inference speed of the local DeepSeek model!

<div align="center">
<img width={800}
src="https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/mlc/chat.png" />
</div>

### Test Inference Speed

Here, we can use a simple Python script to roughly measure the model's inference speed.

On the Jetson device, create a new Python file named `test_inference_speed.py` and fill it with the following code.

Then, execute the script by running `python test_inference_speed.py` in the terminal.

<details>

<summary> test_inference_speed.py </summary>

```python
import time
import requests

# Local MLC server started in the previous step (OpenAI-compatible API)
url = "http://0.0.0.0:9000/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer none"
}

data = {
    "model": "*",
    "messages": [{"role": "user", "content": "Why did the LLM cross the road?"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "stream": True,
    "max_tokens": 1000
}

start_time = time.time()
response = requests.post(url, headers=headers, json=data, stream=True)

# Count the streamed chunks as a rough proxy for generated tokens
token_count = 0
for chunk in response.iter_lines():
    if chunk:
        token_count += 1
        print(chunk)

end_time = time.time()
elapsed_time = end_time - start_time
tokens_per_second = token_count / elapsed_time

print(f"Total Tokens: {token_count}")
print(f"Elapsed Time: {elapsed_time:.3f} seconds")
print(f"Tokens per second: {tokens_per_second:.2f} tokens/second")
```

</details>

<div align="center">
<img width={800}
src="https://files.seeedstudio.com/wiki/reComputer-Jetson/deepseek/mlc/test_infer_speed.png" />
</div>

The results show that the inference speed of the MLC-compiled DeepSeek 1.5B model deployed on the Jetson Orin NX device is approximately **60 tokens/s**.

## Effect Demonstration

In the demonstration video, the Jetson device operates at just under 20 W yet achieves an impressive inference speed.

<div align="center">
<iframe width="800" height="450" src="https://www.youtube.com/embed/ohd_T95br90" title="deploy deepseek on jetson with mlc" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div>
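
If you want to observe power draw and GPU utilization on your own device while the model is generating, you can use the `tegrastats` utility that ships with Jetson Linux:

```bash
# Print live utilization and power statistics (Ctrl+C to stop)
sudo tegrastats
```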

## References
- https://www.jetson-ai-lab.com/models.html
- https://www.deepseek.com/
- https://wiki.seeedstudio.com/deploy_deepseek_on_jetson/
- https://www.seeedstudio.com/tag/nvidia.html

## Tech Support & Product Discussion

Thank you for choosing our products! We are here to provide you with different support to ensure that your experience with our products is as smooth as possible. We offer several communication channels to cater to different preferences and needs.

<div class="button_tech_support_container">
<a href="https://forum.seeedstudio.com/" class="button_forum"></a>
<a href="https://www.seeedstudio.com/contacts" class="button_email"></a>
</div>

<div class="button_tech_support_container">
<a href="https://discord.gg/eWkprNDMU7" class="button_discord"></a>
<a href="https://github.com/Seeed-Studio/wiki-documents/discussions/69" class="button_discussion"></a>
</div>
