Commit 45aa241

update readme.md.
1 parent d7ef8b2 commit 45aa241

File tree: 2 files changed, +109 −2 lines

README.md

Lines changed: 15 additions & 2 deletions

```diff
@@ -8,6 +8,9 @@ Connect home devices into a powerful cluster to accelerate LLM inference. More d
 
 Supports Linux, macOS, and Windows. Optimized for ARM and x86_64 AVX2 CPUs.
 
+**How to Run**
+- [🍓 How to Run on Raspberry Pi](./docs/HOW_TO_RUN_RASPBERRYPI.md)
+
 **News**
 - 5 Sep 2025 - Qwen 3 MoE models are now supported on CPU.
 - 3 Aug 2025 - Qwen 3 0.6B, 1.7B, 8B and 14B models are now supported.
```
`````diff
@@ -51,9 +54,19 @@ Supported architectures: Llama, Qwen3.
 
 ### 👷 Architecture
 
+````
+[🔀 SWITCH OR ROUTER]
+| | | |
+| | | |_______ 🔸 device1 (ROOT) 10.0.0.1
+| | |_________ 🔹 device2 (WORKER 1) 10.0.0.2:9999
+| |___________ 🔹 device3 (WORKER 2) 10.0.0.3:9999
+|_____________ 🔹 device4 (WORKER 3) 10.0.0.4:9999
+...
+````
+
 The project is split up into two parts:
-* **Root node** - it's responsible for loading the model and weights and forward them to workers. Also, it synchronizes the state of the neural network. The root node is also a worker, it processes own slice of the neural network.
-* **Worker node** - it processes own slice of the neural network. It doesn't require any configuration related to the model.
+* **🔸 Root node** - responsible for loading the model and weights and forwarding them to the workers; it also synchronizes the state of the neural network. The root node is also a worker: it processes its own slice of the neural network.
+* **🔹 Worker node** - processes its own slice of the neural network. It doesn't require any model-related configuration.
 
 You always need the root node, and you can add 2^n - 1 worker nodes to speed up inference. The RAM usage of the neural network is split across all nodes; the root node requires a bit more RAM than the worker nodes.
`````
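To make the node-count and RAM-split rule concrete, here is a small sketch (the 8 GB total and the root's overhead factor are illustrative assumptions, not measurements from Distributed Llama):

```python
# Sketch: how model RAM spreads across a root plus 2^n - 1 workers.
# The numbers below are hypothetical, for illustration only.

def per_node_ram_gb(total_gb: float, n_nodes: int, root_overhead: float = 1.1) -> list:
    """Return an estimated RAM share per node (root first)."""
    # Valid cluster sizes are powers of two: 1 root + (2^n - 1) workers.
    if n_nodes < 1 or n_nodes & (n_nodes - 1) != 0:
        raise ValueError("node count must be a power of two (1, 2, 4, 8, ...)")
    share = total_gb / n_nodes
    # The root holds a bit more than an even share, since it also loads the weights.
    return [round(share * root_overhead, 2)] + [round(share, 2)] * (n_nodes - 1)

print(per_node_ram_gb(8.0, 4))  # root + 3 workers
```

With 4 nodes, an 8 GB model costs each worker roughly a quarter of the total, with the root slightly above that.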

docs/HOW_TO_RUN_RASPBERRYPI.md

Lines changed: 94 additions & 0 deletions

@@ -0,0 +1,94 @@

# How to run Distributed Llama on 🍓 Raspberry Pi

This article describes how to run Distributed Llama on 4 Raspberry Pi devices, but you can also run it on 1, 2, 4, 8... devices. Please adjust the commands and the topology to match your configuration.

````
[🔀 SWITCH OR ROUTER]
| | | |
| | | |_______ 🔸 raspberrypi1 (ROOT) 10.0.0.1
| | |_________ 🔹 raspberrypi2 (WORKER 1) 10.0.0.2:9999
| |___________ 🔹 raspberrypi3 (WORKER 2) 10.0.0.3:9999
|_____________ 🔹 raspberrypi4 (WORKER 3) 10.0.0.4:9999
````

1. Install `Raspberry Pi OS Lite (64 bit)` on **🔸🔹 ALL** of your Raspberry Pi devices. This OS doesn't have a desktop environment, but you can easily connect to it via SSH to manage it.
2. Connect **🔸🔹 ALL** devices to your **🔀 SWITCH OR ROUTER** via Ethernet cables.
3. Connect to all devices via SSH from your computer:

```
ssh user@raspberrypi1.local
ssh user@raspberrypi2.local
ssh user@raspberrypi3.local
ssh user@raspberrypi4.local
```

4. Install Git on **🔸🔹 ALL** devices:

```sh
sudo apt install git
```

5. Clone this repository and compile Distributed Llama on **🔸🔹 ALL** devices:

```sh
git clone https://github.com/b4rtaz/distributed-llama.git
cd distributed-llama
make dllama
make dllama-api
```

6. Download the model to the **🔸 ROOT** device using the `launch.py` script. You don't need to download the model on the worker devices.

```sh
python3 launch.py                          # Prints a list of available models

python3 launch.py llama3_2_3b_instruct_q40 # Downloads the model to the root device
```

7. Assign static IP addresses on **🔸🔹 ALL** devices. Each device must have a unique IP address in the same subnet.

```sh
sudo ip addr add 10.0.0.1/24 dev eth0 # 🔸 ROOT
sudo ip addr add 10.0.0.2/24 dev eth0 # 🔹 WORKER 1
sudo ip addr add 10.0.0.3/24 dev eth0 # 🔹 WORKER 2
sudo ip addr add 10.0.0.4/24 dev eth0 # 🔹 WORKER 3
```
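Note that addresses added with `ip addr add` do not survive a reboot. As a hedged sketch of one way to make them persistent: on Raspberry Pi OS releases that use `dhcpcd`, you could append something like the fragment below to `/etc/dhcpcd.conf` (adjust the address per device); newer releases use NetworkManager, where `nmcli` is the equivalent tool.

```
# /etc/dhcpcd.conf (🔸 ROOT device shown; use 10.0.0.2-4 on the workers)
interface eth0
static ip_address=10.0.0.1/24
```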
8. Start workers on all **🔹 WORKER** devices:

```sh
sudo nice -n -20 ./dllama worker --port 9999 --nthreads 4
```

9. Run inference on the **🔸 ROOT** device to check that everything works:

```sh
sudo nice -n -20 ./dllama inference \
  --prompt "Hello world" \
  --steps 32 \
  --model models/llama3_2_3b_instruct_q40/dllama_model_llama3_2_3b_instruct_q40.m \
  --tokenizer models/llama3_2_3b_instruct_q40/dllama_tokenizer_llama3_2_3b_instruct_q40.t \
  --buffer-float-type q80 \
  --nthreads 4 \
  --max-seq-len 4096 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```

10. To run the API server, start it on the **🔸 ROOT** device:

```sh
sudo nice -n -20 ./dllama-api \
  --port 9999 \
  --model models/llama3_2_3b_instruct_q40/dllama_model_llama3_2_3b_instruct_q40.m \
  --tokenizer models/llama3_2_3b_instruct_q40/dllama_tokenizer_llama3_2_3b_instruct_q40.t \
  --buffer-float-type q80 \
  --nthreads 4 \
  --max-seq-len 4096 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```

Now you can connect to the API server from your computer:

```
http://raspberrypi1.local:9999/v1/models
```
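The `/v1/models` path above follows the OpenAI-style API layout. As a sketch of querying it programmatically (assuming the server from the previous step is reachable at `raspberrypi1.local:9999`), using only Python's standard library:

```python
import json
import urllib.request

API_BASE = "http://raspberrypi1.local:9999"  # hostname and port from the steps above


def list_models(base: str = API_BASE) -> dict:
    """Fetch the model list from the server's /v1/models endpoint."""
    with urllib.request.urlopen(f"{base}/v1/models") as resp:
        return json.load(resp)

# Example (requires the cluster to be running):
# print(list_models())
```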
