# How to run Distributed Llama on 🍓 Raspberry Pi

This article describes how to run Distributed Llama on 4 Raspberry Pi devices, but you can also run it on 1, 2, 4, 8... devices. Please adjust the commands and topology to match your configuration.
| 4 | + |
```
[🔀 SWITCH OR ROUTER]
      | | | |
      | | | |_______ 🔸 raspberrypi1 (ROOT)     10.0.0.1
      | | |_________ 🔹 raspberrypi2 (WORKER 1) 10.0.0.2:9999
      | |___________ 🔹 raspberrypi3 (WORKER 2) 10.0.0.3:9999
      |_____________ 🔹 raspberrypi4 (WORKER 3) 10.0.0.4:9999
```
| 13 | + |
1. Install `Raspberry Pi OS Lite (64 bit)` on **🔸🔹 ALL** your Raspberry Pi devices. This OS doesn't have a desktop environment, but you can easily connect via SSH to manage it.
2. Connect **🔸🔹 ALL** devices to your **🔀 SWITCH OR ROUTER** via Ethernet cables.
3. Connect to all devices via SSH from your computer.
| 17 | + |
```sh
ssh user@raspberrypi1.local
ssh user@raspberrypi2.local
ssh user@raspberrypi3.local
ssh user@raspberrypi4.local
```
| 24 | + |
4. Install Git on **🔸🔹 ALL** devices:

```sh
sudo apt update
sudo apt install git
```
| 30 | + |
5. Clone this repository and compile Distributed Llama on **🔸🔹 ALL** devices:

```sh
git clone https://github.com/b4rtaz/distributed-llama.git
cd distributed-llama
make dllama
make dllama-api
```
| 39 | + |
6. Download the model to the **🔸 ROOT** device using the `launch.py` script. You don't need to download the model on the worker devices.

```sh
python3 launch.py # Prints a list of available models

python3 launch.py llama3_2_3b_instruct_q40 # Downloads the model to the root device
```
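After the download finishes, the model and tokenizer files should be present in the `models` directory on the root device (the file names below match the paths used in the inference commands later in this guide):

```sh
ls models/llama3_2_3b_instruct_q40
# dllama_model_llama3_2_3b_instruct_q40.m
# dllama_tokenizer_llama3_2_3b_instruct_q40.t
```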
| 47 | + |
7. Assign static IP addresses on **🔸🔹 ALL** devices. Each device must have a unique IP address in the same subnet.

```sh
sudo ip addr add 10.0.0.1/24 dev eth0 # 🔸 ROOT
sudo ip addr add 10.0.0.2/24 dev eth0 # 🔹 WORKER 1
sudo ip addr add 10.0.0.3/24 dev eth0 # 🔹 WORKER 2
sudo ip addr add 10.0.0.4/24 dev eth0 # 🔹 WORKER 3
```
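Note that `ip addr add` assigns the address only for the current session; it is lost after a reboot. If you want the address to persist, one option is to configure it via NetworkManager (a sketch assuming a recent Raspberry Pi OS release that uses NetworkManager, where the default wired profile is typically named `Wired connection 1`):

```sh
# 🔸 ROOT device — use 10.0.0.2/24, 10.0.0.3/24, ... on the workers
sudo nmcli con mod "Wired connection 1" ipv4.method manual ipv4.addresses 10.0.0.1/24
sudo nmcli con up "Wired connection 1"
```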
| 56 | + |
8. Start workers on all **🔹 WORKER** devices:

```sh
sudo nice -n -20 ./dllama worker --port 9999 --nthreads 4
```
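Before starting the root device, you can verify that each worker is up by checking from another device that its port is open (a quick sketch; `nc` comes from the netcat package, which may need to be installed separately):

```sh
nc -z -w 2 10.0.0.2 9999 && echo "worker 1 reachable"
nc -z -w 2 10.0.0.3 9999 && echo "worker 2 reachable"
nc -z -w 2 10.0.0.4 9999 && echo "worker 3 reachable"
```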
| 62 | + |
9. Run an inference on the **🔸 ROOT** device to test that everything works:

```sh
sudo nice -n -20 ./dllama inference \
  --prompt "Hello world" \
  --steps 32 \
  --model models/llama3_2_3b_instruct_q40/dllama_model_llama3_2_3b_instruct_q40.m \
  --tokenizer models/llama3_2_3b_instruct_q40/dllama_tokenizer_llama3_2_3b_instruct_q40.t \
  --buffer-float-type q80 \
  --nthreads 4 \
  --max-seq-len 4096 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```

10. To run the API server, start it on the **🔸 ROOT** device:

```sh
sudo nice -n -20 ./dllama-api \
  --port 9999 \
  --model models/llama3_2_3b_instruct_q40/dllama_model_llama3_2_3b_instruct_q40.m \
  --tokenizer models/llama3_2_3b_instruct_q40/dllama_tokenizer_llama3_2_3b_instruct_q40.t \
  --buffer-float-type q80 \
  --nthreads 4 \
  --max-seq-len 4096 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```
| 89 | + |
Now you can connect to the API server from your computer:

```
http://raspberrypi1.local:9999/v1/models
```
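You can also send a chat request from your computer with `curl`. This is a sketch assuming the server exposes an OpenAI-compatible `/v1/chat/completions` endpoint and that the `model` and `max_tokens` fields shown here are accepted; adjust the request body to match the API's actual schema:

```sh
curl http://raspberrypi1.local:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3_2_3b_instruct_q40",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'
```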