Note: This issue was copied from ggml-org#2164
Original Author: @ggerganov
Original Issue Number: ggml-org#2164
Created: 2023-07-10T16:12:22Z
Now that distributed inference is supported, thanks to the work of @evanmiller in ggml-org#2099, it would be fun to use it for something cool. One such idea is to connect a bunch of Raspberry Pis in a local network and run the inference using MPI:
# sample cluster of 8 devices (replace with actual IP addresses of the devices)
$ cat ./hostfile
192.168.0.1:1
192.168.0.2:1
192.168.0.3:1
192.168.0.4:1
192.168.0.5:1
192.168.0.6:1
192.168.0.7:1
192.168.0.8:1
# build with MPI support
$ make CC=mpicc CXX=mpicxx LLAMA_MPI=1 -j
# run distributed inference over 8 nodes
$ mpirun -hostfile ./hostfile -n 8 ./main -m /mnt/models/65B/ggml-model-q4_0.bin -p "I believe the meaning of life is" -n 64
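For larger clusters, writing the hostfile by hand gets tedious. Here is a minimal sketch that generates it, assuming the devices have consecutive addresses under one prefix. `gen_hostfile` is a hypothetical helper and not part of llama.cpp, and the addresses are placeholders just like the ones above.

```shell
#!/bin/sh
# Sketch: print an MPI hostfile with one entry per device.
# gen_hostfile is a hypothetical helper, not part of llama.cpp.
# usage: gen_hostfile PREFIX FIRST COUNT
gen_hostfile() {
    prefix=$1   # first three octets, e.g. 192.168.0
    first=$2    # last octet of the first device
    count=$3    # number of devices
    i=0
    while [ "$i" -lt "$count" ]; do
        # "host:1" mirrors the one-process-per-device entries in the hostfile above
        echo "${prefix}.$((first + i)):1"
        i=$((i + 1))
    done
}

# 8 devices at 192.168.0.1 .. 192.168.0.8 (placeholder addresses)
gen_hostfile 192.168.0 1 8
```

Redirect the output to ./hostfile to use it. Before loading a model, something like `mpirun -hostfile ./hostfile -n 8 hostname` is a cheap way to confirm that every device is reachable and launches a process, assuming the MPI launcher can already start processes on each device (typically via passwordless SSH).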
Here we assume that the 65B model data is located on a network share in /mnt and that mmap works over a network share.
Not sure if that is the case - if not, then it would be more difficult to perform this experiment.
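A rough pre-flight check, before involving MPI at all, is to confirm on each device that the model file is readable and to see which filesystem backs it. The sketch below assumes GNU coreutils (for `stat -f -c %T`), and `check_model` is a hypothetical helper, not part of llama.cpp. It does not exercise mmap itself, so a pass here only rules out the simpler failure modes such as a missing mount or wrong permissions.

```shell
#!/bin/sh
# check_model is a hypothetical helper, not part of llama.cpp.
# It reports whether PATH is readable and which filesystem type holds it.
# It does NOT test mmap; it only catches missing mounts and permission problems.
# usage: check_model PATH
check_model() {
    path=$1
    if [ ! -r "$path" ]; then
        echo "not readable: $path" >&2
        return 1
    fi
    # Actually read the first 4 KiB, since -r only checks permissions
    if ! head -c 4096 "$path" > /dev/null; then
        echo "read failed: $path" >&2
        return 1
    fi
    # Filesystem type of the mount holding the file (assumes GNU stat)
    fstype=$(stat -f -c %T "$path" 2>/dev/null || echo unknown)
    echo "path=$path fstype=$fstype"
}
```

Running `check_model /mnt/models/65B/ggml-model-q4_0.bin` on each device should print a network filesystem type (for example nfs) if the file really comes from the share. Whether mmap then behaves well over that share still has to be confirmed by the actual run.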
Looking for people with access to the necessary hardware to perform this experiment.