Commit 5cd1270

Aryan Bhusari authored and committed
Fixing some minor issues
1 parent 0b19a2e commit 5cd1270

File tree

2 files changed: +8 −15 lines changed

  • content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp

content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/_index.md

Lines changed: 0 additions & 8 deletions

@@ -35,14 +35,6 @@ further_reading:
       title: Llama.cpp rpc-server code
       link: https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc
       type: Code
-  - resource:
-      title: PLACEHOLDER BLOG
-      link: PLACEHOLDER BLOG LINK
-      type: blog
-  - resource:
-      title: PLACEHOLDER GENERAL WEBSITE
-      link: PLACEHOLDER GENERAL WEBSITE LINK
-      type: website
 
 
 

content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/how-to-2.md

Lines changed: 8 additions & 7 deletions

@@ -6,32 +6,33 @@ weight: 3
 layout: learningpathall
 ---
 
-## PLACEHOLDER HEADER OF SECOND STEP
+(continued)<br>
 4. In this learning path, we will use the following three IP addresses for the nodes.
 
 ```bash
-172.31.110.10 (Master)
-172.31.110.11, 172.31.110.12 (Workers)
+master_ip="172.31.110.10"
+worker_ips="172.31.110.11:50052,172.31.110.12:50052"
 ```
 Note that these IPs may be different in your setup. You can find the IP address of your AWS instance using the command provided below.
 ```bash
 curl http://169.254.169.254/latest/meta-data/local-ipv4
 ```
 
-Now, on the master node, you can verify communication with the worker nodes using the following command:
+Now, you can verify communication with the worker nodes by running the following command on the master node:
 ```bash
 telnet 172.31.110.11 50052
 ```
 If the backend server is set up correctly, the output of the `telnet` command should look like the following:
 ```bash
 Trying 172.31.110.11...
-Connected to localhost.
+Connected to 172.31.110.11.
 Escape character is '^]'.
 ```
 Finally, you can run the following command to execute distributed inference:
 ```bash
-bin/llama-cli -m /home/ubuntu/model.gguf -p "Tell me a joke" -n 128 --rpc 172.31.110.11:50052,172.31.110.12:50052 -ngl 99
+bin/llama-cli -m /home/ubuntu/model.gguf -p "Tell me a joke" -n 128 --rpc "$worker_ips" -ngl 99
 ```
+{{% notice Note %}}At the time of publication, llama.cpp only supports up to 16 backend workers.{{% /notice %}} <br>
 The model file for this experiment is hosted on Arm’s private AWS S3 bucket. If you don’t have access to it, you can find a publicly available version of the model on Hugging Face.
 The output:
 ```output

@@ -201,7 +202,7 @@ That's it! You have successfully run the llama-3.1-8B model on CPUs with the powe
 
 Lastly, to set up an OpenAI-compatible API, you can use the `llama-server` functionality. The process of implementing this is described [here](/learning-paths/servers-and-cloud-computing/llama-cpu) under the "Access the chatbot using the OpenAI-compatible API" section. Here is a snippet for how to set up llama-server for distributed inference:
 ```bash
-bin/llama-server -m /home/ubuntu/model.gguf --port 8080 --rpc 172.31.110.11:50052,172.31.110.12:50052 -ngl 99
+bin/llama-server -m /home/ubuntu/model.gguf --port 8080 --rpc "$worker_ips" -ngl 99
 ```
 At the very end of the output to the above command, you will see something like the following:
 ```output
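A note on the `worker_ips` variable introduced by this commit: `--rpc` expects a comma-separated `host:port` list, so the variable must carry the port as well as the address. If you would rather keep a list of bare IPs, the `host:port` list can be derived in the shell. This is a sketch, not part of the commit; the example addresses and the rpc-server port 50052 come from the tutorial above:

```shell
# Sketch (not part of the commit): build the host:port list that
# llama.cpp's --rpc flag expects from a comma-separated list of bare IPs.
worker_ips="172.31.110.11,172.31.110.12"   # example worker addresses
rpc_port=50052                             # port used by rpc-server in this tutorial

# Split on commas, append the port to each address, and re-join with commas.
rpc_arg=$(echo "$worker_ips" | tr ',' '\n' | sed "s/$/:${rpc_port}/" | paste -sd, -)
echo "$rpc_arg"   # 172.31.110.11:50052,172.31.110.12:50052
```

The resulting `rpc_arg` can then be passed as `--rpc "$rpc_arg"` to `llama-cli` or `llama-server`.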

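Once `llama-server` is running, its OpenAI-compatible endpoint can be exercised with a plain `curl` request. This is a sketch under the assumption that the server from the snippet above is listening on localhost:8080; the request body follows the OpenAI chat-completions format:

```shell
# Sketch: query llama-server's OpenAI-compatible chat endpoint
# (assumes the server started above is listening on localhost:8080)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Tell me a joke"}],
        "max_tokens": 128
      }'
```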