`content/learning-paths/servers-and-cloud-computing/rtp-llm/_index.md` (+5 −4)

@@ -1,17 +1,18 @@
 ---
-title: Run a Large Language Model (LLM) chatbot with rtp-llm on Arm servers
+title: Run an LLM chatbot with rtp-llm on Arm-based servers

 minutes_to_complete: 30

-who_is_this_for: This is an introductory topic for developers interested in running LLMs on Arm-based servers.
+who_is_this_for: This is an introductory topic for developers who are interested in running a Large Language Model (LLM) with rtp-llm on Arm-based servers.

 learning_objectives:
-    - Build rtp-llm on your Arm server.
+    - Build rtp-llm on an Arm-based server.
     - Download a Qwen model from Hugging Face.
     - Run a Large Language Model with rtp-llm.

 prerequisites:
-    - An Arm Neoverse N2 or Neoverse V2 [based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server. This Learning Path was tested on an AliCloud Yitian710 g8y.8xlarge instance and an AWS Graviton4 r8g.8xlarge instance to test Arm performance optimizations.
+    - Any Arm Neoverse N2-based or Arm Neoverse V2-based instance running Ubuntu 22.04 LTS from a cloud service provider or an on-premise Arm server.
+    - For the server, at least four cores and 16GB of RAM, with disk storage of at least 32GB.
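As a quick way to verify that an instance meets these minimums, you can check the core count, memory, and disk from a shell. This is an illustrative sketch, not part of the changed files:

```bash
# Verify the instance against the stated minimums.
nproc      # expect at least 4 cores
free -g    # expect roughly 16GB of total RAM
df -h /    # expect at least 32GB on the root volume
```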
`content/learning-paths/servers-and-cloud-computing/rtp-llm/_review.md` (+14 −4)

@@ -2,23 +2,33 @@
 review:
     - questions:
         question: >
-            Can you run LLMs on Arm CPUs?
+            Are at least four cores, 16GB of RAM, and 32GB of disk storage required to run the LLM chatbot using rtp-llm on an Arm-based server?
         answers:
             - "Yes"
             - "No"
         correct_answer: 1
         explanation: >
-            Yes. The advancements made in the Generative AI space with smaller parameter models make LLM inference on CPUs very efficient.
+            It depends on the size of the LLM. The higher the number of parameters of the model, the greater the system requirements.

     - questions:
         question: >
-            Can rtp-llm be built and run on CPU?
+            Does the rtp-llm project use the --config=arm option to optimize LLM inference for Arm CPUs?
         answers:
             - "Yes"
             - "No"
         correct_answer: 1
         explanation: >
-            Yes. rtp-llm not only support built and run on GPU, but also it can be run on Arm CPU.
+            rtp-llm uses the GPU for inference by default. It optimizes LLM inference on the Arm architecture through the configuration option --config=arm, supplied during the build process.
+
+    - questions:
+        question: >
+            Is the given Python script the only way to run the LLM chatbot on an Arm AArch64 CPU and output a response from the model?
+        answers:
+            - "Yes"
+            - "No"
+        correct_answer: 2
+        explanation: >
+            rtp-llm can also be deployed as an API server, and the user can use curl or another client to generate an LLM chatbot response.
+Arm CPUs are widely used in ML and AI use cases. In this Learning Path, you will learn how to run the generative AI inference-based use case of an LLM chatbot on an Arm-based CPU. You will do this by deploying the [Qwen2-0.5B-Instruct model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on an Arm-based CPU using `rtp-llm`.
+
+{{% notice Note %}}
+This Learning Path has been tested on an Alibaba Cloud g8y.8xlarge instance and an AWS Graviton4 r8g.8xlarge instance.
+{{% /notice %}}
+
+[rtp-llm](https://github.com/alibaba/rtp-llm) is an open-source C/C++ project developed by Alibaba that enables efficient LLM inference on a variety of hardware.
+
+RTP-LLM is a Large Language Model inference acceleration engine developed by Alibaba. Qwen is the name given to a series of Large Language Models developed by Alibaba Cloud that are capable of performing a variety of tasks.
+
+Alibaba Cloud offers a wide range of models, each suitable for different tasks and use cases.
+
+Besides generating text, they are also able to perform actions such as:
+
+* Answering questions through information retrieval and analysis.
+* Processing images and producing written descriptions of visual content.
+* Processing audio content.
+* Providing multilingual support, with over 27 additional languages on top of the core languages of English and Chinese.
+
+Qwen is open source, flexible, and encourages contribution from the software development community.
`content/learning-paths/servers-and-cloud-computing/rtp-llm/rtp-llm-chatbot.md` (+15 −21)

@@ -1,23 +1,13 @@
 ---
-title: Run a Large Language model (LLM) chatbot with rtp-llm on Arm servers
+title: Run an LLM chatbot with rtp-llm on an Arm server
 weight: 3

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
-
-## Before you begin
-The instructions in this Learning Path are for any Arm Neoverse N2 or Neoverse V2 based server running Ubuntu 22.04 LTS. You need an Arm server instance with at least four cores and 16GB of RAM to run this example. Configure disk storage up to at least 32 GB. The instructions have been tested on an Alibaba Cloud g8y.8xlarge instance and an AWS Graviton4 r8g.8xlarge instance.
-
-## Overview
-
-Arm CPUs are widely used in traditional ML and AI use cases. In this Learning Path, you will learn how to run generative AI inference-based use case like a LLM chatbot on Arm-based CPUs. You do this by deploying the [Qwen2-0.5B-Instruct model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on your Arm-based CPU using `rtp-llm`.
-
-[rtp-llm](https://github.com/alibaba/rtp-llm) is an open source C/C++ project developed by Alibaba that enables efficient LLM inference on a variety of hardware.
-
 ## Install dependencies

-Install `micromamba` to setup python 3.10 at path `/opt/conda310`, required by `rtp-llm` build system:
+Install `micromamba` to set up Python 3.10 at the path `/opt/conda310`, as required by the `rtp-llm` build system:

 ```bash
 "${SHELL}" <(curl -L micro.mamba.pm/install.sh)
@@ -34,14 +24,14 @@ chmod +x bazelisk-linux-arm64
 sudo mv bazelisk-linux-arm64 /usr/bin/bazelisk
 ```

-Install `git/gcc/g++` on your machine:
+Install `git/gcc/g++`:

 ```bash
 sudo apt install git -y
 sudo apt install build-essential -y
 ```

-Install `openblas`developmwnt package and fix the header paths:
+Install the `openblas` development package and fix the header paths:
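The actual commands for this step are elided from the diff. A plausible sketch on Ubuntu 22.04 follows; the package name is the standard Ubuntu one, but the symlink is an assumption about what "fix the header paths" involves:

```bash
# Install the OpenBLAS development package.
sudo apt install libopenblas-dev -y

# Hypothetical header-path fix: on Ubuntu arm64 the CBLAS/OpenBLAS headers live
# under /usr/include/aarch64-linux-gnu, so a symlink is one way to expose them.
sudo ln -sf /usr/include/aarch64-linux-gnu/cblas.h /usr/include/cblas.h
```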
 Start by cloning the source repository for rtp-llm:

 ```bash
 git clone https://github.com/alibaba/rtp-llm
 cd rtp-llm
 git checkout 4656265
 ```

-Comment out the lines 7-10 in `deps/requirements_lock_torch_arm.txt` as some hosts are not accessible from the Internet.
+Next, comment out lines 7-10 in `deps/requirements_lock_torch_arm.txt`, as some hosts are not accessible from the web:

 ```bash
 sed -i '7,10 s/^/#/' deps/requirements_lock_torch_arm.txt
 ```

-By default, `rtp-llm` builds for GPU only on Linux. You need to provide extra config `--config=arm` to build it for the Arm CPU that you will run it on:
+By default, `rtp-llm` builds for GPU only on Linux. You need to provide the additional flag `--config=arm` to build it for the Arm CPU that you will run it on.
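The build invocation itself is elided from the diff. For illustration, the flag is passed to Bazel through bazelisk along these lines; the target label here is an assumption, not something taken from the diff:

```bash
# Sketch only: --config=arm selects the Arm CPU build configuration.
# The real Bazel target label is defined in the rtp-llm repository.
bazelisk build --config=arm //maga_transformer:maga_transformer
```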
-Create a file named `python-test.py` in your `/tmp` directory with the contents below:
+Create a file named `python-test.py` in your `/tmp` directory with the contents shown below:

 ```python
 from maga_transformer.pipeline import Pipeline
@@ -140,7 +130,9 @@ Now run this file:
 python /tmp/python-test.py
 ```

-If `rtp-llm` has built correctly on your machine, you will see the LLM model response for the prompt input. A snippet of the output is shown below:
+If `rtp-llm` has built correctly on your machine, you will see the LLM model response for the prompt input.
+
+A snippet of the output is shown below:

 ```output
 ['I am a large language model created by Alibaba Cloud. My name is Qwen.']
@@ -174,5 +166,7 @@
 ```

-You have successfully run a LLM chatbot with Arm optimizations, all running on your Arm AArch64 CPU on your server. You can continue experimenting and trying out the model with different prompts.
+You have successfully run an LLM chatbot with Arm optimizations, running on an Arm AArch64 CPU on your server.
+
+You can continue to experiment with the chatbot by trying out different prompts on the model.
`content/learning-paths/servers-and-cloud-computing/rtp-llm/rtp-llm-server.md` (+31 −19)

@@ -5,25 +5,32 @@ weight: 4
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
+## Setup

-You can use the `rtp-llm` server program and submit requests using an OpenAI-compatible API.
-This enables applications to be created which access the LLM multiple times without starting and stopping it. You can also access the server over the network to another machine hosting the LLM.
+You can now move on to using the `rtp-llm` server program and submitting requests using an OpenAI-compatible API.
+
+This enables applications to be created that access the LLM multiple times without starting and stopping it.
+
+You can also access the server over the network from another machine.
+
+One additional software package is required for this section.
+
+Install `jq` on your computer using the following command:

 ```bash
 sudo apt install jq -y
 ```
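Once the server is running later in this section, `jq` is useful for pretty-printing API responses. As a sketch of what a request to an OpenAI-compatible endpoint can look like; the host, port, and model name are assumptions, not values from the diff:

```bash
# Hypothetical endpoint and model name: adjust for your deployment.
curl -s http://localhost:8088/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen2-0.5B-Instruct",
        "messages": [{"role": "user", "content": "Hello, who are you?"}]
      }' | jq .
```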
-# Running the Server
-## Install Hugging Face Hub
+## Running the Server

-There are a few different ways you can download the Qwen2 0.5B model. In this Learning Path, you download the model from Hugging Face.
+There are a few different ways you can download the Qwen2 0.5B model. In this Learning Path, you will download the model from Hugging Face.

-[Hugging Face](https://huggingface.co/) is an open source AI community where you can host your own AI models, train them and collaborate with others in the community. You can browse through the thousands of models that are available for a variety of use cases like NLP, audio, and computer vision.
+[Hugging Face](https://huggingface.co/) is an open source AI community where you can host your own AI models, train them, and collaborate with others in the community. You can browse through thousands of models that are available for a variety of use cases such as Natural Language Processing (NLP), audio, and computer vision.

 The `huggingface_hub` library provides APIs and tools that let you easily download and fine-tune pre-trained models. You will use `huggingface-cli` to download the [Qwen2 0.5B model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct).

+## Install Hugging Face Hub
+
 Install the required Python packages:

 ```bash
@@ -51,14 +58,18 @@ You can now download the model using the huggingface cli:
 huggingface-cli download Qwen/Qwen2-0.5B-Instruct
 ```
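To confirm the download and locate the model in the local Hugging Face cache, you can list cached repositories. This check is an editorial sketch rather than a step from the diff:

```bash
# List cached repos with sizes and paths; Qwen/Qwen2-0.5B-Instruct should appear.
huggingface-cli scan-cache
```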
-## Start rtp-llm server
-The server executable has already compiled during the stage detailed in the previous section, when you ran `bazelisk build`. Install the pip wheel in your active virtual environment:
+## Start the rtp-llm server
+
+{{% notice Note %}}
+The server executable was compiled during the previous stage, when you ran `bazelisk build`.
+{{% /notice %}}
+
+Install the pip wheel in your active virtual environment:
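The wheel path is elided from the diff. A hypothetical install command follows; the path is an assumption based on where Bazel typically places build outputs, so substitute the path printed by your build:

```bash
# Hypothetical wheel location: use the path reported by your bazelisk build.
pip install bazel-bin/maga_transformer/maga_transformer-*.whl
```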
-Run the Python file (make sure the server is still running):
+Ensure that the server is still running, and then run the Python file:

 ```bash
 python ./python-test.py
 ```

-You see the output generated by the LLM:
+You should see the output generated by the LLM:

 ```output
 Sure, here's a simple C++ program that prints "Hello, World!" to the console:
@@ -187,4 +199,4 @@ int main() {
 This program includes the `iostream` library, which is used for input/output operations. The `main` function is the entry point of the program, and it calls the `cout` object to print the message "Hello, World!" to the console.
 ```

-You can continue to experiment with different large language models and write scripts to access them.
+Now you can continue to experiment with different large language models, and have a go at writing scripts to access them.