Skip to content

Commit 66a3f3d

Browse files
authored
Merge pull request #1318 from pareenaverma/content_review
Reviewed and updated the Zilliz/Milvus RAG LP
2 parents f1b7de7 + 37676d3 commit 66a3f3d

File tree

9 files changed

+129
-80
lines changed

9 files changed

+129
-80
lines changed

content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md

Lines changed: 9 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,18 @@
11
---
2-
title: Use Milvus/Zilliz to build RAG on Arm Architecture
2+
title: Build a Retrieval-Augmented Generation (RAG) application using Zilliz Cloud on Arm servers
33

44
minutes_to_complete: 20
55

6-
who_is_this_for: This is an introductory topic for engineers who want to create a RAG application on Arm machines.
6+
who_is_this_for: This is an introductory topic for software developers who want to create a RAG application on Arm servers.
77

88
learning_objectives:
9-
- Create a simple RAG application using Milvus/Zilliz
10-
- Launch LLM service on Arm machines
9+
- Create a simple RAG application using Zilliz Cloud
10+
- Launch a LLM service on Arm servers
1111

1212
prerequisites:
13-
- Basic understand of RAG pipeline.
14-
- An [AWS account](/learning-paths/servers-and-cloud-computing/csp/aws/) to access instance types with different AWS Graviton processors.
15-
- A [Zilliz account](https://zilliz.com/cloud), which you can sign up for a free trial.
13+
- Basic understanding of a RAG pipeline.
14+
- An AWS Graviton3 c7g.2xlarge instance, or any [Arm based instance](/learning-paths/servers-and-cloud-computing/csp) from a cloud service provider or an on-premise Arm server.
15+
- A [Zilliz account](https://zilliz.com/cloud), which you can sign up for with a free trial.
1616

1717
author_primary: Chen Zhang
1818

@@ -23,6 +23,8 @@ armips:
2323
- Cortex-A
2424
tools_software_languages:
2525
- Python
26+
- GenAI
27+
- RAG
2628
operatingsystems:
2729
- Linux
2830

content/learning-paths/servers-and-cloud-computing/milvus-rag/_next-steps.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
next_step_guidance: Thank you for completing the Milvus RAG tutorial.
2+
next_step_guidance: Thank you for completing the RAG with Zilliz Cloud Learning Path. You might be interested in learning how to run the Llama 3.1 8B model with KleidiAI optimizations on Arm servers.
33

44
recommended_path: /learning-paths/servers-and-cloud-computing/llama-cpu/
55

content/learning-paths/servers-and-cloud-computing/milvus-rag/_review.md

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
review:
33
- questions:
44
question: >
5-
Can Milvus run on Arm systems?
5+
Can Milvus run on Arm?
66
answers:
77
- "Yes"
88
- "No"
@@ -12,13 +12,23 @@ review:
1212
1313
- questions:
1414
question: >
15-
Can Llama3.1 model run on Arm systems?
15+
Can Llama3.1 model run on Arm?
1616
answers:
1717
- "Yes"
1818
- "No"
1919
correct_answer: 1
2020
explanation: >
21-
The Llama-3.1-8B model from Meta can be used on an AWS Arm-based server CPU with the llama.cpp tool.
21+
The Llama-3.1-8B model from Meta can be used on Arm-based servers with llama.cpp.
22+
23+
- questions:
24+
question: >
25+
Which of the following is true about about Zilliz Cloud?
26+
answers:
27+
- "It is a fully-managed version of Milvus vector database"
28+
- "It is a self-hosted version of Milvus vector database"
29+
correct_answer: 1
30+
explanation: >
31+
Zilliz Cloud is a fully-managed version of Milvus.
2232
2333
2434
180 KB
Loading

content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md

Lines changed: 12 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
---
2-
title: Launch LLM Service on Arm
2+
title: Launch LLM Server
33
weight: 4
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
In this section, we will build and launch the `llama.cpp` service on the Arm-based CPU.
9+
In this section, you will build and run the `llama.cpp` server program using an OpenAI-compatible API on your running AWS Arm-based server instance.
1010

1111
### Llama 3.1 model & llama.cpp
1212

@@ -69,32 +69,33 @@ The GGUF model format, introduced by the llama.cpp team, uses compression and qu
6969

7070
### Re-quantize the model weights
7171

72-
To re-quantize, run
72+
To re-quantize the model, run:
7373

7474
```bash
7575
./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0_8_8
7676
```
7777

7878
This will output a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support.
7979

80-
> This requantization is optimal specifically for Graviton3. For Graviton2, the optimal requantization should be performed in the `Q4_0_4_4` format, and for Graviton4, the `Q4_0_4_8` format is the most suitable for requantization.
80+
This requantization is optimal specifically for Graviton3. For Graviton2, the optimal requantization should be performed in the `Q4_0_4_4` format, and for Graviton4, the `Q4_0_4_8` format is the most suitable for requantization.
8181

82-
### Start the LLM Service
83-
You can utilize the llama.cpp server program and send requests via an OpenAI-compatible API. This allows you to develop applications that interact with the LLM multiple times without having to repeatedly start and stop it. Additionally, you can access the server from another machine where the LLM is hosted over the network.
82+
### Start the LLM Server
83+
You can utilize the `llama.cpp` server program and send requests via an OpenAI-compatible API. This allows you to develop applications that interact with the LLM multiple times without having to repeatedly start and stop it. Additionally, you can access the server from another machine where the LLM is hosted over the network.
8484

8585
Start the server from the command line, and it listens on port 8080:
8686

87-
```shell
87+
```bash
8888
./llama-server -m dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf -n 2048 -t 64 -c 65536 --port 8080
8989
```
90-
```text
90+
91+
The output from this command should look like:
92+
93+
```output
9194
'main: server is listening on 127.0.0.1:8080 - starting the main loop
9295
```
9396

9497
You can also adjust the parameters of the launched LLM to adapt it to your server hardware to obtain ideal performance. For more parameter information, see the `llama-server --help` command.
9598

96-
If you struggle to perform this step, you can refer to the [this documents](https://learn.arm.com/learning-paths/servers-and-cloud-computing/llama-cpu/llama-chatbot/) for more information.
97-
98-
You have started the LLM service on your Arm-based CPU. Next, we directly interact with the service using the OpenAI SDK.
99+
You have started the LLM service on your AWS Graviton instance with an Arm-based CPU. In the next section, you will directly interact with the service using the OpenAI SDK.
99100

100101

content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md

Lines changed: 52 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -6,45 +6,66 @@ weight: 3
66
layout: learningpathall
77
---
88

9-
In this section, we will show you how to load private knowledge in our RAG.
9+
In this section, you will learn how to setup a cluster on Zilliz Cloud. You will then learn how to load your private knowledge database into the cluster.
10+
11+
### Create a dedicated cluster
12+
13+
You will need to [register](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud.
14+
15+
After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud. In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retreive the vector data as shown:
16+
17+
![cluster](create_cluster.png)
18+
19+
When you select the `Create Cluster` Button, you should see the cluster running in your Default Project.
20+
21+
![running](running_cluster.png)
22+
23+
{{% notice Note %}}
24+
You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. We can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about Milvus installation, please refer to the [installation documentation](https://milvus.io/docs/install-overview.md).
25+
{{% /notice %}}
1026

1127
### Create the Collection
12-
We use [Zilliz Cloud](https://zilliz.com/cloud) deployed on AWS with Arm-based machines to store and retrieve the vector data. To quick start, simply [register an account](https://docs.zilliz.com/docs/register-with-zilliz-cloud) on Zilliz Cloud for free.
1328

14-
> In addition to Zilliz Cloud, self-hosted Milvus is also a (more complicated to set up) option. We can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on ARM-based machines. For more information about Milvus installation, please refer to the [installation documentation](https://milvus.io/docs/install-overview.md).
29+
With the dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster.
30+
31+
Within your activated python `venv`, start by creating a file named `zilliz-llm-rag.py` and copy the contents below into it:
1532

16-
We set the `uri` and `token` as the [Public Endpoint and Api key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud.
1733
```python
1834
from pymilvus import MilvusClient
1935

2036
milvus_client = MilvusClient(
2137
uri="<your_zilliz_public_endpoint>", token="<your_zilliz_api_key>"
2238
)
2339

24-
collection_name = "my_rag_collection"
25-
2640
```
27-
Check if the collection already exists and drop it if it does.
41+
Replace <your_zilliz_public_endpoint> and <your zilliz_api_key> with the `URI` and `Token` for your running cluster. Refer to [Public Endpoint and Api key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud for more details.
42+
43+
Now, append the following code to `zilliz-llm-rag.py` and save the contents:
44+
2845
```python
46+
collection_name = "my_rag_collection"
47+
embedding_dim = "384"
48+
2949
if milvus_client.has_collection(collection_name):
3050
milvus_client.drop_collection(collection_name)
31-
```
32-
Create a new collection with specified parameters.
3351

34-
If we don't specify any field information, Milvus will automatically create a default `id` field for primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.
35-
```python
3652
milvus_client.create_collection(
3753
collection_name=collection_name,
3854
dimension=embedding_dim,
3955
metric_type="IP", # Inner product distance
4056
consistency_level="Strong", # Strong consistency level
4157
)
4258
```
43-
We use inner product distance as the default metric type. For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating)
59+
This code checks if a collection already exists and drops it if it does. You then, create a new collection with the specified parameters.
60+
61+
If you don't specify any field information, Milvus will automatically create a default `id` field for primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.
62+
You will use inner product distance as the default metric type. For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating)
63+
64+
You can now prepare the data to use in this collection.
4465

4566
### Prepare the data
4667

47-
We use the FAQ pages from the [Milvus Documentation 2.4.x](https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip) as the private knowledge in our RAG, which is a good data source for a simple RAG pipeline.
68+
In this example, you will use the FAQ pages from the [Milvus Documentation 2.4.x](https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip) as the private knowledge that is loaded in your RAG dataset/collection.
4869

4970
Download the zip file and extract documents to the folder `milvus_docs`.
5071

@@ -53,8 +74,9 @@ wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/m
5374
unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs
5475
```
5576

56-
We load all markdown files from the folder `milvus_docs/en/faq`. For each document, we just simply use "# " to separate the content in the file, which can roughly separate the content of each main part of the markdown file.
77+
You will load all the markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, use "# " to separate the content in the file, which can roughly separate the content of each main part of the markdown file.
5778

79+
Open `zilliz-llm-rag.py` and append the following code to it:
5880

5981
```python
6082
from glob import glob
@@ -69,17 +91,17 @@ for file_path in glob("milvus_docs/en/faq/*.md", recursive=True):
6991
```
7092

7193
### Insert data
72-
We prepare a simple but efficient embedding model [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) that can convert text into embedding vectors.
94+
You will now prepare a simple but efficient embedding model [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) that can convert the loaded text into embedding vectors.
95+
96+
You will iterate through the text lines, create embeddings, and then insert the data into Milvus.
97+
98+
Append and save the code shown below into `zilliz-llm-rag.py`:
99+
73100
```python
74101
from langchain_huggingface import HuggingFaceEmbeddings
75102

76103
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
77-
```
78104

79-
Iterate through the text lines, create embeddings, and then insert the data into Milvus.
80-
81-
Here is a new field `text`, which is a non-defined field in the collection schema. It will be automatically added to the reserved JSON dynamic field, which can be treated as a normal field at a high level.
82-
```python
83105
from tqdm import tqdm
84106

85107
data = []
@@ -93,6 +115,14 @@ for i, (line, embedding) in enumerate(
93115

94116
milvus_client.insert(collection_name=collection_name, data=data)
95117
```
96-
```text
97-
Creating embeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 72/72 [00:18<00:00, 3.91it/s]
118+
Run the python script, to check that you have successfully created the embeddings on the data you loaded into the RAG collection:
119+
120+
```bash
121+
python3 python3 zilliz-llm-rag.py
98122
```
123+
124+
The output should look like:
125+
```
126+
Creating embeddings: 72it [00:00, 700672.59it/s]
127+
```
128+

0 commit comments

Comments
 (0)