Commit 46ea401

Editorial
1 parent d875802 commit 46ea401

3 files changed: +19 -19 lines changed

content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md

Lines changed: 8 additions & 8 deletions
@@ -6,18 +6,18 @@ weight: 4
 layout: learningpathall
 ---
 
-### Llama 3.1 model and llama.cpp
+### Llama 3.1 Model and Llama.cpp
 
 In this section, you will build and run the `llama.cpp` server program using an OpenAI-compatible API on your AWS Arm-based server instance.
 
 The [Llama-3.1-8B model](https://huggingface.co/cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf) from Meta belongs to the Llama 3.1 model family and is free to use for research and commercial purposes. Before you use the model, visit the Llama [website](https://llama.meta.com/llama-downloads/) and fill in the form to request access.
 
-[llama.cpp](https://github.com/ggerganov/llama.cpp) is an open-source C/C++ project that enables efficient LLM inference on a variety of hardware - both locally, and in the cloud. You can conveniently host a Llama 3.1 model using `llama.cpp`.
+[Llama.cpp](https://github.com/ggerganov/llama.cpp) is an open-source C/C++ project that enables efficient LLM inference on a variety of hardware - both locally, and in the cloud. You can conveniently host a Llama 3.1 model using `llama.cpp`.
 
 
-### Download and build llama.cpp
+### Download and build Llama.cpp
 
-Run the following commands to install make, cmake, gcc, g++, and other essential tools required for building llama.cpp from source:
+Run the following commands to install make, cmake, gcc, g++, and other essential tools required for building Llama.cpp from source:
 
 ```bash
 sudo apt install make cmake -y
@@ -27,7 +27,7 @@ sudo apt install build-essential -y
 
 You are now ready to start building `llama.cpp`.
 
-Clone the source repository for llama.cpp:
+Clone the source repository for Llama.cpp:
 
 ```bash
 git clone https://github.com/ggerganov/llama.cpp
@@ -64,7 +64,7 @@ You can now download the model using the huggingface cli:
 ```bash
 huggingface-cli download cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf dolphin-2.9.4-llama3.1-8b-Q4_0.gguf --local-dir . --local-dir-use-symlinks False
 ```
-The GGUF model format, introduced by the llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference.
+The GGUF model format, introduced by the Llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference.
 
 
 ### Re-quantize the model weights
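As a back-of-the-envelope illustration of the memory savings the paragraph above describes, the weight storage of an 8B-parameter model can be estimated at different precisions. This sketch is illustrative only: the figures are idealized and ignore the per-block scale and metadata overhead that real GGUF quantization formats such as Q4_0 carry.

```python
# Approximate weight-storage footprint of an 8B-parameter model.
# Idealized figures: quantization-block overhead is ignored.
PARAMS = 8e9

def weights_gib(bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return PARAMS * bits_per_weight / 8 / 2**30

print(f"fp16: {weights_gib(16):.1f} GiB")  # about 14.9 GiB
print(f"4-bit: {weights_gib(4):.1f} GiB")  # about 3.7 GiB
```

The roughly 4x reduction is what brings the working set down to a size that fits comfortably in the memory of a typical Arm-based cloud instance.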
@@ -91,10 +91,10 @@ Start the server from the command line, and it listens on port 8080:
 The output from this command should look like:
 
 ```output
-'main: server is listening on 127.0.0.1:8080 - starting the main loop
+main: server is listening on 127.0.0.1:8080 - starting the main loop
 ```
 
-You can also adjust the parameters of the launched LLM to adapt it to your server hardware to obtain ideal performance. For more parameter information, see the `llama-server --help` command.
+You can also adjust the parameters of the launched LLM to adapt it to your server hardware to achieve an ideal performance. For more parameter information, see the `llama-server --help` command.
 
 You have started the LLM service on your AWS Graviton instance with an Arm-based CPU. In the next section, you will directly interact with the service using the OpenAI SDK.
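The OpenAI-compatible endpoint this page sets up can be exercised with a short script. Below is a minimal sketch using only the Python standard library, assuming the `llama-server` started above is listening on 127.0.0.1:8080; the model name is a placeholder, since `llama.cpp` serves whichever model it was launched with.

```python
# Sketch: calling the llama.cpp server's OpenAI-compatible chat endpoint.
# Assumes a local server at 127.0.0.1:8080; model name is a placeholder.
import json
import urllib.request

def build_request(prompt: str, model: str = "dolphin-2.9.4-llama3.1-8b") -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

def ask(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With the server running, you could try:
# print(ask("What is an Arm Neoverse core?"))
```

The same endpoint also works with the official OpenAI SDK by pointing its `base_url` at the local server, which is the approach the next section of the Learning Path takes.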

content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md

Lines changed: 10 additions & 10 deletions
@@ -7,29 +7,29 @@ layout: learningpathall
 ---
 ## Create a dedicated cluster
 
-In this section, you will learn how to set up a cluster on Zilliz Cloud.
+In this section, you will set up a cluster on Zilliz Cloud.
 
 Begin by [registering](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud.
 
-After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud.
+After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster).
 
-In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown:
+Now create a **Dedicated** cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown:
 
 ![cluster](create_cluster.png)
 
-When you select the **Create Cluster** Button, you should see the cluster running in your Default Project.
+When you select the **Create Cluster** Button, you should see the cluster running in your **Default Project**.
 
 ![running](running_cluster.png)
 
 {{% notice Note %}}
-You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. You can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about Milvus installation, please refer to the [installation documentation](https://milvus.io/docs/install-overview.md).
+You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. You can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about installing Milvus, see the [Milvus installation documentation](https://milvus.io/docs/install-overview.md).
 {{% /notice %}}
 
 ## Create the Collection
 
-With the dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster.
+With the Dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster.
 
-Within your activated python virtual environment `venv`, start by creating a file named `zilliz-llm-rag.py`, and copy the contents below into it:
+Within your activated Python virtual environment `venv`, start by creating a file named `zilliz-llm-rag.py`, and copy the contents below into it:
 
 ```python
 from pymilvus import MilvusClient
@@ -59,7 +59,7 @@ milvus_client.create_collection(
 ```
 This code checks if a collection already exists and drops it if it does. If this happens, you can create a new collection with the specified parameters.
 
-If you do not specify any field information, Milvus automatically creates a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.
+If you do not specify any field information, Milvus automatically creates a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema defined fields and their values.
 You can use inner product distance as the default metric type. For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating)
 
 You can now prepare the data to use in this collection.
@@ -116,10 +116,10 @@ for i, (line, embedding) in enumerate(
 
 milvus_client.insert(collection_name=collection_name, data=data)
 ```
-Run the python script, to check that you have successfully created the embeddings on the data you loaded into the RAG collection:
+Run the Python script, to check that you have successfully created the embeddings on the data you loaded into the RAG collection:
 
 ```bash
-python3 python3 zilliz-llm-rag.py
+python3 zilliz-llm-rag.py
 ```
 
 The output should look like:
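The `data` argument passed to `milvus_client.insert()` in the hunk above is a list of row dictionaries pairing each text chunk with its embedding. A minimal sketch of that shape, with stand-in embeddings (in the Learning Path the vectors come from an embedding model, and the field names follow the defaults Milvus creates):

```python
# Sketch of the row format milvus_client.insert() expects: one dict per
# chunk, with the default "id" primary key and "vector" embedding field.
# The tiny 2-d embeddings here are stand-ins for real model output.
def build_rows(lines, embeddings):
    """Pair each text chunk with its embedding in Milvus row format."""
    return [
        {"id": i, "vector": emb, "text": line}
        for i, (line, emb) in enumerate(zip(lines, embeddings))
    ]

rows = build_rows(["chunk one", "chunk two"], [[0.1, 0.2], [0.3, 0.4]])
# milvus_client.insert(collection_name=collection_name, data=rows)
```

Any extra keys beyond `id` and `vector`, such as `text` here, are stored in the reserved JSON field mentioned earlier, so they can be returned alongside search hits.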

content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ RAG applications often use vector databases to efficiently store and retrieve hi
 
 In this Learning Path, you will use [Zilliz Cloud](https://zilliz.com/cloud) for your vector storage, which is a fully managed Milvus vector database. Zilliz Cloud is available on major cloud computing service providers; for example, AWS, GCP, and Azure.
 
-Specifically, you will use Zilliz Cloud deployed on AWS with Arm-based servers. For the LLM, you will use the Llama-3.1-8B model running on an AWS Arm-based server using `llama.cpp`.
+Here, you will use Zilliz Cloud deployed on AWS with an Arm-based server. For the LLM, you will use the Llama-3.1-8B model also running on an AWS Arm-based server, but using `llama.cpp`.
 
 
 ## Install dependencies
