`content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md` (8 additions, 8 deletions)
````diff
@@ -6,18 +6,18 @@ weight: 4
 layout: learningpathall
 ---

-### Llama 3.1 model and llama.cpp
+### Llama 3.1 Model and Llama.cpp

 In this section, you will build and run the `llama.cpp` server program using an OpenAI-compatible API on your AWS Arm-based server instance.

 The [Llama-3.1-8B model](https://huggingface.co/cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf) from Meta belongs to the Llama 3.1 model family and is free to use for research and commercial purposes. Before you use the model, visit the Llama [website](https://llama.meta.com/llama-downloads/) and fill in the form to request access.

-[llama.cpp](https://github.com/ggerganov/llama.cpp) is an open-source C/C++ project that enables efficient LLM inference on a variety of hardware - both locally, and in the cloud. You can conveniently host a Llama 3.1 model using `llama.cpp`.
+[Llama.cpp](https://github.com/ggerganov/llama.cpp) is an open-source C/C++ project that enables efficient LLM inference on a variety of hardware - both locally, and in the cloud. You can conveniently host a Llama 3.1 model using `llama.cpp`.

-### Download and build llama.cpp
+### Download and build Llama.cpp

-Run the following commands to install make, cmake, gcc, g++, and other essential tools required for building llama.cpp from source:
+Run the following commands to install make, cmake, gcc, g++, and other essential tools required for building Llama.cpp from source:
````
````diff
-The GGUF model format, introduced by the llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference.
+The GGUF model format, introduced by the Llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference.

 ### Re-quantize the model weights
````
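The 4-bit block quantization described above can be sketched in a few lines. This is an illustrative toy in the spirit of GGUF's Q4_0 scheme (one scale per block of weights, symmetric integer levels); the block size and rounding details are simplified assumptions, not llama.cpp's actual implementation:

```python
# Toy sketch of symmetric 4-bit block quantization (Q4_0-like):
# each block of float weights is stored as one float scale plus
# small integers in [-7, 7]. Details are deliberately simplified.

def quantize_block(weights, levels=7):
    """Map a block of floats to ints in [-levels, levels] plus a scale."""
    scale = max(abs(w) for w in weights) / levels or 1.0
    q = [round(w / scale) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate float weights from the quantized block."""
    return [scale * v for v in q]

block = [0.12, -0.5, 0.33, 0.07]
scale, q = quantize_block(block)
restored = dequantize_block(scale, q)
# Each restored weight is within half a quantization step of the original.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(block, restored))
```

Storing one float scale plus 4-bit integers per block is what shrinks memory traffic enough to make CPU inference practical.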
````diff
@@ -91,10 +91,10 @@ Start the server from the command line, and it listens on port 8080:
 The output from this command should look like:

 ```output
-'main: server is listening on 127.0.0.1:8080 - starting the main loop
+main: server is listening on 127.0.0.1:8080 - starting the main loop
 ```

-You can also adjust the parameters of the launched LLM to adapt it to your server hardware to obtain ideal performance. For more parameter information, see the `llama-server --help` command.
+You can also adjust the parameters of the launched LLM to adapt it to your server hardware to achieve ideal performance. For more parameter information, see the `llama-server --help` command.

 You have started the LLM service on your AWS Graviton instance with an Arm-based CPU. In the next section, you will directly interact with the service using the OpenAI SDK.
````
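Once `llama-server` is listening, any OpenAI-compatible client can reach it. As a minimal sketch using only the Python standard library - assuming the server is running locally on port 8080 and exposes the standard `/v1/chat/completions` route; the model name is a placeholder, since llama.cpp serves whichever model it loaded:

```python
import json
import urllib.request

def build_chat_request(prompt, model="local-llama"):
    """Build an OpenAI-style chat completion request for the local server.

    The model name is a placeholder; llama.cpp's single-model server
    largely ignores it.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def send(req, timeout=120):
    """Send the request; requires llama-server to be running locally."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

req = build_chat_request("What is a vector database?")
# print(send(req))  # uncomment once the server is up
```

The OpenAI SDK used in the next section speaks this same wire format, so pointing it at `http://127.0.0.1:8080/v1` is all the configuration it needs.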
`content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md` (10 additions, 10 deletions)
````diff
@@ -7,29 +7,29 @@ layout: learningpathall
 ---

 ## Create a dedicated cluster

-In this section, you will learn how to set up a cluster on Zilliz Cloud.
+In this section, you will set up a cluster on Zilliz Cloud.

 Begin by [registering](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud.

-After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud.
+After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster).

-In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown:
+Now create a **Dedicated** cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown:

 

-When you select the **Create Cluster** Button, you should see the cluster running in your Default Project.
+When you select the **Create Cluster** button, you should see the cluster running in your **Default Project**.

 

 {{% notice Note %}}
-You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. You can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about Milvus installation, please refer to the [installation documentation](https://milvus.io/docs/install-overview.md).
+You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. You can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about installing Milvus, see the [Milvus installation documentation](https://milvus.io/docs/install-overview.md).
 {{% /notice %}}

 ## Create the Collection

-With the dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster.
+With the Dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster.

-Within your activated python virtual environment `venv`, start by creating a file named `zilliz-llm-rag.py`, and copy the contents below into it:
+Within your activated Python virtual environment `venv`, start by creating a file named `zilliz-llm-rag.py`, and copy the contents below into it:
````
````diff
 This code checks if a collection already exists and drops it if it does, then creates a new collection with the specified parameters.

 If you do not specify any field information, Milvus automatically creates a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.

 You can use inner product distance as the default metric type. For more information about distance types, see the [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating).
````
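To make the inner product metric concrete: similarity is the dot product of two vectors, and for L2-normalized embeddings it coincides with cosine similarity. A small illustrative sketch (not Milvus code):

```python
import math

def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]
cos = inner_product(a, b) / (
    math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
)
# For L2-normalized vectors, inner product equals cosine similarity.
ip_normalized = inner_product(normalize(a), normalize(b))
assert abs(cos - ip_normalized) < 1e-9
```

This is why embedding models that emit normalized vectors pair naturally with the `IP` metric type.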
````diff
 You can now prepare the data to use in this collection.

@@ -116,10 +116,10 @@ for i, (line, embedding) in enumerate(
````
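The collection setup this file describes can be sketched with the `pymilvus` `MilvusClient` API. The collection name, embedding dimension, endpoint, and token below are placeholder assumptions for illustration, not values taken from the Learning Path's actual script:

```python
# Illustrative outline: create a Milvus/Zilliz collection with the default
# schema (auto "id" primary key plus a "vector" field) and the inner
# product metric. All identifiers here are placeholders.

collection_params = {
    "collection_name": "my_rag_collection",   # placeholder name
    "dimension": 768,          # must match your embedding model's output size
    "metric_type": "IP",       # inner product distance
    "consistency_level": "Strong",
}

def create_rag_collection(uri, token):
    """Drop any existing collection and recreate it.

    Call with your Zilliz Cloud public endpoint and API key.
    Requires: pip install pymilvus
    """
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri, token=token)
    name = collection_params["collection_name"]
    if client.has_collection(name):
        client.drop_collection(name)
    client.create_collection(**collection_params)
    return client
```

With no explicit schema, this is the path where Milvus generates the default `id` and `vector` fields mentioned above.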
`content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md` (1 addition, 1 deletion)
````diff
@@ -14,7 +14,7 @@ RAG applications often use vector databases to efficiently store and retrieve hi
 In this Learning Path, you will use [Zilliz Cloud](https://zilliz.com/cloud) for your vector storage, which is a fully managed Milvus vector database. Zilliz Cloud is available on major cloud computing service providers; for example, AWS, GCP, and Azure.

-Specifically, you will use Zilliz Cloud deployed on AWS with Arm-based servers. For the LLM, you will use the Llama-3.1-8B model running on an AWS Arm-based server using `llama.cpp`.
+Here, you will use Zilliz Cloud deployed on AWS with an Arm-based server. For the LLM, you will use the Llama-3.1-8B model also running on an AWS Arm-based server, but using `llama.cpp`.
````