
Commit fccb522

Editorial
1 parent 46ea401 commit fccb522

3 files changed: +8, -8 lines changed

content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
---
-title: Launch LLM Server
+title: Launch the LLM Server
weight: 4

### FIXED, DO NOT MODIFY
@@ -67,20 +67,20 @@ huggingface-cli download cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf do
The GGUF model format, introduced by the Llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference.

-### Re-quantize the model weights
+### Requantize the model weights

-To re-quantize the model, run:
+To requantize the model, run:

```bash
./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0_8_8
```

-This will output a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support.
+This outputs a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support.

This requantization is optimal specifically for Graviton3. For Graviton2, the optimal requantization should be performed in the `Q4_0_4_4` format, and for Graviton4, the `Q4_0_4_8` format is the most suitable for requantization.

### Start the LLM Server
-You can utilize the `llama.cpp` server program and send requests via an OpenAI-compatible API. This allows you to develop applications that interact with the LLM multiple times without having to repeatedly start and stop it. Additionally, you can access the server from another machine where the LLM is hosted over the network.
+You can utilize the `llama.cpp` server program and send requests through an OpenAI-compatible API. This allows you to develop applications that interact with the LLM multiple times without having to repeatedly start and stop it. Additionally, you can access the server from another machine where the LLM is hosted over the network.

Start the server from the command line, and it listens on port 8080:
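
The start command itself falls outside this hunk. Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch in Python, assuming the server is listening on localhost:8080 (the model name is a placeholder, since a local llama.cpp server answers for whichever model it loaded):

```python
from openai import OpenAI

# Assumption: the llama.cpp server from this section listens on localhost:8080.
# A local server needs no real API key; any placeholder string is accepted.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key")

response = client.chat.completions.create(
    model="dolphin",  # placeholder; the server uses the model it was started with
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```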

content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md

Lines changed: 2 additions & 2 deletions
@@ -60,7 +60,7 @@ milvus_client.create_collection(
This code checks if a collection already exists and drops it if it does. You can then create a new collection with the specified parameters.

If you do not specify any field information, Milvus automatically creates a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema defined fields and their values.
-You can use inner product distance as the default metric type. For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating)
+You can use inner product distance as the default metric type. For more information about distance types, you can refer to the [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating).

You can now prepare the data to use in this collection.
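
For illustration, here is a minimal sketch of such a collection setup with the `pymilvus` `MilvusClient`; the URI, collection name, and embedding dimension are assumptions, not values from this commit:

```python
from pymilvus import MilvusClient

# Assumptions: a local Milvus Lite database file and a hypothetical collection name.
milvus_client = MilvusClient(uri="./milvus_demo.db")
collection_name = "my_rag_collection"

# Drop the collection if it already exists, as described above.
if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

# With no explicit schema, Milvus creates default `id` and `vector` fields.
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=768,      # assumption: must match the embedding model's output size
    metric_type="IP",   # inner product distance, the default discussed above
)
```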

@@ -75,7 +75,7 @@ wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/m
unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs
```

-Now load all the markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, use "# " to separate the content in the file, which can separate the content of each main part of the markdown file.
+Now load all the markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, use "# " to separate the content in the file. This divides the content of each main part of the markdown file.

Open `zilliz-llm-rag.py` and append the following code to it:
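
The appended code itself lies outside this hunk. A rough sketch of the loading step just described, with the file pattern taken from the paragraph above and a hypothetical variable name:

```python
from glob import glob

text_lines = []  # hypothetical container for the chunks to insert later

for file_path in glob("milvus_docs/en/faq/*.md"):
    with open(file_path) as f:
        file_text = f.read()
    # Splitting on "# " turns each top-level markdown section into one chunk.
    text_lines += file_text.split("# ")
```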

content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ You are now ready to use the LLM and obtain a RAG response.
For the LLM, you will use the OpenAI SDK to request the Llama service you launched in the previous section. You do not need to use an API key because it is running locally on your machine.

-You will then convert the retrieved documents in to a string format. Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus. Finally, use the LLM to generate a response based on the prompts.
+You will then convert the retrieved documents into a string format. Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus. Finally, use the LLM to generate a response based on the prompts.

Append the code below into `zilliz-llm-rag.py`:
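
The appended code is also outside this hunk. A minimal sketch of the steps just described, assuming hypothetical `retrieved_lines_with_distances` pairs from the Milvus search and a placeholder `question` string:

```python
from openai import OpenAI

# Assumptions: the local llama.cpp server from the earlier section, plus
# placeholder search results and a placeholder question.
llm_client = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key")
retrieved_lines_with_distances = [("Milvus stores vector data in segments.", 0.78)]
question = "How does Milvus store data?"

# Convert the retrieved documents into a single context string.
context = "\n".join(line for line, _distance in retrieved_lines_with_distances)

SYSTEM_PROMPT = "You are an assistant. Answer using only the provided context."
USER_PROMPT = f"Context:\n{context}\n\nQuestion:\n{question}"

response = llm_client.chat.completions.create(
    model="dolphin",  # placeholder; the local server uses its loaded model
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)
```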
