---
title: Launch the LLM Server
weight: 4

### FIXED, DO NOT MODIFY

The GGUF model format, introduced by the Llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference.
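
As a rough illustration of the savings: at about 4.5 bits per weight (Q4_0 stores 4-bit values plus a per-block scale), an 8-billion-parameter model occupies roughly 4.5-5 GB on disk, versus about 16 GB for the same weights at FP16. You can sanity-check this against the downloaded file:

```bash
# The 4-bit GGUF file should be roughly 4.5-5 GB for an 8B-parameter
# model, compared with ~16 GB at 16-bit precision.
ls -lh dolphin-2.9.4-llama3.1-8b-Q4_0.gguf
```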

### Requantize the model weights

To requantize the model, run:

```bash
./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0_8_8
```

This outputs a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support.

This requantization is optimal specifically for Graviton3. For Graviton2, requantize to the `Q4_0_4_4` format instead, and for Graviton4, `Q4_0_4_8` is the most suitable format.
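
On those Graviton generations, the same command applies with the target format swapped in. A sketch based on the formats named above (the output filenames are illustrative):

```bash
# Graviton2: requantize to the Q4_0_4_4 layout.
./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_4_4.gguf Q4_0_4_4

# Graviton4: requantize to the Q4_0_4_8 layout.
./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_4_8.gguf Q4_0_4_8
```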

### Start the LLM Server

You can use the `llama.cpp` server program and send requests through an OpenAI-compatible API. This allows you to develop applications that interact with the LLM multiple times without having to repeatedly start and stop it. Additionally, you can access the server over the network from a machine other than the one hosting the LLM.

Start the server from the command line; it listens on port 8080:
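
A minimal sketch of the launch command, assuming the `llama-server` binary built alongside `llama-quantize` sits in the current directory:

```bash
# Load the requantized model and serve HTTP requests on port 8080.
./llama-server -m dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf --port 8080
```

Once the server is up, you can exercise the OpenAI-compatible API with a plain HTTP request, for example:

```bash
# Send a single chat completion request to the local server;
# llama-server uses the loaded model, so no model field is required.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```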