Merge pull request #1823 from geremyCohen/milvus-rag-fix

jasonrandrews · web-flow · commit 5fa84327c2a7 · 2025-04-12T07:59:41.000-05:00
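General updates to the Zilliz learning path: build llama.cpp with CMake, run the binaries from build/bin, requantize to Q4_0, and make embedding_dim an integer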
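
Note on the `embedding_dim` change in `offline_data_loading.md`: the value is switched from the string `"384"` to the integer `384`, presumably because a vector dimension is consumed as a number. The snippet below is a minimal sketch of why the quoted form is a problem, not code from the learning path. `check_dimension` is a hypothetical helper introduced only for illustration, and no Milvus API is called.

```python
def check_dimension(vector, embedding_dim):
    """Hypothetical helper: confirm a vector matches the expected dimension."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(embedding_dim, bool) or not isinstance(embedding_dim, int):
        raise TypeError(
            f"embedding_dim must be an int, got {type(embedding_dim).__name__}"
        )
    if len(vector) != embedding_dim:
        raise ValueError(f"expected {embedding_dim} values, got {len(vector)}")
    return True


# Stand-in for an embedding; real values would come from an embedding model.
vector = [0.0] * 384

# With the integer value, the length check passes.
assert check_dimension(vector, 384)

# The quoted value is a str, so it never compares equal to the vector length.
assert len(vector) != "384"
```

Passing `"384"` to this helper raises `TypeError` instead of failing later with a less obvious message, which is the kind of early check the integer literal makes unnecessary.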
diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md
@@ -39,13 +39,14 @@ Run `make` to build it:
 
 ```bash
 cd llama.cpp
-make GGML_NO_LLAMAFILE=1 -j$(nproc)
+cmake -B build
+cmake --build build --config Release
 ```
 
 Check that `llama.cpp` has built correctly by running the help command:
 
 ```bash
-./llama-cli -h
+./build/bin/llama-cli -h
 ```
 
 If `llama.cpp` has been built correctly, you will see the help option displayed. The output snippet looks like this:
@@ -72,7 +73,7 @@ The GGUF model format, introduced by the Llama.cpp team, uses compression and qu
 To requantize the model, run:
 
 ```bash
-./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0_8_8
+./build/bin/llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0
 ```
 
 This outputs a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support.
@@ -85,7 +86,7 @@ You can utilize the `llama.cpp` server program and send requests through an Open
 Start the server from the command line, and it listens on port 8080:
 
 ```bash
-./llama-server -m dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf -n 2048 -t 64 -c 65536  --port 8080
+./build/bin/llama-server -m dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf -n 2048 -t 64 -c 65536  --port 8080
 ```
 
 The output from this command should look like:
diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md
@@ -45,7 +45,7 @@ Now, append the following code to `zilliz-llm-rag.py` and save the contents:
 
 ```python
 collection_name = "my_rag_collection"
-embedding_dim = "384"
+embedding_dim = 384
 
 if milvus_client.has_collection(collection_name):
     milvus_client.drop_collection(collection_name)