Skip to content

Commit 5fa8432

Browse files
Merge pull request #1823 from geremyCohen/milvus-rag-fix
general updates to zilliz LP
2 parents 39d75f4 + 20024c7 commit 5fa8432

File tree

2 files changed

+6
-5
lines changed

2 files changed

+6
-5
lines changed

content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -39,13 +39,14 @@ Run `make` to build it:
3939

4040
```bash
4141
cd llama.cpp
42-
make GGML_NO_LLAMAFILE=1 -j$(nproc)
42+
cmake -B build
43+
cmake --build build --config Release
4344
```
4445

4546
Check that `llama.cpp` has built correctly by running the help command:
4647

4748
```bash
48-
./llama-cli -h
49+
./build/bin/llama-cli -h
4950
```
5051

5152
If `llama.cpp` has been built correctly, you will see the help option displayed. The output snippet looks like this:
@@ -72,7 +73,7 @@ The GGUF model format, introduced by the Llama.cpp team, uses compression and qu
7273
To requantize the model, run:
7374

7475
```bash
75-
./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0_8_8
76+
./build/bin/llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0
7677
```
7778

7879
This outputs a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support.
@@ -85,7 +86,7 @@ You can utilize the `llama.cpp` server program and send requests through an Open
8586
Start the server from the command line, and it listens on port 8080:
8687

8788
```bash
88-
./llama-server -m dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf -n 2048 -t 64 -c 65536 --port 8080
89+
./build/bin/llama-server -m dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf -n 2048 -t 64 -c 65536 --port 8080
8990
```
9091

9192
The output from this command should look like:

content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@ Now, append the following code to `zilliz-llm-rag.py` and save the contents:
4545

4646
```python
4747
collection_name = "my_rag_collection"
48-
embedding_dim = "384"
48+
embedding_dim = 384
4949

5050
if milvus_client.has_collection(collection_name):
5151
milvus_client.drop_collection(collection_name)

0 commit comments

Comments
 (0)