
Commit 3dd77f3

Vaibhavs10 and ngxson authored
upd llama.cpp docs (#1580)
* upd llama.cpp docs
* Update docs/hub/gguf-llamacpp.md
Co-authored-by: Xuan Son Nguyen <[email protected]>
* suggestions from code review.
---------
Co-authored-by: Xuan Son Nguyen <[email protected]>
1 parent 9c3c831 · commit 3dd77f3

File tree

1 file changed: +3 −8 lines changed

docs/hub/gguf-llamacpp.md

Lines changed: 3 additions & 8 deletions
@@ -30,20 +30,15 @@ cd llama.cpp && LLAMA_CURL=1 make
 Once installed, you can use the `llama-cli` or `llama-server` as follows:
 
 ```bash
-llama-cli \
-  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
-  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
-  -p "You are a helpful assistant" -cnv
+llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
 ```
 
 Note: You can remove `-cnv` to run the CLI in chat completion mode.
 
 Additionally, you can invoke an OpenAI spec chat completions endpoint directly using the llama.cpp server:
 
 ```bash
-llama-server \
-  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
-  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf
+llama-server -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
 ```
 
 After running the server you can simply utilise the endpoint as below:
@@ -66,6 +61,6 @@ curl http://localhost:8080/v1/chat/completions \
 }'
 ```
 
-Replace `--hf-repo` with any valid Hugging Face hub repo name and `--hf-file` with the GGUF file name in the hub repo - off you go! 🦙
+Replace `-hf` with any valid Hugging Face hub repo name - off you go! 🦙
 
 Note: Remember to `build` llama.cpp with `LLAMA_CURL=1` :)
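For readers comparing the two invocations in the first hunk: the new `-hf` flag folds the repo and file arguments into a single `repo:quantization` reference. A minimal sketch of the equivalence, assuming the `Q8_0` tag resolves to a file named `Llama-3.2-3B-Instruct-Q8_0.gguf` (a typical naming pattern, not verified against the repo):

```bash
# Old style: repository and GGUF file named separately
llama-cli --hf-repo bartowski/Llama-3.2-3B-Instruct-GGUF \
          --hf-file Llama-3.2-3B-Instruct-Q8_0.gguf   # hypothetical file name

# New style: repository plus quantization tag; llama.cpp resolves the file
llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
```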

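The second hunk shows only the tail of the server example. Based on the hunk header (`curl http://localhost:8080/v1/chat/completions \`) and the standard OpenAI chat-completions request shape, the full call looks roughly like this; the headers and message bodies are illustrative, not the doc's exact payload:

```bash
# Sketch of the truncated request; content is assumed for illustration
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" }
    ]
  }'
```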
0 commit comments