diff --git a/docs/hub/gguf-llamacpp.md b/docs/hub/gguf-llamacpp.md
index 17e9aff81..de2e73e0f 100644
--- a/docs/hub/gguf-llamacpp.md
+++ b/docs/hub/gguf-llamacpp.md
@@ -30,10 +30,7 @@ cd llama.cpp && LLAMA_CURL=1 make
 Once installed, you can use the `llama-cli` or `llama-server` as follows:
 
 ```bash
-llama-cli
-  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
-  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
-  -p "You are a helpful assistant" -cnv
+llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
 ```
 
 Note: You can remove `-cnv` to run the CLI in chat completion mode.
@@ -41,9 +38,7 @@ Note: You can remove `-cnv` to run the CLI in chat completion mode.
 Additionally, you can invoke an OpenAI spec chat completions endpoint directly using the llama.cpp server:
 
 ```bash
-llama-server \
-  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
-  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf
+llama-server -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
 ```
 
 After running the server you can simply utilise the endpoint as below:
@@ -66,6 +61,6 @@ curl http://localhost:8080/v1/chat/completions \
 }'
 ```
 
-Replace `--hf-repo` with any valid Hugging Face hub repo name and `--hf-file` with the GGUF file name in the hub repo - off you go! 🦙
+Replace `-hf` with any valid Hugging Face hub repo name - off you go! 🦙
 
 Note: Remember to `build` llama.cpp with `LLAMA_CURL=1` :)
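
As a quick illustration of the last changed line, here is a minimal sketch of the updated `-hf` syntax pointed at a different repo. The repo name and quant label below are illustrative, and the `repo[:quant]` form is assumed from the commands added above:

```bash
# Illustrative only: any GGUF repo on the Hugging Face Hub should work with -hf,
# optionally pinning a quantization after the colon, as in the examples above.
llama-cli -hf bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M

# Quant tag omitted: llama.cpp resolves a default quantization from the repo.
llama-server -hf bartowski/Llama-3.2-1B-Instruct-GGUF
```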