docs/hub/gguf-llamacpp.md: 11 changes (3 additions, 8 deletions)
@@ -30,20 +30,15 @@ cd llama.cpp && LLAMA_CURL=1 make
Once installed, you can use the `llama-cli` or `llama-server` as follows:

```diff
-llama-cli \
-  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
-  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
-  -p "You are a helpful assistant" -cnv
+llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
```
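For context (not part of the diff): the new `-hf` shorthand takes a Hub repo id plus an optional quantization tag, so `:Q8_0` above selects the Q8_0 GGUF file from that repo. A minimal sketch of the same call with an explicit prompt, assuming the standard `-p` flag carried over from the removed command:

```bash
# Sketch only: -hf takes <user>/<repo>[:<quant>]; the :Q8_0 tag picks that
# quantization's GGUF file. -p carries the prompt over from the old command.
llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0 \
  -p "You are a helpful assistant"
```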

Note: You can remove `-cnv` to run the CLI in chat completion mode.

Additionally, the llama.cpp server exposes an OpenAI-compatible chat completions endpoint that you can invoke directly:

```diff
-llama-server \
-  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
-  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf
+llama-server -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
```
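The curl example further down assumes the server's default address. A sketch that makes the binding explicit, assuming llama-server's usual `--host`/`--port` flags (they do not appear in this diff):

```bash
# Sketch: bind to the address the curl example below targets.
# --host and --port are assumed flags, not lines from this diff.
llama-server -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0 \
  --host 127.0.0.1 --port 8080
```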

After running the server, you can call the endpoint as shown below:
@@ -66,6 +61,6 @@
```bash
curl http://localhost:8080/v1/chat/completions \
    ...
}'
```
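The diff collapses the unchanged body of this request, so only its first and last lines are visible above. A typical OpenAI-style payload for this endpoint looks roughly like the sketch below; the message contents are illustrative, not the exact lines from the file:

```bash
# Illustrative request body; the file's collapsed lines may differ.
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            { "role": "system", "content": "You are a helpful assistant." },
            { "role": "user", "content": "Write a limerick about GGUF." }
        ]
    }'
```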

-Replace `--hf-repo` with any valid Hugging Face hub repo name and `--hf-file` with the GGUF file name in the hub repo - off you go! 🦙
+Replace `-hf` with any valid Hugging Face hub repo name - off you go! 🦙
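As a second illustration, only the repo id and quantization tag change between models (this assumes the same repo also publishes a Q4_K_M file):

```bash
# Same repo, different quantization tag (assumption: a Q4_K_M GGUF exists there).
llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M
```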

Note: Remember to build llama.cpp with `LLAMA_CURL=1` :)
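Since the `-hf` download path relies on libcurl, here is a build sketch mirroring the context line at the top of this diff (the make-based build shown there; newer llama.cpp releases may expect a CMake flag instead, which is an assumption here):

```bash
# Build with curl support so -hf can download GGUF files from the Hub.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && LLAMA_CURL=1 make
```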