
Commit 6bd39bb

Vaibhavs10, osanseviero, and pcuenca authored
Update gguf-llamacpp.md (#1438)
* Update gguf-llamacpp.md

* up.

* Apply suggestions from code review

Co-authored-by: Omar Sanseviero <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>

---------

Co-authored-by: Omar Sanseviero <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
1 parent 0383e4e commit 6bd39bb

File tree

1 file changed: +30 -3 lines changed


docs/hub/gguf-llamacpp.md

Lines changed: 30 additions & 3 deletions
# GGUF usage with llama.cpp

> [!TIP]
> You can now deploy any llama.cpp compatible GGUF on Hugging Face Endpoints, read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container)

Llama.cpp allows you to download and run inference on a GGUF simply by providing a Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by the `LLAMA_CACHE` environment variable; read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
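For example, if you want the cached checkpoints to live in a specific directory, you can export `LLAMA_CACHE` before running any of the commands below. This is just an illustrative sketch; the path used here is arbitrary, not a default:

```bash
# Optional: store the GGUFs that llama.cpp downloads in a custom directory
# (~/llama-cache is an arbitrary example path, not the default location)
mkdir -p ~/llama-cache
export LLAMA_CACHE=~/llama-cache
```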
You can install llama.cpp through brew (works on Mac and Linux), or you can build it from source. There are also pre-built binaries and Docker images that you can [check in the official documentation](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage).

### Option 1: Install with brew

```bash
brew install llama.cpp
```
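If you want to confirm the install worked before moving on, a quick sanity check is to look for the binaries brew puts on your PATH (a minimal check; the exact set of binaries shipped can vary between llama.cpp releases):

```bash
# Verify the main llama.cpp binaries are available on your PATH
which llama-cli llama-server
llama-cli --help | head -n 5
```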
### Option 2: Build from source

Step 1: Clone llama.cpp from GitHub.

```bash
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

```bash
cd llama.cpp && LLAMA_CURL=1 make
```
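For instance, on a Linux machine with an Nvidia GPU you would combine the hardware-specific flag mentioned above with the same command (a sketch; pick the flags that match your hardware):

```bash
# Build with CUDA offloading enabled, keeping remote-download support
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make
```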
Once installed, you can use the `llama-cli` or `llama-server` as follows:

```bash
llama-cli \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
  -p "You are a helpful assistant" -cnv
```

Note: You can remove `-cnv` to run the CLI in chat completion mode.
Additionally, you can invoke an OpenAI-spec chat completions endpoint directly using the llama.cpp server:

```bash
llama-server \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf
```
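Once `llama-server` is running, any OpenAI-style client can talk to it. Here is a minimal sketch with `curl`, assuming the server is listening on its default `localhost:8080` binding:

```bash
# Call the OpenAI-compatible chat completions route exposed by llama-server
# (adjust the host/port if you changed the server's defaults)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "What is GGUF?"}
        ]
      }'
```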

0 commit comments
