# GGUF usage with llama.cpp

> [!TIP]
> You can now deploy any llama.cpp-compatible GGUF on Hugging Face Endpoints; read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container).

Llama.cpp allows you to download and run inference on a GGUF simply by providing the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by the `LLAMA_CACHE` environment variable; read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
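
For instance, to keep downloaded checkpoints in a custom directory, you can set the variable before running any llama.cpp command (the path below is just an example):

```bash
# Example only: store downloaded GGUFs in a custom cache directory.
export LLAMA_CACHE=~/models/llama-cache
```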

You can install llama.cpp through brew (works on Mac and Linux), or you can build it from source. There are also pre-built binaries and Docker images that you can [check in the official documentation](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage).

### Option 1: Install with brew

```bash
brew install llama.cpp
```

### Option 2: Build from source

Step 1: Clone llama.cpp from GitHub.

```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

```bash
cd llama.cpp && LLAMA_CURL=1 make
```
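
For example, a CUDA-enabled build on a Linux machine with an Nvidia GPU (assuming the CUDA toolkit is already installed) could look like this:

```bash
# Enable CURL-based downloads plus CUDA offloading for Nvidia GPUs.
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make
```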

Once installed, you can use `llama-cli` or `llama-server` as follows:
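
As a minimal sketch (the repo and file names below are placeholders, and the `--hf-repo`/`--hf-file` flags are assumed to be available in your build), `llama-cli` can be pointed directly at a GGUF hosted on the Hub:

```bash
# Placeholder repo and file names: replace them with a real GGUF repository and file.
llama-cli \
  --hf-repo <user>/<model-repo-GGUF> \
  --hf-file <model-file>.gguf \
  -p "The meaning of life is"
```

`llama-server` typically accepts the same download flags and serves the model over a local HTTP endpoint instead of an interactive CLI session.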