
Commit 78f71e8

Commit message: up.
Parent: d440f03

File tree

1 file changed (+4, -3 lines)


docs/hub/gguf-llamacpp.md

Lines changed: 4 additions & 3 deletions
````diff
@@ -1,6 +1,7 @@
 # GGUF usage with llama.cpp
 
-NEW: You can now deploy any llama.cpp compatible GGUF on Hugging Face Endpoints, read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container)
+> [!TIP]
+> You can now deploy any llama.cpp compatible GGUF on Hugging Face Endpoints, read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container)
 
 Llama.cpp allows you to download and run inference on a GGUF simply by providing the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by the `LLAMA_CACHE` environment variable; read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
 
````
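To make the paragraph above concrete, here is a minimal sketch of the cached download flow; the cache directory shown is an arbitrary example, and the command mirrors the one used later in this doc:

```bash
# Optionally point llama.cpp's download cache at a custom directory.
export LLAMA_CACHE=~/.cache/llama.cpp-models

# The first run downloads the GGUF from the Hugging Face repo; later runs reuse the cache.
llama-cli \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
  -p "You are a helpful assistant" -cnv
```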
````diff
@@ -27,7 +28,7 @@ cd llama.cpp && LLAMA_CURL=1 make
 Once installed, you can use the `llama-cli` or `llama-server` as follows:
 
 ```bash
-./llama-cli \
+llama-cli \
 --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
 --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
 -p "You are a helpful assistant" -cnv
````
````diff
@@ -38,7 +39,7 @@ Note: You can remove `-cnv` to run the CLI in chat completion mode.
 Additionally, you can invoke an OpenAI spec chat completions endpoint directly using the llama.cpp server:
 
 ```bash
-./llama-server \
+llama-server \
 --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
 --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf
 ```
````
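Once the server is up, the chat completions endpoint can be exercised with a plain HTTP request; this sketch assumes llama-server's default bind address of 127.0.0.1:8080:

```bash
# POST an OpenAI-style chat completion request to the local llama-server
# (assumes the default host/port; adjust if the server was started differently).
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
```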
