
Commit d440f03

Update gguf-llamacpp.md
1 parent 0383e4e commit d440f03

1 file changed: +25 -1 lines changed

docs/hub/gguf-llamacpp.md (25 additions & 1 deletion)
@@ -1,6 +1,30 @@
# GGUF usage with llama.cpp

NEW: You can now deploy any llama.cpp-compatible GGUF on Hugging Face Endpoints, read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container).

Llama.cpp allows you to download and run inference on a GGUF simply by providing the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by the `LLAMA_CACHE` environment variable; read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
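For example, if you want llama.cpp to cache downloaded checkpoints somewhere other than the default location, you can point `LLAMA_CACHE` at a directory of your choice before running any of the commands below (the path here is purely illustrative):

```bash
# Store downloaded GGUF checkpoints in a custom directory (illustrative path).
export LLAMA_CACHE=~/models/llama-cache
```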
Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```
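After the install finishes, the `llama-cli` and `llama-server` binaries should be available on your `PATH`. A quick sanity check, assuming your build exposes the `--version` flag, is:

```bash
# Print version and build info to confirm the install worked.
llama-cli --version
```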
You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.

```bash
git clone https://github.com/ggerganov/llama.cpp
```
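If you only need the sources to build the tools and don't care about the full git history, a shallow clone is a lighter-weight alternative:

```bash
# Shallow clone: fetch only the latest commit, which is enough to build.
git clone --depth 1 https://github.com/ggerganov/llama.cpp
```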
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

```bash
cd llama.cpp && LLAMA_CURL=1 make
```
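As a sketch of the hardware-specific variant mentioned above, a CUDA-enabled build on a Linux machine with an Nvidia GPU might look like the following (this assumes the CUDA toolkit is already installed):

```bash
# Enable the CURL downloader and CUDA offload, and parallelize the build.
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make -j$(nproc)
```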
Once installed, you can use `llama-cli` or `llama-server` as follows:

```bash
./llama-cli
```

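As a minimal sketch of a full invocation, the Hugging Face repo path and file name mentioned above map onto the `--hf-repo` and `--hf-file` flags; the repo and file names below are placeholders, so substitute any GGUF you want to run:

```bash
# Download (and cache) the GGUF from the Hub, then run a prompt against it.
./llama-cli \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
  -p "You are a helpful assistant"

# The same flags work with llama-server, which serves the model over HTTP.
./llama-server \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf
```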