Commit f3c0e5a

Added Quickstart section to README
1 parent d5fe4e8

File tree

1 file changed (+16, -0 lines)

README.md

Lines changed: 16 additions & 0 deletions
@@ -228,6 +228,22 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
</details>

## Quickstart

CPU inference:

1. Download the package for your OS from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
1. Download a GGUF file for your favorite model (for example: https://huggingface.co/bartowski/google_gemma-3-1b-it-qat-GGUF/blob/main/google_gemma-3-1b-it-qat-Q4_0.gguf)
1. Run: `llama-run google_gemma-3-1b-it-qat-Q4_0.gguf`
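
Put together, the CPU steps might look like this on Linux x64 (a minimal sketch: the `b5192` build tag, archive name, and `build/bin` layout inside the zip are assumptions, so check the releases page for the current artifact names):

```sh
# Fetch and unpack a release build (assumed filename and layout; adjust
# to the current tag on https://github.com/ggml-org/llama.cpp/releases).
curl -LO https://github.com/ggml-org/llama.cpp/releases/download/b5192/llama-b5192-bin-ubuntu-x64.zip
unzip llama-b5192-bin-ubuntu-x64.zip -d llama-cpp

# Download the example GGUF model (resolve/ returns the raw file on Hugging Face).
curl -LO https://huggingface.co/bartowski/google_gemma-3-1b-it-qat-GGUF/resolve/main/google_gemma-3-1b-it-qat-Q4_0.gguf

# Run CPU inference.
./llama-cpp/build/bin/llama-run google_gemma-3-1b-it-qat-Q4_0.gguf
```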
CUDA on Windows:

1. To run CUDA inference, download both the binary package and the CUDA runtime package, for example:
   * llama-b5192-bin-win-cuda-cu12.4-x64.zip
   * cudart-llama-bin-win-cu12.4-x64.zip
1. Unpack both into the same directory
1. Run with the `-ngl` flag: `llama-run -ngl 999 google_gemma-3-1b-it-qat-Q4_0.gguf`
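
In a Windows shell, those steps could be scripted like this (a sketch assuming both zips and the model file sit in the current directory, and that the executables live at the archive root; `tar` here is the bsdtar bundled with Windows 10+, which can extract zip archives):

```sh
# Unpack the binary package and the CUDA runtime into the same directory
# (bsdtar ships with Windows 10+ and handles .zip archives).
mkdir llama-cpp
tar -xf llama-b5192-bin-win-cuda-cu12.4-x64.zip -C llama-cpp
tar -xf cudart-llama-bin-win-cu12.4-x64.zip -C llama-cpp

# -ngl 999 offloads as many model layers as possible to the GPU.
llama-cpp\llama-run.exe -ngl 999 google_gemma-3-1b-it-qat-Q4_0.gguf
```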
## Supported backends

| Backend | Target devices |
