README.md: 31 additions & 11 deletions
@@ -28,6 +28,30 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)

----

## Quick start

Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:

- Install `llama.cpp` using [brew, nix or winget](docs/install.md) (a Homebrew example is sketched after this list)
- Run with Docker - see our [Docker documentation](docs/docker.md)
- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
- Build from source by cloning this repository - check out [our build guide](docs/build.md)
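As a concrete illustration of the first option, here is a minimal sketch of installing via Homebrew on macOS or Linux (assuming Homebrew is already set up; the `nix` and `winget` equivalents are covered in [docs/install.md](docs/install.md)):

```sh
# Install the llama.cpp formula; this makes llama-cli and llama-server available on your PATH
brew install llama.cpp
```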
Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.

Example command:

```sh
# Use a local model file
llama-cli -m my_model.gguf

# Or download and run a model directly from Hugging Face
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF

# Launch OpenAI-compatible API server
llama-server -hf ggml-org/gemma-3-1b-it-GGUF
```
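Once `llama-server` is running, you can talk to it with any OpenAI-compatible client. A minimal sketch using `curl`, assuming the server is listening on its default address `http://127.0.0.1:8080`:

```sh
# Ask the locally served model a question via the OpenAI-compatible chat endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello! Can you introduce yourself?"}
    ]
  }'
```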

## Description

The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
@@ -130,6 +154,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
@@ -229,6 +254,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
</details>

## Supported backends
| Backend | Target devices |
@@ -245,24 +271,18 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
|[OpenCL](docs/backend/OPENCL.md)| Adreno GPU |
|[RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc)| All |

## Building the project

The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).

The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:

- Clone this repository and build locally, see [how to build](docs/build.md) (a minimal CMake invocation is sketched after this list)
- On macOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
- Use a Docker image, see [documentation for Docker](docs/docker.md)
- Download pre-built binaries from [releases](https://github.com/ggml-org/llama.cpp/releases)
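For the first option, a minimal sketch of a local CMake build (assuming `cmake` and a C/C++ toolchain are installed; backend-specific options such as CUDA or Metal are described in [docs/build.md](docs/build.md)):

```sh
# Configure and compile; the resulting binaries are placed under build/bin
cmake -B build
cmake --build build --config Release
```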
## Obtaining and quantizing models
The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
```sh
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```
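The optional `[:quant]` suffix selects a specific quantization from the repository. A hedged example, assuming the repository publishes a `Q8_0` variant (check the model page for the tags that actually exist):

```sh
# Fetch a specific quantization of the model
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF:Q8_0
```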
By default, the CLI downloads from Hugging Face; you can switch to other endpoints with the environment variable `MODEL_ENDPOINT`. For example, you may opt to download model checkpoints from ModelScope or other model-sharing communities by setting the environment variable, e.g. `MODEL_ENDPOINT=https://www.modelscope.cn/`.
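A minimal sketch of switching the endpoint for a single invocation (assuming the same model is also published on ModelScope under the same name):

```sh
# Download the model from ModelScope instead of Hugging Face for this run
MODEL_ENDPOINT=https://www.modelscope.cn/ llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```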