----
## Quick start
Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:
- Install `llama.cpp` using [brew, nix or winget](docs/install.md) (see the example commands after this list)
- Run with Docker - see our [Docker documentation](docs/docker.md)
- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
- Build from source by cloning this repository - check out [our build guide](docs/build.md)
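
For instance, installing via Homebrew or winget is a one-liner. This is a sketch based on the package names described in [docs/install.md](docs/install.md); your package manager of choice may differ:

```sh
# macOS / Linux (Homebrew)
brew install llama.cpp

# Windows (winget)
winget install llama.cpp
```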
Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.

Example command:
```sh
# Use a local model file
llama-cli -m my_model.gguf
# Or download and run a model directly from Hugging Face
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
# Launch OpenAI-compatible API server
llama-server -hf ggml-org/gemma-3-1b-it-GGUF
```
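
Once `llama-server` is up, it exposes an OpenAI-compatible HTTP API. As a quick sanity check, you could query the chat completions endpoint with `curl`; this sketch assumes the server is listening on its default address, `127.0.0.1:8080`:

```sh
# Send a single chat message to the OpenAI-compatible endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```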
## Description
The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
## Supported backends
| Backend | Target devices |
| --- | --- |
|[OpenCL](docs/backend/OPENCL.md)| Adreno GPU |
|[RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc)| All |
## Building the project
The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).

The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:
- Clone this repository and build locally, see [how to build](docs/build.md) (a minimal build sequence is sketched after this list)
- On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
- Use a Docker image, see [documentation for Docker](docs/docker.md)
- Download pre-built binaries from [releases](https://github.com/ggml-org/llama.cpp/releases)
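
For the first option, a minimal CMake build might look like the following. This is a sketch of the typical flow; see [docs/build.md](docs/build.md) for platform specifics and backend flags such as `-DGGML_CUDA=ON`:

```sh
# Clone the repository and build the default CPU configuration
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```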
## Obtaining and quantizing models
The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`.

You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
```sh
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```
By default, the CLI downloads from Hugging Face; you can switch to other sources with the `MODEL_ENDPOINT` environment variable. For example, to download model checkpoints from ModelScope or other model-sharing communities, set `MODEL_ENDPOINT=https://www.modelscope.cn/`.
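
As an illustration, a run that pulls the same model through ModelScope instead of Hugging Face could look like this (a sketch only; the exact repository name on ModelScope may differ):

```sh
# Point model downloads at ModelScope for this invocation
MODEL_ENDPOINT=https://www.modelscope.cn/ llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```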