README.md: 30 additions & 11 deletions
@@ -28,6 +28,30 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ----
 
+## Quick start
+
+Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:
+
+- **⭐ Recommended**: Install `llama.cpp` using [brew, flox, nix or winget](docs/install.md)
+- Run with Docker - see our [Docker documentation](docs/docker.md)
+- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
+- Build from source by cloning this repository - check out [our build guide](docs/build.md)
+
+Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.
+
+Example command:
+
+```sh
+# Use a local model file
+llama-cli -m my_model.gguf
+
+# Or download and run a model directly from Hugging Face
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+
+# Launch OpenAI-compatible API server
+llama-server -hf ggml-org/gemma-3-1b-it-GGUF
+```
+
 ## Description
 
 The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide
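The last command in the Quick start hunk above launches an OpenAI-compatible HTTP server. As a quick way to confirm it is serving requests, it can be queried with `curl`; this is a minimal sketch assuming the server's default listen address of `127.0.0.1:8080` (adjustable with `--host` and `--port`):

```sh
# Start the server (same model as in the Quick start example)
llama-server -hf ggml-org/gemma-3-1b-it-GGUF

# In another shell: send an OpenAI-style chat completion request
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```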
@@ -229,6 +253,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 
 </details>
 
+
 ## Supported backends
 
 | Backend | Target devices |
@@ -245,24 +270,18 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 |[OpenCL](docs/backend/OPENCL.md)| Adreno GPU |
 |[RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc)| All |
 
-## Building the project
-
-The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
-The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:
-
-- Clone this repository and build locally, see [how to build](docs/build.md)
-- On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
-- Download pre-built binaries from [releases](https://github.com/ggml-org/llama.cpp/releases)
-
 ## Obtaining and quantizing models
 
 The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
-You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`.
+You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
+
+```sh
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+```
 
 By default, the CLI downloads from Hugging Face; you can switch to other endpoints with the environment variable `MODEL_ENDPOINT`. For example, to download model checkpoints from ModelScope or another model-sharing community instead, set `MODEL_ENDPOINT=https://www.modelscope.cn/`.
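Putting the two pieces above together, a download routed through ModelScope might look like the sketch below; it assumes the `ggml-org/gemma-3-1b-it-GGUF` repository from the earlier example is also available on that endpoint, and the `:quant` suffix is shown with an illustrative `Q4_K_M` tag rather than one confirmed to exist in that repo:

```sh
# Fetch the model via ModelScope instead of Hugging Face
MODEL_ENDPOINT=https://www.modelscope.cn/ llama-cli -hf ggml-org/gemma-3-1b-it-GGUF

# The optional :quant suffix selects a specific quantization from the repo
# (Q4_K_M is only an illustrative tag; use one the repo actually provides)
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF:Q4_K_M
```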
docs/build.md: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # Build llama.cpp locally
 
+The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
+
+The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server.
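For readers arriving from the "Build from source" bullet in the README's new Quick start, a minimal CMake build looks roughly like the sketch below; it assumes the default configuration, with backend-specific options covered in the rest of docs/build.md:

```sh
# Clone the repository and configure an out-of-tree build
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build

# Compile in Release mode; binaries land under build/bin/
cmake --build build --config Release

# Verify that the main CLI was built
./build/bin/llama-cli --help
```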
docs/install.md: 23 additions & 8 deletions
@@ -1,28 +1,43 @@
 # Install pre-built version of llama.cpp
 
-## Homebrew
+| Install via | Windows | Mac | Linux |
+|-------------|---------|-----|-------|
+| Winget      | ✅      |     |       |
+| Homebrew    |         | ✅  | ✅    |
+| MacPorts    |         | ✅  |       |
+| Nix         |         | ✅  | ✅    |
+| Flox        |         | ✅  | ✅    |
 
-On Mac and Linux, the homebrew package manager can be used via
+## Winget (Windows)
+
+```sh
+winget install llama.cpp
+```
+
+The package is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggml-org/llama.cpp/issues/8188
+
+## Homebrew (Mac and Linux)
 
 ```sh
 brew install llama.cpp
 ```
+
 The formula is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggml-org/llama.cpp/discussions/7668
 
-## MacPorts
+## MacPorts (Mac)
 
 ```sh
 sudo port install llama.cpp
 ```
-see also: https://ports.macports.org/port/llama.cpp/details/
 
-## Nix
+See also: https://ports.macports.org/port/llama.cpp/details/
 
-On Mac and Linux, the Nix package manager can be used via
+## Nix (Mac and Linux)
 
 ```sh
 nix profile install nixpkgs#llama-cpp
 ```
+
 For flake enabled installs.
 
 Or
@@ -35,9 +50,9 @@ For non-flake enabled installs.
 
 This expression is automatically updated within the [nixpkgs repo](https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/by-name/ll/llama-cpp/package.nix#L164).
 
-## Flox
+## Flox (Mac and Linux)
 
-On Mac and Linux, Flox can be used to install llama.cpp within a Flox environment via
+Flox can be used to install llama.cpp within a Flox environment via