Draft
Changes from all 68 commits
3e7ed29
ci: implement build matrix for CUDA/CPU containers with dynamic tagging
Feb 17, 2026
e536907
fix: Updated Docker images/build-container.yml
Feb 17, 2026
b195492
fix: Updated the documentation about Docker
Feb 17, 2026
d91ca1a
fix: Set Arch for 3090s
Feb 17, 2026
860ff29
Merge branch 'ikawrakow:main' into main
yadirhb Feb 17, 2026
06e6b6a
fix: Updated build step name.
Feb 17, 2026
95bb917
fix: Set target ARCH as a variable
Feb 17, 2026
c891df3
feat: Added cleanup step
Feb 17, 2026
5c9061f
feat: Added docker-bake and updated workflow
Feb 17, 2026
d919c9d
fix: Issue with REPO_OWNER variable
Feb 17, 2026
118fa91
fix: Updated workflow to solve errors
Feb 17, 2026
a403e99
fix: Updated branch format
Feb 17, 2026
be24cc7
fix: Wrong naming
Feb 17, 2026
ce574cb
Merge branch 'ikawrakow:main' into main
yadirhb Feb 18, 2026
90da4f4
Merge branch 'ikawrakow:main' into main
yadirhb Feb 20, 2026
910c8b7
Merge branch 'ikawrakow:main' into main
yadirhb Feb 22, 2026
a4f5e99
Merge branch 'ikawrakow:main' into main
yadirhb Mar 11, 2026
6b969fa
Update docker-bake.hcl
yadirhb Mar 11, 2026
7701845
Update build-container.yml
yadirhb Mar 12, 2026
eff7935
Update ik_llama-cuda.Containerfile
yadirhb Mar 12, 2026
041854a
Update ik_llama-cpu.Containerfile
yadirhb Mar 12, 2026
0aa63f8
Update docker-bake.hcl
yadirhb Mar 12, 2026
4172322
Update build-container.yml
yadirhb Mar 12, 2026
5d8a2ab
Added support for ccache
yadirhb Mar 12, 2026
9503572
Removed action/cache
yadirhb Mar 12, 2026
00ca93c
added -sSL for reliability and fixed the URL path
yadirhb Mar 12, 2026
3acad42
added -sSL for reliability and fixed the URL path CUDA containerfile
yadirhb Mar 12, 2026
b000c48
Merge branch 'ikawrakow:main' into main
yadirhb Mar 12, 2026
a9d7702
Merge branch 'main' into main
yadirhb Mar 27, 2026
ce34ce5
fix: correct Dockerfile RUN command syntax errors
yadirhb Mar 27, 2026
adf2480
fix: correct llama-swap download URL in Containerfiles
yadirhb Mar 27, 2026
eb15276
perf: improve ccache configuration in Containerfiles
yadirhb Mar 27, 2026
f172813
fix: remove problematic ccache initialization from Containerfiles
yadirhb Mar 27, 2026
810e8f6
fix: add git to CPU Containerfile build dependencies
yadirhb Mar 27, 2026
04df4fa
chore: optimize Containerfile with smaller images and better healthch…
yadirhb Mar 27, 2026
fd67fc8
chore: fix CUDA Containerfile healthchecks and swap version
yadirhb Mar 27, 2026
d77d86f
chore: fix indentation in Containerfiles and add LD_LIBRARY_PATH for …
yadirhb Mar 27, 2026
41fd6f1
fix: add --break-system-packages flag for pip in CPU Containerfile
yadirhb Mar 27, 2026
a054b29
Merge branch 'ikawrakow:main' into main
yadirhb Mar 27, 2026
cb1ae40
feat: add git bind mount for build info and NCCL support for CUDA
yadirhb Mar 27, 2026
5434147
fix: remove libnccl-dev from CUDA build (already included in base image)
yadirhb Mar 28, 2026
3bc90df
fix: added Markdown files to ignore files
yadirhb Mar 28, 2026
19a25ae
feat: use BUILD_NUMBER-COMMIT pattern for docker image tags
yadirhb Mar 28, 2026
634d1f0
fix: fetch full git history for accurate BUILD_NUMBER
yadirhb Mar 28, 2026
18f10c6
fix: fetch full git history in Dockerfile for accurate BUILD_NUMBER
yadirhb Mar 28, 2026
1060ba6
chore: update GitHub Actions to latest versions for Node.js 24 compat…
yadirhb Mar 28, 2026
4df6ff0
chore: update all GitHub Actions to Node.js 24 compatible versions
yadirhb Mar 28, 2026
f8abf34
fix: use CI-passed BUILD_NUMBER and LLAMA_COMMIT in Dockerfile
yadirhb Mar 28, 2026
3ee1a14
fix: pass BUILD_NUMBER and LLAMA_COMMIT as Docker build args
yadirhb Mar 28, 2026
b8ad58a
fix: revert docker actions to v4 (latest available versions)
yadirhb Mar 28, 2026
24836c5
fix: calculate BUILD_NUMBER and LLAMA_COMMIT directly in Containerfile
yadirhb Mar 28, 2026
0089016
feat: calculate BUILD_NUMBER and LLAMA_COMMIT in Containerfiles
yadirhb Mar 28, 2026
885d967
feat: calculate BUILD_NUMBER and LLAMA_COMMIT in Containerfiles
yadirhb Mar 28, 2026
3e4e6db
fix: cache improvements for CUDA and CPU builds
yadirhb Mar 28, 2026
47ea4e4
fix: "/.git": not found
yadirhb Mar 28, 2026
4ca3b37
fix: Unnecessary mv llama-swap
yadirhb Mar 28, 2026
73dd7c5
fix: Remove BUILD_NUMBER and LLAMA_COMMIT from docker file, calculate…
yadirhb Mar 28, 2026
985aa8c
fix: remove .git from dockerignore for local and CI builds
yadirhb Mar 29, 2026
c8e75e5
fix: Remove mounts key from Build and Push step in gh workflow
yadirhb Mar 29, 2026
8a2d388
ci: add .git verification step before build
yadirhb Mar 29, 2026
33914bf
refactor: standardize Containerfile structure and remove .git mount d…
yadirhb Mar 29, 2026
eee8ba4
ci: remove broken cache pruning step
yadirhb Mar 29, 2026
11e0a65
ci: remove broken prune-cache job
yadirhb Mar 29, 2026
39442a5
chore: Removed step for Verifying .git existance in GH workflow
yadirhb Mar 29, 2026
0462a5b
Merge branch 'ikawrakow:main' into main
yadirhb Mar 29, 2026
98484c3
fix: ensure build always proceeds even if git switch fails
yadirhb Mar 29, 2026
f589b59
Merge branch 'ikawrakow:main' into main
yadirhb Mar 31, 2026
cc80658
Merge branch 'ikawrakow:main' into main
yadirhb Apr 2, 2026
5 changes: 4 additions & 1 deletion .dockerignore
@@ -1,7 +1,8 @@
*.o
*.a
*.md
.cache/
.git/
.git
.github/
.gitignore
.vs/
@@ -18,3 +19,5 @@ models/*
arm_neon.h
compile_commands.json
Dockerfile

**/*.md
103 changes: 103 additions & 0 deletions .github/workflows/build-container.yml
@@ -0,0 +1,103 @@
name: Build and Push Docker Image

on:
  push:
    branches:
      - main

permissions:
  contents: read
  packages: write
  actions: read

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          - variant: "cu12"
            cuda_version: "12.6.2"
            containerfile: "ik_llama-cuda.Containerfile"
          - variant: "cu13"
            cuda_version: "13.1.1"
            containerfile: "ik_llama-cuda.Containerfile"
          - variant: "cpu"
            cuda_version: "none"
            containerfile: "ik_llama-cpu.Containerfile"

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4

      - name: Log in to GHCR
        uses: docker/login-action@v4
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Prepare Environment
        id: prep
        run: |
          echo "BUILD_NUMBER=$(git rev-list --count HEAD)" >> $GITHUB_ENV
          echo "LLAMA_COMMIT=$(git rev-parse --short HEAD)" >> $GITHUB_ENV
          echo "REPO_LOWER=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_ENV

      # 5.1 Restore the cache from GitHub's storage to a host folder
      - name: Cache ccache
        uses: actions/cache@v4
        with:
          path: .buildkit-cache
          key: ccache-${{ matrix.variant }}-${{ github.run_id }}
          restore-keys: |
            ccache-${{ matrix.variant }}-

      # 5.2 "Inject" that host folder into BuildKit's internal mount system
      - name: Inject ccache into BuildKit
        uses: reproducible-containers/buildkit-cache-dance@v3
        with:
          cache-map: |
            {
              ".buildkit-cache": "/ccache"
            }
          skip-extraction: ${{ github.event_name == 'pull_request' }}

      # 5.3 Build and push using the cache
      - name: Build and Push
        uses: docker/bake-action@v7
        env:
          REPO_OWNER: ${{ env.REPO_LOWER }}
          VARIANT: ${{ matrix.variant }}
          BUILD_NUMBER: ${{ env.BUILD_NUMBER }}
          LLAMA_COMMIT: ${{ env.LLAMA_COMMIT }}
          CUDA_VERSION: ${{ matrix.cuda_version }}
          GGML_NATIVE: "OFF" # Force OFF for CI portability
          USE_CCACHE: "true"
        with:
          push: true
          files: ./docker-bake.hcl
          set: |
            *.dockerfile=./docker/${{ matrix.containerfile }}
            *.cache-from=type=gha,scope=ccache-${{ matrix.variant }}
            *.cache-to=type=gha,mode=max,scope=ccache-${{ matrix.variant }}

  cleanup:
    runs-on: ubuntu-latest
    needs: build-and-push
    if: success()
    steps:
      - name: Delete untagged images
        uses: vlaurin/action-ghcr-prune@v0.6.0
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          organization: ${{ github.repository_owner }}
          container: ik-llama-cpp
          keep-younger-than: 0
          untagged-only: true
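The `Prepare Environment` step derives the build metadata with plain git commands, which is why the checkout uses `fetch-depth: 0` (a shallow clone would make `rev-list --count` wrong, per the "fetch full git history for accurate BUILD_NUMBER" commit). A self-contained sketch of what it computes, using a throwaway repo and an example owner name instead of the real checkout and `github.repository_owner`:

```shell
# Sketch of the "Prepare Environment" step, runnable anywhere.
# A throwaway git repo stands in for the CI checkout.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "first"
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "second"
BUILD_NUMBER=$(git rev-list --count HEAD)   # total commits: a monotonic build id
LLAMA_COMMIT=$(git rev-parse --short HEAD)  # short hash baked into image tags
REPO_LOWER=$(echo "IkAwRaKoW" | tr '[:upper:]' '[:lower:]')  # GHCR requires lowercase
echo "$BUILD_NUMBER $REPO_LOWER"
# → 2 ikawrakow
```

In CI these values land in `$GITHUB_ENV` so later steps can pass them to the bake file as build args.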
59 changes: 59 additions & 0 deletions docker-bake.hcl
@@ -0,0 +1,59 @@
variable "REPO_OWNER" {}
variable "VARIANT" {}
variable "SHA_SHORT" {}
variable "BUILD_NUMBER" {}
variable "LLAMA_COMMIT" {}
variable "CUDA_VERSION" {}
variable "CUDA_DOCKER_ARCH" { default = "86;90" }
variable "USE_CCACHE" { default = "true" }
variable "GGML_NATIVE" { default = "ON" }

# Common cache configuration for GitHub Actions
target "cache_settings" {
  cache-from = ["type=gha,scope=ccache-${VARIANT}"]
  cache-to   = ["type=gha,mode=max,scope=ccache-${VARIANT}"]
}

group "default" {
  targets = ["server", "full", "swap"]
}

target "settings" {
  context  = "."
  inherits = ["cache_settings"]
  args = {
    BUILD_NUMBER     = "${BUILD_NUMBER}"
    LLAMA_COMMIT     = "${LLAMA_COMMIT}"
    CUDA_VERSION     = "${CUDA_VERSION}"
    CUDA_DOCKER_ARCH = "${CUDA_DOCKER_ARCH}"
    GGML_NATIVE      = "${GGML_NATIVE}"
    USE_CCACHE       = "${USE_CCACHE}"
  }
}

target "server" {
  inherits = ["settings"]
  target   = "server"
  tags = [
    "ghcr.io/${REPO_OWNER}/ik-llama-cpp:${VARIANT}-server-${BUILD_NUMBER}",
    "ghcr.io/${REPO_OWNER}/ik-llama-cpp:${VARIANT}-server"
  ]
}

target "full" {
  inherits = ["settings"]
  target   = "full"
  tags = [
    "ghcr.io/${REPO_OWNER}/ik-llama-cpp:${VARIANT}-full-${BUILD_NUMBER}-${LLAMA_COMMIT}",
    "ghcr.io/${REPO_OWNER}/ik-llama-cpp:${VARIANT}-full"
  ]
}

target "swap" {
  inherits = ["settings"]
  target   = "swap"
  tags = [
    "ghcr.io/${REPO_OWNER}/ik-llama-cpp:${VARIANT}-swap-${BUILD_NUMBER}-${LLAMA_COMMIT}",
    "ghcr.io/${REPO_OWNER}/ik-llama-cpp:${VARIANT}-swap"
  ]
}
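The three targets share the same build args via `settings` and differ only in the build stage and tag scheme. A quick way to see how the variables combine into a tag (the values below are examples, not taken from a real build):

```shell
# Example values only; a real run gets these from the CI environment.
REPO_OWNER=ikawrakow
VARIANT=cu12
BUILD_NUMBER=4242
LLAMA_COMMIT=3e7ed29
echo "ghcr.io/${REPO_OWNER}/ik-llama-cpp:${VARIANT}-full-${BUILD_NUMBER}-${LLAMA_COMMIT}"
# → ghcr.io/ikawrakow/ik-llama-cpp:cu12-full-4242-3e7ed29

# To resolve the whole bake file without building (requires Docker Buildx):
#   docker buildx bake -f docker-bake.hcl --print
```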
39 changes: 17 additions & 22 deletions docker/README.md
@@ -14,51 +14,46 @@ CPU or CUDA sections under [Build](#Build) and [Run](#Run) are enough to get up
- [Extra Features](#Extra)
- [Credits](#Credits)

# Build
## Build

Builds two image tags:

- `swap`: Includes only `llama-swap` and `llama-server`.
- `full`: Includes `llama-server`, `llama-quantize`, and other utilities.

Start: download the 4 files to a new directory (e.g. `~/ik_llama/`) then follow the next steps.
### Start:

```
└── ik_llama
├── ik_llama-cpu.Containerfile
├── ik_llama-cpu-swap.config.yaml
├── ik_llama-cuda.Containerfile
└── ik_llama-cuda-swap.config.yaml
```
1. Clone the repository as `git clone https://github.com/ikawrakow/ik_llama.cpp`
2. Enter the repo: `cd ik_llama.cpp`, then follow the next steps.

## CPU
### CPU

```
podman image build --format Dockerfile --file ik_llama-cpu.Containerfile --target full --tag ik_llama-cpu:full && podman image build --format Dockerfile --file ik_llama-cpu.Containerfile --target swap --tag ik_llama-cpu:swap
podman image build --format Dockerfile --file ./docker/ik_llama-cpu.Containerfile --target full --tag ik_llama-cpu:full . && podman image build --format Dockerfile --file ./docker/ik_llama-cpu.Containerfile --target swap --tag ik_llama-cpu:swap .
```

```
docker image build --file ik_llama-cpu.Containerfile --target full --tag ik_llama-cpu:full . && docker image build --file ik_llama-cpu.Containerfile --target swap --tag ik_llama-cpu:swap .
docker image build --file ./docker/ik_llama-cpu.Containerfile --target full --tag ik_llama-cpu:full . && docker image build --file ./docker/ik_llama-cpu.Containerfile --target swap --tag ik_llama-cpu:swap .
```

## CUDA
### CUDA

```
podman image build --format Dockerfile --file ik_llama-cuda.Containerfile --target full --tag ik_llama-cuda:full && podman image build --format Dockerfile --file ik_llama-cuda.Containerfile --target swap --tag ik_llama-cuda:swap
podman image build --format Dockerfile --file ./docker/ik_llama-cuda.Containerfile --target full --tag ik_llama-cuda:full . && podman image build --format Dockerfile --file ./docker/ik_llama-cuda.Containerfile --target swap --tag ik_llama-cuda:swap .
```

```
docker image build --file ik_llama-cuda.Containerfile --target full --tag ik_llama-cuda:full . && docker image build --file ik_llama-cuda.Containerfile --target swap --tag ik_llama-cuda:swap .
docker image build --file ./docker/ik_llama-cuda.Containerfile --target full --tag ik_llama-cuda:full . && docker image build --file ./docker/ik_llama-cuda.Containerfile --target swap --tag ik_llama-cuda:swap .
```

# Run
## Run

- Download `.gguf` model files to your favorite directory (e.g. `/my_local_files/gguf`).
- Map it to `/models` inside the container.
- Open browser `http://localhost:9292` and enjoy the features.
- API endpoints are available at `http://localhost:9292/v1` for use in other applications.

## CPU
### CPU

```
podman run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models:ro localhost/ik_llama-cpu:swap
@@ -68,7 +63,7 @@ podman run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models
docker run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models:ro ik_llama-cpu:swap
```

## CUDA
### CUDA

- Install Nvidia Drivers and CUDA on the host.
- For Docker, install [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
@@ -85,13 +80,13 @@ podman run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models
docker run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models:ro --runtime nvidia ik_llama-cuda:swap
```

# Troubleshooting
## Troubleshooting

- If CUDA is not available, use `ik_llama-cpu` instead.
- If models are not found, ensure you mount the correct directory: `-v /my_local_files/gguf:/models:ro`
- If you need to install `podman` or `docker` follow the [Podman Installation](https://podman.io/docs/installation) or [Install Docker Engine](https://docs.docker.com/engine/install) for your OS.

# Extra
## Extra

- `CUSTOM_COMMIT` can be used to build a specific `ik_llama.cpp` commit (e.g. `1ec12b8`).
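A sketch of a pinned build; note that wiring `CUSTOM_COMMIT` in via `--build-arg` is an assumption about how the Containerfile consumes it, and the tag suffix is just a convention, so adapt as needed:

```shell
# Build the CPU image at a pinned upstream commit (sketch; assumes the
# Containerfile reads CUSTOM_COMMIT as a build arg). The command is echoed
# rather than executed so it can be inspected first.
CUSTOM_COMMIT=1ec12b8
cmd="docker image build --file ./docker/ik_llama-cpu.Containerfile --build-arg CUSTOM_COMMIT=${CUSTOM_COMMIT} --target full --tag ik_llama-cpu:full-${CUSTOM_COMMIT} ."
echo "$cmd"
```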

@@ -121,7 +116,7 @@ docker run -it --name ik_llama_full --rm -v /my_local_files/gguf:/models:ro --r
# ./llama-sweep-bench ...
```

- Customize `llama-swap` config: save the `ik_llama-cpu-swap.config.yaml` or `ik_llama-cuda-swap.config.yaml` localy (e.g. under `/my_local_files/`) then map it to `/app/config.yaml` inside the container appending `-v /my_local_files/ik_llama-cpu-swap.config.yaml:/app/config.yaml:ro` to your`podman run ...` or `docker run ...`.
- Customize `llama-swap` config: save the `./docker/ik_llama-cpu-swap.config.yaml` or `./docker/ik_llama-cuda-swap.config.yaml` locally (e.g. under `/my_local_files/`), then map it to `/app/config.yaml` inside the container by appending `-v /my_local_files/ik_llama-cpu-swap.config.yaml:/app/config.yaml:ro` to your `podman run ...` or `docker run ...`.
- To run the container in background, replace `-it` with `-d`: `podman run -d ...` or `docker run -d ...`. To stop it: `podman stop ik_llama` or `docker stop ik_llama`.
- If you build the image on a different machine, change `-DGGML_NATIVE=ON` to `-DGGML_NATIVE=OFF` in the `.Containerfile`.
- If you build only for your GPU architecture and want to make use of more KV quantization types, build with `-DGGML_IQK_FA_ALL_QUANTS=ON`.
@@ -133,7 +128,7 @@ docker run -it --name ik_llama_full --rm -v /my_local_files/gguf:/models:ro --r
- Download from [ik_llama.cpp's Thireus fork with release builds for macOS/Windows/Ubuntu CPU and Windows CUDA](https://github.com/Thireus/ik_llama.cpp) if you cannot build.
- For a KoboldCPP experience: [Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It is powered partly by ik_llama.cpp and is compatible with most of Ikawrakow's quants except Bitnet.](https://github.com/Nexesenex/croco.cpp)

# Credits
## Credits

All credits to the awesome community:
