
Added workflow to build container images #1279

Draft · yadirhb wants to merge 68 commits into ikawrakow:main from sourceupcode:main

Conversation

@yadirhb

@yadirhb yadirhb commented Feb 17, 2026

Acknowledgement

First of all, this project is amazing and I think more people should be able to use it! I've been trying this project for a while, on different occasions, honestly finding it at par with mainline llama.cpp and others. I have a local k3s cluster with 3x3090s, and in the last few weeks I finally got llama.cpp (mainline) deployed in Kubernetes running a few models, such as GLM-4.7-Flash, Qwen3-Coder-Next and Kimi-Linear. Now I am working to deploy ik_llama.cpp to run models for agentic use cases.

Before Kubernetes I was always using LXCs in Proxmox, and I built some scripts to automatically pull the latest from GitHub, build, and update the binaries on a schedule at 3:00 am.

I know this project excels in efficiency and optimization, but it's a pain to keep building it locally while mainline llama.cpp is already available as a container.

Some facts:

  • llama.cpp has the smallest image size of all the alternatives I ran in my cluster (around 4.7 GB). SGLang, vLLM and tabbyAPI all have huge image sizes (around 13 GB).
  • Except for tabbyAPI with exl3, the other alternatives are very slow to load. llama.cpp has the best load times.
  • The combo llama.cpp + llama-swap is good enough for tasks such as agentic coding and so on, but the problem is high throughput... if only that could be figured out, this could very much be a real contender against anything production-ready out there... (me dreaming)

Where I'm going with this: this project is amazing and I want more people to be able to use it. So this PR brings a GitHub workflow to build and store container images in a publicly accessible package, ghcr.io/ikawrakow/ik-llama-cpp:${TAGS}, where the possible tags (at the moment) are:

  • ghcr.io/ikawrakow/ik-llama-cpp:cpu-full
  • ghcr.io/ikawrakow/ik-llama-cpp:cpu-swap
  • ghcr.io/ikawrakow/ik-llama-cpp:cu12-full - uses CUDA version 12.6.2, the same as in ./docker/*.Containerfile
  • ghcr.io/ikawrakow/ik-llama-cpp:cu13-full - I added two CUDA variants to the workflow since my own setup is already on CUDA 13.x
  • ghcr.io/ikawrakow/ik-llama-cpp:cu12-swap - Only includes llama-server and llama-swap
  • ghcr.io/ikawrakow/ik-llama-cpp:cu13-swap - Same as above but CUDA 13
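For anyone who wants to try the images, usage would look roughly like this (an illustrative sketch: the model path, ports, and server flags below are placeholders, and the exact image entrypoint may differ):

```shell
# Pull one of the published tags
docker pull ghcr.io/ikawrakow/ik-llama-cpp:cu12-full

# Run llama-server from the image (paths and flags are placeholders)
docker run --rm --gpus all -p 8080:8080 \
  -v "$HOME/models:/models" \
  ghcr.io/ikawrakow/ik-llama-cpp:cu12-full \
  llama-server -m /models/model.gguf --host 0.0.0.0 --port 8080
```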

GitHub Container Registry is free up to a certain level, so why not use it?!

Target CUDA ARCH

I set the workflow to build for CUDA_DOCKER_ARCH=86;90 as discussed here, but this can be updated as needed.
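As a sketch, anyone building locally with the same Containerfiles would pass that target list as a build argument (the file name below is illustrative; see ./docker/*.Containerfile for the real ones):

```shell
# Illustrative local build with the same CUDA architectures as the workflow
docker build -f docker/cuda.Containerfile \
  --build-arg CUDA_DOCKER_ARCH="86;90" \
  -t ik-llama-cpp:cu12-full .
```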

Overview

(Workflow screenshots PR-01 and Pr2 omitted.)

Hopefully we can go from building the latest ik_llama.cpp ourselves to having GitHub Actions build it for us, so we can just pull the container images.

@ikawrakow
Owner

I know nothing about containers. If somebody more knowledgeable is paying attention, can you please help review? Thank you.

@mcm007
Contributor

mcm007 commented Feb 19, 2026

For me (without actual experience with GitHub workflows) it looks generally OK; only a few suggestions:

  1. Maybe change the trigger to run once per day instead of running after each push.
    The build process of this project is long and resource intensive.

  2. actions/checkout@v4 is not strictly required (by default, this action uses the Git context, so you don't need to use the actions/checkout action to check out the repository as this will be done directly by BuildKit.)

  3. Not sure if there is a more trustworthy cleanup method, instead of vlaurin/action-ghcr-prune which has the last commit 2 years ago.
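Suggestion 1 would amount to swapping the push trigger for a schedule trigger; a minimal sketch (the exact time is arbitrary, and the manual trigger is an optional addition, not something the PR necessarily has):

```yaml
on:
  schedule:
    - cron: "0 3 * * *"   # once per day at 03:00 UTC
  workflow_dispatch: {}    # optional: keep a manual trigger for ad-hoc builds
```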

Note: Due to "At the start of each workflow job, GitHub automatically creates a unique GITHUB_TOKEN secret to use in your workflow. You can use the GITHUB_TOKEN to authenticate in the workflow job [...]" some extra care will be needed in the future as well to keep it safe, like:

  • Limit/avoid usage of 3rd party actions

  • "Use the permissions key in your workflow file to modify permissions for the GITHUB_TOKEN" for entire workflow like this PR is already doing:

permissions:
  contents: read
  packages: write
  actions: read

@yadirhb
Author

yadirhb commented Feb 20, 2026

> 1. Maybe change the trigger to run once per day instead of running after each push. […]
> 2. actions/checkout@v4 is not strictly required […]
> 3. Not sure if there is a more trustworthy cleanup method, instead of vlaurin/action-ghcr-prune […]

I have some questions on these suggestions:

  1. How frequently are changes pushed to the main branch? As-is, this workflow builds in ~1.5 h, so we could build multiple times a day. Furthermore, the current workflow has a concurrency configuration that cancels the previous job if another one is running when it is triggered. That last part is less ideal if you really want one build per change, but otherwise it ensures we always build the latest. (8 builds a day seems a lot.)

  2. Nice on this one, I'll check and update as needed.

  3. This workflow already uses GITHUB_TOKEN for authentication and relies on permissions, but thanks for sharing the more granular permissions model: per job instead of the whole workflow when using third-party actions.
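The concurrency behavior described in point 1 is typically expressed like this in GitHub Actions (a sketch; the group name is illustrative, not necessarily what this PR uses):

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true   # cancel a still-running build when a new one starts
```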

As for third-party actions, I agree, except this one is good for the job it's doing: removing untagged container images. On every build we update tags such as cpu-full, cpu-swap, cu12-full, cu12swa..., so I just run a cleanup to remove untagged images. Still, for each image we create a commit-id-based tag suffix.
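The per-job granularity mentioned above would look like this (a sketch; the job name is illustrative):

```yaml
jobs:
  build:
    permissions:        # narrows GITHUB_TOKEN for this job only
      contents: read
      packages: write
```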

I'll take all these notes and update the PR. Thank you for your feedback!

@yadirhb
Author

yadirhb commented Feb 20, 2026

In the meantime, docker images are available here: https://github.com/sourceupcode/ik_llama.cpp/pkgs/container/ik-llama-cpp


RUN apt-get update && apt-get install -yq build-essential git libcurl4-openssl-dev curl libgomp1 cmake
RUN git clone https://github.com/ikawrakow/ik_llama.cpp.git /app
RUN apt-get update && apt-get install -yq build-essential libcurl4-openssl-dev curl libgomp1 cmake


Can we use ccache and the Docker cache here to speed up the build?

Contributor


As it is now, with stages in the Containerfile chained with &&, it builds only once and is then reused by the following stages.

ccache, per my understanding, is effective when building again on the same machine, which is not the case with GitHub workflows, where each run lands on a different runner/machine.
Maybe when building manually on a local machine it is worth trying, by mapping a persistent volume.
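For local rebuilds, a BuildKit cache mount is one way to persist ccache's directory across builds on the same machine (a sketch only; the exact cmake cache flag and build steps in this project's Containerfiles may differ):

```dockerfile
# syntax=docker/dockerfile:1
# Sketch: keep /root/.ccache across rebuilds on the same machine.
# -DGGML_CCACHE=ON is the mainline llama.cpp option name; it is an
# assumption that the same flag applies here.
RUN --mount=type=cache,target=/root/.ccache \
    cmake -B build -DGGML_CCACHE=ON && \
    cmake --build build -j
```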


@chulucninh09 chulucninh09 Feb 23, 2026


Yes, the nature of serverless CI/CD is that each run happens on a different machine. However, since we are using Docker, we can use the docker buildx driver to do an inline cache or a registry cache. We just need to cache the ccache folder for reuse next time.

Since the commits are incremental, this will speed up the build pretty well.

Check out the docs here: https://docs.docker.com/build/cache/backends/registry/
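With the registry backend linked above, the build invocation would look roughly like this (a sketch; OWNER and the cache ref name are illustrative placeholders):

```shell
# Push/pull layer cache to a registry so later runs on fresh runners reuse it
docker buildx build \
  --cache-from type=registry,ref=ghcr.io/OWNER/ik-llama-cpp:buildcache \
  --cache-to   type=registry,ref=ghcr.io/OWNER/ik-llama-cpp:buildcache,mode=max \
  -t ghcr.io/OWNER/ik-llama-cpp:cu12-full --push .
```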



There are a few pre-made ccache actions available: https://github.com/marketplace?query=ccache&type=actions

yadirhb added 22 commits March 27, 2026 21:55
- Add git fetch --unshallow to get complete commit history during build
- This ensures build-info.cpp is generated with correct LLAMA_BUILD_NUMBER
…ibility

- docker/setup-buildx-action@v3 -> v4
- docker/login-action@v3 -> v4
- actions/checkout@v4 -> v6
- docker/setup-buildx-action@v3 -> v5
- docker/login-action@v3 -> v6
- docker/bake-action@v5 -> v7
- Add BUILD_NUMBER and LLAMA_COMMIT as build args
- Fall back to git commands if not provided
- Pass values explicitly to cmake for accurate build info
- Add BUILD_NUMBER and LLAMA_COMMIT to docker bake args
- These will be used by the Containerfile for accurate build info
- Removed ARG defaults since we calculate from git during build
- Use git rev-list --count HEAD and git rev-parse for accurate version info
- Falls back to 0/unknown if git commands fail
- Add git-based version calculation in both CPU and CUDA Containerfiles
- Remove .git bind mount (git is copied with COPY .)
- Pass build info to CMake for accurate llama-server --version output
- Add git-based version calculation using git rev-list and git rev-parse
- Copy .git directory separately to ensure git commands work during build
- Pass build info to CMake for accurate llama-server --version output
- Enables cmake to access .git directory during Docker build
- Required for version calculation in llama-server binary
- GitHub Actions uses explicit mount via bake action set parameter
…ependency

- Remove --mount=type=bind,source=.git,target=.git from both Containerfiles
- Replace COPY . . with git clone for cleaner build context
- Add CUSTOM_COMMIT ARG for optional custom commit switching
- Standardize ARG/ENV ordering and comment formatting across CPU/CUDA variants
- Install ca-certificates before git clone to fix SSL verification issues
- Rename 'Structured artifact collection' to 'Collect build artifacts'
- Remove prune-cache job that was failing due to missing .git directory
- The job required a checkout step and the cache pruning logic was non-critical
- Add '|| true' to git switch command so build continues on failure
- This prevents the entire RUN step from failing when CUSTOM_COMMIT is invalid
@yadirhb yadirhb marked this pull request as ready for review March 29, 2026 21:38
@ikawrakow
Owner

@yadirhb I see you were still very actively committing to this PR until 2 days ago. Is it now ready to be merged?

@mcm007
Contributor

mcm007 commented Apr 1, 2026

I think we better remove llama-swap

llama-swap issue [#621](https://github.com/mostlygeek/llama-swap/issues/621#issue-4185212174), opened by [mostlygeek](https://github.com/mostlygeek) (Owner), edited:

> Big News! llama-swap has been acquired by Elethink Systems, Inc. It’s time to follow up with the todo items:
>
> - Replace the releases and docker images with a final free version that is limited to 3 models.
> - Inject a “free version, upgrade to subscription” message in every assistant message reminding the user of the value of upgrading
> - Move source code to a private repo and change this repo to be documentation and issues only
> - Update README with awesome new monthly subscription plans! $$$
> - Update README with new available source licencing
>
> Thanks for all the great community support. Looking forward to taking the project to the next level. c h a − c h i n g

@yadirhb yadirhb marked this pull request as draft April 2, 2026 02:21
@yadirhb
Author

yadirhb commented Apr 2, 2026

> @yadirhb I see you were still very actively committing to this PR until 2 days ago. Is it now ready to be merged?

Nah, I have to review some issues with the cache: in the last builds I saw the cache working great and the process taking less than 3 minutes, but I see no compilation going on, and that concerns me. I'll keep updating here and will complete it soon.

@ich777

ich777 commented Apr 2, 2026

> I think we better remove llama-swap
>
> llama-swap issue [#621](https://github.com/mostlygeek/llama-swap/issues/621#issue-4185212174) […]

I think you missed that yesterday's date was the 1st of April. The issue is already closed as an April Fools joke; however, I think it was a horrible joke, since in these times it was somewhat hard to spot that it was a joke.
