
Added workflow to build container images #1279

Draft · yadirhb wants to merge 68 commits into ikawrakow:main from sourceupcode:main

Conversation

@yadirhb

@yadirhb yadirhb commented Feb 17, 2026

Acknowledgement

First of all, this project is amazing and I think more people should be able to use it! I've been trying this project for a while, on different occasions, honestly finding it at par with mainline llama.cpp and others. I have a local k3s cluster with 3x3090s, and in the last few weeks I finally got llama.cpp (mainline) deployed in Kubernetes running a few models, such as GLM-4.7-Flash, Qwen3-Coder-Next and Kimi-Linear. Now I am working to deploy ik_llama.cpp to run models for agentic use cases.

Before Kubernetes I was always using LXCs in Proxmox, and I built some scripts to automatically pull the latest from GitHub, build, and update the binaries on a schedule at 3:00 am.

I know this project excels in efficiency and optimization, but it's a pain to keep building it locally while mainline llama.cpp is already available as a container.

Some facts:

  • llama.cpp has the smallest image size of all the alternatives I ran in my cluster (around 4.7 GB). SGLang, vLLM and tabbyAPI all have huge image sizes (around 13 GB).
  • Except for tabbyAPI with exl3, the other alternatives are very slow to load. llama.cpp has the best load times.
  • The combo llama.cpp + llama-swap is good enough for tasks such as agentic coding and so on, but the problem is high throughput... if only that could be figured out, this could very much be a real contender against anything production-ready out there... (me dreaming)

Where I'm going with this: this project is amazing and I want more people to be able to use it. So this PR brings a GitHub workflow to build and store container images in a publicly accessible package, ghcr.io/ikawrakow/ik-llama-cpp:${TAGS}, where the possible tags (at the moment) are:

  • ghcr.io/ikawrakow/ik-llama-cpp:cpu-full
  • ghcr.io/ikawrakow/ik-llama-cpp:cpu-swap
  • ghcr.io/ikawrakow/ik-llama-cpp:cu12-full - uses CUDA version 12.6.2, the same as in ./docker/*.Containerfile
  • ghcr.io/ikawrakow/ik-llama-cpp:cu13-full - I added two CUDA variants to the workflow since my own setup is already on CUDA 13.x
  • ghcr.io/ikawrakow/ik-llama-cpp:cu12-swap - Only includes llama-server and llama-swap
  • ghcr.io/ikawrakow/ik-llama-cpp:cu13-swap - Same as above but CUDA 13
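For anyone who wants to try the images, usage would look roughly like this (an illustrative sketch: the model path, ports, and server flags below are placeholders, and the exact image entrypoint may differ):

```shell
# Pull one of the published tags
docker pull ghcr.io/ikawrakow/ik-llama-cpp:cu12-full

# Run llama-server from the image (paths and flags are placeholders)
docker run --rm --gpus all -p 8080:8080 \
  -v "$HOME/models:/models" \
  ghcr.io/ikawrakow/ik-llama-cpp:cu12-full \
  llama-server -m /models/model.gguf --host 0.0.0.0 --port 8080
```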

GitHub Container Registry is free up to a certain level, so why not use it?!

Target CUDA ARCH

I set the workflow to build for CUDA_DOCKER_ARCH=86;90 as discussed here, but this can be updated as needed.
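As a sketch, anyone building locally with the same Containerfiles would pass that target list as a build argument (the file name below is illustrative; see ./docker/*.Containerfile for the real ones):

```shell
# Illustrative local build with the same CUDA architectures as the workflow
docker build -f docker/cuda.Containerfile \
  --build-arg CUDA_DOCKER_ARCH="86;90" \
  -t ik-llama-cpp:cu12-full .
```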

Overview

(Workflow screenshots PR-01 and Pr2 omitted.)

Hopefully we can go from building the latest ik_llama.cpp ourselves to having GitHub Actions build it for us, so we can just pull the container images.

@ikawrakow
Owner

I know nothing about containers. If somebody more knowledgeable is paying attention, can you please help review? Thank you.

@mcm007
Contributor

mcm007 commented Feb 19, 2026

For me (without actual experience with GitHub workflows) it looks generally OK; only a few suggestions:

  1. Maybe change the trigger to run once per day instead of running after each push.
    The build process of this project is long and resource intensive.

  2. actions/checkout@v4 is not strictly required (by default, this action uses the Git context, so you don't need to use the actions/checkout action to check out the repository as this will be done directly by BuildKit.)

  3. Not sure if there is a more trustworthy cleanup method, instead of vlaurin/action-ghcr-prune which has the last commit 2 years ago.
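Suggestion 1 would amount to swapping the push trigger for a schedule trigger; a minimal sketch (the exact time is arbitrary, and the manual trigger is an optional addition, not something the PR necessarily has):

```yaml
on:
  schedule:
    - cron: "0 3 * * *"   # once per day at 03:00 UTC
  workflow_dispatch: {}    # optional: keep a manual trigger for ad-hoc builds
```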

Note: Due to "At the start of each workflow job, GitHub automatically creates a unique GITHUB_TOKEN secret to use in your workflow. You can use the GITHUB_TOKEN to authenticate in the workflow job [...]" some extra care will be needed in the future as well to keep it safe, like:

  • Limit/avoid usage of 3rd party actions

  • "Use the permissions key in your workflow file to modify permissions for the GITHUB_TOKEN" for entire workflow like this PR is already doing:

permissions:
  contents: read
  packages: write
  actions: read

@yadirhb
Author

yadirhb commented Feb 20, 2026

> 1. Maybe change the trigger to run once per day instead of running after each push. […]
> 2. actions/checkout@v4 is not strictly required […]
> 3. Not sure if there is a more trustworthy cleanup method, instead of vlaurin/action-ghcr-prune […]

I have some questions on these suggestions:

  1. How frequently are changes pushed to the main branch? As-is, this workflow builds in ~1.5 h, so we could build multiple times a day. Furthermore, the current workflow has a concurrency configuration that cancels the previous job if another one is running when it is triggered. That last part is less ideal if you really want one build per change, but otherwise it ensures we always build the latest. (8 builds a day seems a lot.)

  2. Nice on this one, I'll check and update as needed.

  3. This workflow already uses GITHUB_TOKEN for authentication and relies on permissions, but thanks for sharing the more granular permissions model: per job instead of the whole workflow when using third-party actions.
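The concurrency behavior described in point 1 is typically expressed like this in GitHub Actions (a sketch; the group name is illustrative, not necessarily what this PR uses):

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true   # cancel a still-running build when a new one starts
```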

As for third-party actions, I agree, except this one is good for the job it's doing: removing untagged container images. On every build we update tags such as cpu-full, cpu-swap, cu12-full, cu12swa..., so I just run a cleanup to remove untagged images. Still, for each image we create a commit-id-based tag suffix.
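The per-job granularity mentioned above would look like this (a sketch; the job name is illustrative):

```yaml
jobs:
  build:
    permissions:        # narrows GITHUB_TOKEN for this job only
      contents: read
      packages: write
```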

I'll take all these notes and update the PR. Thank you for your feedback!

@yadirhb
Author

yadirhb commented Feb 20, 2026

In the meantime, docker images are available here: https://github.com/sourceupcode/ik_llama.cpp/pkgs/container/ik-llama-cpp


RUN apt-get update && apt-get install -yq build-essential git libcurl4-openssl-dev curl libgomp1 cmake
RUN git clone https://github.com/ikawrakow/ik_llama.cpp.git /app
RUN apt-get update && apt-get install -yq build-essential libcurl4-openssl-dev curl libgomp1 cmake


Can we use ccache and the Docker cache here to speed up the build?

Contributor


As it is now, with stages in the Containerfile chained with &&, it builds only once and is then reused by the following stages.

ccache, per my understanding, is effective when building again on the same machine, which is not the case with GitHub workflows, where each run lands on a different runner/machine.
Maybe when building manually on a local machine it is worth trying, by mapping a persistent volume.
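For local rebuilds, a BuildKit cache mount is one way to persist ccache's directory across builds on the same machine (a sketch only; the exact cmake cache flag and build steps in this project's Containerfiles may differ):

```dockerfile
# syntax=docker/dockerfile:1
# Sketch: keep /root/.ccache across rebuilds on the same machine.
# -DGGML_CCACHE=ON is the mainline llama.cpp option name; it is an
# assumption that the same flag applies here.
RUN --mount=type=cache,target=/root/.ccache \
    cmake -B build -DGGML_CCACHE=ON && \
    cmake --build build -j
```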


@chulucninh09 chulucninh09 Feb 23, 2026


Yes, the nature of serverless CI/CD is that each run happens on a different machine. However, since we are using Docker, we can use the docker buildx driver to do an inline cache or a registry cache. We just need to cache the ccache folder for reuse next time.

Since the commits are incremental, this will speed up the build pretty well.

Check out the docs here: https://docs.docker.com/build/cache/backends/registry/
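With the registry backend linked above, the build invocation would look roughly like this (a sketch; OWNER and the cache ref name are illustrative placeholders):

```shell
# Push/pull layer cache to a registry so later runs on fresh runners reuse it
docker buildx build \
  --cache-from type=registry,ref=ghcr.io/OWNER/ik-llama-cpp:buildcache \
  --cache-to   type=registry,ref=ghcr.io/OWNER/ik-llama-cpp:buildcache,mode=max \
  -t ghcr.io/OWNER/ik-llama-cpp:cu12-full --push .
```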



There are a few pre-made ccache actions available: https://github.com/marketplace?query=ccache&type=actions

yadirhb added 22 commits March 27, 2026 21:55
- Add git fetch --unshallow to get complete commit history during build
- This ensures build-info.cpp is generated with correct LLAMA_BUILD_NUMBER
…ibility

- docker/setup-buildx-action@v3 -> v4
- docker/login-action@v3 -> v4
- actions/checkout@v4 -> v6
- docker/setup-buildx-action@v3 -> v5
- docker/login-action@v3 -> v6
- docker/bake-action@v5 -> v7
- Add BUILD_NUMBER and LLAMA_COMMIT as build args
- Fall back to git commands if not provided
- Pass values explicitly to cmake for accurate build info
- Add BUILD_NUMBER and LLAMA_COMMIT to docker bake args
- These will be used by the Containerfile for accurate build info
- Removed ARG defaults since we calculate from git during build
- Use git rev-list --count HEAD and git rev-parse for accurate version info
- Falls back to 0/unknown if git commands fail
- Add git-based version calculation in both CPU and CUDA Containerfiles
- Remove .git bind mount (git is copied with COPY .)
- Pass build info to CMake for accurate llama-server --version output
- Add git-based version calculation using git rev-list and git rev-parse
- Copy .git directory separately to ensure git commands work during build
- Pass build info to CMake for accurate llama-server --version output
- Enables cmake to access .git directory during Docker build
- Required for version calculation in llama-server binary
- GitHub Actions uses explicit mount via bake action set parameter
…ependency

- Remove --mount=type=bind,source=.git,target=.git from both Containerfiles
- Replace COPY . . with git clone for cleaner build context
- Add CUSTOM_COMMIT ARG for optional custom commit switching
- Standardize ARG/ENV ordering and comment formatting across CPU/CUDA variants
- Install ca-certificates before git clone to fix SSL verification issues
- Rename 'Structured artifact collection' to 'Collect build artifacts'
- Remove prune-cache job that was failing due to missing .git directory
- The job required a checkout step and the cache pruning logic was non-critical
- Add '|| true' to git switch command so build continues on failure
- This prevents the entire RUN step from failing when CUSTOM_COMMIT is invalid
@yadirhb yadirhb marked this pull request as ready for review March 29, 2026 21:38
@ikawrakow
Owner

@yadirhb I see you were still very actively committing to this PR until 2 days ago. Is it now ready to be merged?

@mcm007
Contributor

mcm007 commented Apr 1, 2026

I think we better remove llama-swap

llama-swap issue [#621](https://github.com/mostlygeek/llama-swap/issues/621#issue-4185212174), opened by [mostlygeek](https://github.com/mostlygeek) (Owner), edited:

> Big News! llama-swap has been acquired by Elethink Systems, Inc. It’s time to follow up with the todo items:
>
> - Replace the releases and docker images with a final free version that is limited to 3 models.
> - Inject a “free version, upgrade to subscription” message in every assistant message reminding the user of the value of upgrading
> - Move source code to a private repo and change this repo to be documentation and issues only
> - Update README with awesome new monthly subscription plans! $$$
> - Update README with new available source licencing
>
> Thanks for all the great community support. Looking forward to taking the project to the next level. c h a − c h i n g

@yadirhb yadirhb marked this pull request as draft April 2, 2026 02:21
@yadirhb
Author

yadirhb commented Apr 2, 2026

> @yadirhb I see you were still very actively committing to this PR until 2 days ago. Is it now ready to be merged?

Nah, I have to review some issues with the cache: in the last builds I saw the cache working great and the process taking less than 3 minutes, but I see no compilation going on, and that concerns me. I'll keep updating here and will complete it soon.

@ich777

ich777 commented Apr 2, 2026

> I think we better remove llama-swap
>
> llama-swap issue [#621](https://github.com/mostlygeek/llama-swap/issues/621#issue-4185212174) […]

I think you missed that yesterday's date was the 1st of April. The issue is already closed as an April Fools joke; however, I think it was a horrible joke, since in these times it was somewhat hard to spot that it was a joke.
