Added workflow to build container images #1279
yadirhb wants to merge 68 commits into ikawrakow:main from
Conversation
I know nothing about containers. If somebody more knowledgeable is paying attention, can you please help review? Thank you.
For me (without actual experience with GitHub workflows) it looks generally OK, only a few suggestions:
Note: Due to "At the start of each workflow job, GitHub automatically creates a unique
I have some questions on these suggestions:
As for third-party actions, I agree, except this one is good for the job it's doing: it removes untagged container images. On every build we update tags such as:
I'll take all these notes and update the PR. Thank you for your feedback!
In the meantime, docker images are available here: https://github.com/sourceupcode/ik_llama.cpp/pkgs/container/ik-llama-cpp
docker/ik_llama-cpu.Containerfile
```dockerfile
RUN apt-get update && apt-get install -yq build-essential git libcurl4-openssl-dev curl libgomp1 cmake
RUN git clone https://github.com/ikawrakow/ik_llama.cpp.git /app
RUN apt-get update && apt-get install -yq build-essential libcurl4-openssl-dev curl libgomp1 cmake
```
Can we use ccache and the Docker cache here to speed up the build?
As it is now, with stages in the Containerfile chained with `&&`, it builds only once and is then reused in the following stages.
ccache, per my understanding, is effective when building again on the same machine, which is not the case with GitHub workflows: each run is on a different runner/machine.
Maybe when building manually on a local machine it's worth trying by mapping a persistent volume.
Yes, the nature of serverless CI/CD is that each run happens on a different machine. However, since we are using Docker, we can use the docker buildx driver to do an inline cache or a registry cache. We just need to cache the ccache folder for reuse next time.
Since the commits are incremental, this will speed up the build pretty well.
Check out the docs here:
https://docs.docker.com/build/cache/backends/registry/
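As a rough sketch of the registry cache backend suggested above (the `OWNER` placeholder and the `buildcache` ref are assumptions for illustration, not the PR's actual configuration):

```shell
# Sketch only: export/import build cache to a registry so a fresh
# GitHub runner can reuse layers from previous builds.
docker buildx build \
  --cache-from type=registry,ref=ghcr.io/OWNER/ik-llama-cpp:buildcache \
  --cache-to   type=registry,ref=ghcr.io/OWNER/ik-llama-cpp:buildcache,mode=max \
  -f docker/ik_llama-cpu.Containerfile \
  -t ghcr.io/OWNER/ik-llama-cpp:cpu-full \
  --push .
```

`mode=max` also exports cache for intermediate stages, which matters for multi-stage Containerfiles like these.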
There are a few pre-made ccache actions available: https://github.com/marketplace?query=ccache&type=actions
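For making ccache effective inside the image build itself, one option (a sketch, assuming BuildKit and that ccache is installed in the build stage; paths and cmake options are assumptions, not the PR's actual setup) is a BuildKit cache mount plus CMake's compiler-launcher variables:

```dockerfile
# Sketch: persist the ccache directory across builds with a cache mount.
RUN --mount=type=cache,target=/root/.ccache \
    cmake -B build \
          -DCMAKE_C_COMPILER_LAUNCHER=ccache \
          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
 && cmake --build build --config Release -j
```

The cache mount only persists on the same builder instance, so on ephemeral runners it still needs to be combined with an exported cache backend (inline or registry) to survive between workflow runs.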
- Add git fetch --unshallow to get complete commit history during build - This ensures build-info.cpp is generated with correct LLAMA_BUILD_NUMBER
…ibility - docker/setup-buildx-action@v3 -> v4 - docker/login-action@v3 -> v4
- actions/checkout@v4 -> v6 - docker/setup-buildx-action@v3 -> v5 - docker/login-action@v3 -> v6 - docker/bake-action@v5 -> v7
- Add BUILD_NUMBER and LLAMA_COMMIT as build args - Fall back to git commands if not provided - Pass values explicitly to cmake for accurate build info
- Add BUILD_NUMBER and LLAMA_COMMIT to docker bake args - These will be used by the Containerfile for accurate build info
- Removed ARG defaults since we calculate from git during build - Use git rev-list --count HEAD and git rev-parse for accurate version info - Falls back to 0/unknown if git commands fail
- Add git-based version calculation in both CPU and CUDA Containerfiles - Remove .git bind mount (git is copied with COPY .) - Pass build info to CMake for accurate llama-server --version output
- Add git-based version calculation using git rev-list and git rev-parse - Copy .git directory separately to ensure git commands work during build - Pass build info to CMake for accurate llama-server --version output
- Enables cmake to access .git directory during Docker build - Required for version calculation in llama-server binary - GitHub Actions uses explicit mount via bake action set parameter
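The fallback behaviour described in the commits above can be sketched as plain shell (the variable names follow the commit messages; the exact Containerfile lines may differ):

```shell
# Compute build info from git, falling back to 0/unknown when git
# is unavailable or the directory is not a repository.
BUILD_NUMBER=$(git rev-list --count HEAD 2>/dev/null || echo 0)
LLAMA_COMMIT=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
echo "build=${BUILD_NUMBER} commit=${LLAMA_COMMIT}"
```

These values would then be passed to CMake so that `llama-server --version` reports them instead of defaults.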
…ependency - Remove --mount=type=bind,source=.git,target=.git from both Containerfiles - Replace COPY . . with git clone for cleaner build context - Add CUSTOM_COMMIT ARG for optional custom commit switching - Standardize ARG/ENV ordering and comment formatting across CPU/CUDA variants - Install ca-certificates before git clone to fix SSL verification issues - Rename 'Structured artifact collection' to 'Collect build artifacts'
- Remove prune-cache job that was failing due to missing .git directory - The job required a checkout step and the cache pruning logic was non-critical
- Add '|| true' to git switch command so build continues on failure - This prevents the entire RUN step from failing when CUSTOM_COMMIT is invalid
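Putting the last two commits together, the optional-commit logic might look roughly like this in a Containerfile (a sketch; the `CUSTOM_COMMIT` ARG name comes from the commit messages, the rest is assumed):

```dockerfile
ARG CUSTOM_COMMIT=""
# Clone instead of COPY . . so no .git bind mount is needed; if
# CUSTOM_COMMIT is set but invalid, '|| true' keeps the build going.
RUN git clone https://github.com/ikawrakow/ik_llama.cpp.git /app \
 && cd /app \
 && if [ -n "$CUSTOM_COMMIT" ]; then git switch --detach "$CUSTOM_COMMIT" || true; fi
```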
@yadirhb I see you were still very actively committing to this PR until 2 days ago. Is it now ready to be merged?
I think we better remove
Nah, I have to review some issues with the cache. Since the last builds I've seen the cache working great and the process taking less than 3 minutes; however, I see no compilation going on and that concerns me. I'll keep updating here and will complete soon.
I think you missed that yesterday's date was the 1st of April. The issue has already been closed as an April Fools' joke; however, I think it was a horrible joke, since it was somewhat hard to spot that it was a joke in these times.
Acknowledgement
First of all, this project is amazing and I think more people should be able to use it! I've been trying this project for a while, on different occasions honestly at par with main `llama.cpp` and others. I have a local `k3s` cluster with 3x 3090s, and in the last few weeks I finally got `llama.cpp` (main) deployed in Kubernetes, running a few models such as `GLM-4.7-Flash`, `Qwen3-Coder-Next` and `Kimi-Linear`. Now I am working to deploy `ik_llama.cpp` to run models for agentic use cases. Before Kubernetes I was always using LXCs in Proxmox, and I built some scripts to automatically pull the latest from GitHub, build, and update the binaries on a schedule at 3:00 am.
I know this project excels in efficiency and optimization, but it's a pain to keep building this one locally while already having main `llama.cpp` on a container.

Some facts:

- `llama.cpp` has the smallest image size of all the alternatives I ran in my cluster (around 4.7 GB). `sglang`, `vllm` and `tabbyAPI` all have huge image sizes (around 13 GB).
- Apart from `tabbyAPI` (exl3), the other alternatives are very slow to load; `llama.cpp` has the best times.
- `llama.cpp` + `llama-swap` is good enough for tasks such as agentic coding and so on, but the problem is high throughput... if only that could be figured out, then this could very much be a real contender to anything production-ready out there... (me dreaming)

Where I'm going with this is: this project is amazing and I want more people to be able to use it. So, this PR brings a GitHub workflow to build and store container images in a package, publicly accessible:
`ghcr.io/ikawrakow/ik-llama-cpp:${TAGS}`, where possible tags (atm) are:

- `ghcr.io/ikawrakow/ik-llama-cpp:cpu-full`
- `ghcr.io/ikawrakow/ik-llama-cpp:cpu-swap`
- `ghcr.io/ikawrakow/ik-llama-cpp:cu12-full` - uses CUDA version 12.6.2, the same as in ./docker/*.Containerfile
- `ghcr.io/ikawrakow/ik-llama-cpp:cu13-full` - I added two variants in the workflow since my CUDA version is already up to 13.x
- `ghcr.io/ikawrakow/ik-llama-cpp:cu12-swap` - only includes llama-server and llama-swap
- `ghcr.io/ikawrakow/ik-llama-cpp:cu13-swap` - same as above but CUDA 13

GitHub Container Registry is free up to a certain level, so why not use it?!!!
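Once published, pulling and running one of these tags would look roughly like this (the invocation shown is an assumption; the image's actual entrypoint may differ):

```shell
docker pull ghcr.io/ikawrakow/ik-llama-cpp:cpu-full
# Assumed command; check the image's entrypoint for the real usage.
docker run --rm ghcr.io/ikawrakow/ik-llama-cpp:cpu-full llama-server --version
```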
Target CUDA ARCH
I set the workflow to build for `CUDA_DOCKER_ARCH=86;90` as discussed here, but this can be updated as needed.

Overview
Hopefully we can go from building the latest `ik_llama.cpp` ourselves to having GitHub Actions build it for us, and just pull the container images.