Conversation
Add "runner" images that contain only the inference runtime (LocalAI + backends) without model weights baked in. Users pass a model reference at `docker run` time, and the model is downloaded at container startup.

Runner mode is detected implicitly: when an aikitfile specifies `backends` but no `models`, the build pipeline skips model downloads, installs runtime download dependencies (curl, huggingface-cli), and injects an entrypoint script that handles model download + LocalAI startup.

Supported runners:
- llama-cpp-cpu: CPU-only llama-cpp backend
- llama-cpp-cuda: CUDA-accelerated llama-cpp backend
- diffusers-cuda: CUDA diffusers backend
- vllm-cuda: CUDA vLLM backend

Usage:

    docker run -p 8080:8080 <runner-image> unsloth/gemma-3-1b-it-GGUF
    docker run --gpus all -p 8080:8080 <runner-image> Qwen/Qwen2.5-0.5B-Instruct

Changes:
- Refactor copyModels() to extract writeConfig() for reuse
- Add runner build logic (runner.go): isRunnerMode(), entrypoint script generation, dependency installation
- Update NewImageConfig() for runner entrypoint and labels
- Add runner aikitfile definitions (runners/)
- Add CI workflows: docker-test-runner (push/PR) and docker-test-runner-gpu (manual, self-hosted GPU)
- Add unit tests for runner mode detection and script generation
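The backend-specific download logic described above can be sketched as a small shell helper. This is an illustrative sketch only: the function name, the `/models` path, and the exact `huggingface-cli` flags are assumptions, not the script aikit actually generates.

```shell
# Illustrative sketch only: pick_download_cmd, /models, and the exact
# huggingface-cli flags are assumptions, not the generated entrypoint.
pick_download_cmd() {
  ref="$1"
  case "$ref" in
    *GGUF*|*gguf*)
      # llama-cpp runners fetch GGUF weight files
      echo "huggingface-cli download $ref --local-dir /models" ;;
    *)
      # diffusers/vllm runners fetch the full HF model repo
      echo "huggingface-cli download $ref --local-dir /models/$ref" ;;
  esac
}

pick_download_cmd "unsloth/gemma-3-1b-it-GGUF"
pick_download_cmd "Qwen/Qwen2.5-0.5B-Instruct"
```

The only input is the positional model reference from `docker run`, which keeps the runner image fully generic.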
Pull request overview
Adds a new “runner image” mode to build runtime-only AIKit images (LocalAI + backend dependencies) without bundling model weights, so users can pass a model reference at docker run time and have it downloaded on container startup.
Changes:
- Introduces runner-mode detection (backends present, models absent) and a generated `/usr/local/bin/aikit-runner` entrypoint script that downloads models and writes configuration at runtime.
- Refactors config writing out of `copyModels()` into `writeConfig()` and wires the runner-mode build flow to skip model copying while still installing runner dependencies.
- Adds runner aikitfile definitions plus CPU/GPU GitHub Actions workflows to build and validate runner images, along with unit tests.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `runners/llama-cpp-cpu.yaml` | New runner aikitfile for CPU llama-cpp runtime-only image. |
| `runners/llama-cpp-cuda.yaml` | New runner aikitfile for CUDA llama-cpp runtime-only image. |
| `runners/diffusers-cuda.yaml` | New runner aikitfile for CUDA diffusers runtime-only image. |
| `runners/vllm-cuda.yaml` | New runner aikitfile for CUDA vLLM runtime-only image. |
| `pkg/build/build_test.go` | Extends inference config validation tests to accept runner-mode configs (backends without models). |
| `pkg/aikit2llb/inference/runner.go` | Implements runner-mode detection, dependency installation, entrypoint install, and script generation for runtime downloads/config. |
| `pkg/aikit2llb/inference/runner_test.go` | Adds unit tests for runner-mode detection and generated script content for supported backends. |
| `pkg/aikit2llb/inference/image.go` | Sets runner entrypoint and labels in image config when runner mode is active. |
| `pkg/aikit2llb/inference/convert.go` | Adds runner-mode build path and factors config creation into reusable `writeConfig()`. |
| `.github/workflows/test-docker-runner.yaml` | New CI workflow to build/test the CPU runner image and validate chat completions. |
| `.github/workflows/test-docker-runner-gpu.yaml` | New manual CI workflow to build/test GPU runner images on self-hosted GPU runners. |
Comments suppressed due to low confidence (1)
.github/workflows/test-docker-runner-gpu.yaml:92
- Similarly, `jq '.data[0].url'` will output `null` on failures and the `-z` check won't catch it. Use `jq -e` to assert `.data[0].url` is a non-null string (and/or validate `.data | length > 0`) so the diffusers runner test fails when image generation fails.

```shell
url=$(echo "$result" | jq '.data[0].url')
if [ -z "$url" ]; then
  exit 1
fi
```
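The false-pass can be reproduced locally. In this sketch, `$result` is a made-up error payload standing in for a failed image-generation response:

```shell
# jq prints the literal string "null" for a missing field, so the -z check
# never fires on an error response; jq -e flips the exit code instead.
result='{"error":{"message":"boom"}}'   # made-up failure payload

url=$(echo "$result" | jq '.data[0].url')
[ -n "$url" ] && echo "plain jq passes: url=$url"

if ! echo "$result" | jq -e '.data[0].url | type == "string"' > /dev/null; then
  echo "jq -e catches the failure"
fi
```

With `-e`, jq's exit status reflects the last output value (nonzero for `null`/`false`), so the shell `if` can react to it directly.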
pkg/aikit2llb/inference/runner.go
Outdated
```go
// Some backends (diffusers/vllm) already install python, but llama-cpp does not,
// so we always install the minimal set here.
s = s.Run(
	utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && apt-get clean"),
```
The dependency install shell chain mixes && and || without grouping. As written, an apt-get failure can be masked because the || pip install ... branch may run and the overall command can still exit 0 after apt-get clean. Consider grouping the pip fallback only (e.g., wrap the two pip install attempts in parentheses) so apt-get failures still fail the build.
```diff
-utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && apt-get clean"),
+utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && (pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli]) && apt-get clean"),
```
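The precedence issue is easy to demonstrate with stand-in commands, with `false` playing the failing `apt-get install` and `echo` standing in for the pip and cleanup steps:

```shell
# `false` stands in for a failing apt-get; the echos for pip and cleanup.
set +e

# Ungrouped, as in the original: ((apt && pip1) || pip2) && clean
false && echo "pip (primary)" || echo "pip (fallback)" && echo "apt-get clean"
echo "ungrouped exit: $?"    # 0: the apt-get failure is masked

# Grouped fallback: apt && (pip1 || pip2) && clean
false && (echo "pip (primary)" || echo "pip (fallback)") && echo "apt-get clean"
echo "grouped exit: $?"      # 1: the failure propagates
```

Shell `&&` and `||` have equal precedence and associate left to right, so without parentheses the `||` branch rescues the exit status of the whole chain, not just the pip step.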
pkg/aikit2llb/inference/runner.go
Outdated
```go
s = s.Run(
	utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && apt-get clean"),
```
The runner dependency install step unconditionally uses apt-get, but utils.AppleSiliconBase is Fedora-based (no apt). Runner mode with runtime: applesilicon will fail at build time. Consider branching on runtime (or detecting available package manager) and using dnf on Apple Silicon, or explicitly disallowing runner mode for that runtime in validation with a clear error.
```diff
-s = s.Run(
-	utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && apt-get clean"),
+installerCmd := `if command -v apt-get >/dev/null 2>&1; then \
+    apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && \
+    pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && \
+    apt-get clean; \
+elif command -v dnf >/dev/null 2>&1; then \
+    dnf install -y curl python3 python3-pip && \
+    pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && \
+    dnf clean all; \
+else \
+    echo "No supported package manager (apt-get or dnf) found; cannot install runner dependencies." >&2; \
+    exit 1; \
+fi`
+s = s.Run(
+	utils.Sh(installerCmd),
```
pkg/aikit2llb/inference/runner.go
Outdated
```go
`)
}

sb.WriteString(`LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]+"${EXTRA_ARGS[@]}"}")
```
LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]+"${EXTRA_ARGS[@]}"}") does not append extra args correctly: the nested quotes become literal text, so multiple extra args can collapse into a single element containing quotes/spaces. With set -u, it’s safer to append conditionally based on array length (e.g., only append when ${#EXTRA_ARGS[@]} > 0) and then use LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]}").
```diff
-sb.WriteString(`LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]+"${EXTRA_ARGS[@]}"}")
+sb.WriteString(`if ((${#EXTRA_ARGS[@]})); then
+  LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]}")
+fi
```
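The conditional-append pattern can be checked in isolation. In this sketch the array names match the snippet above, but the initial `LOCAL_AI_ARGS` contents are made up:

```shell
set -u
EXTRA_ARGS=()
LOCAL_AI_ARGS=(run --models-path /models)   # made-up initial args

append_extra() {
  # Append only when non-empty: safe under `set -u` even on bash
  # versions where expanding an empty array raises an unbound error.
  if ((${#EXTRA_ARGS[@]})); then
    LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]}")
  fi
}

append_extra
echo "after empty append: ${#LOCAL_AI_ARGS[@]} args"

EXTRA_ARGS=(--context-size 4096)
append_extra
echo "after two extra args: ${#LOCAL_AI_ARGS[@]} args"
```

Because the append happens element-wise from a quoted `"${EXTRA_ARGS[@]}"`, arguments containing spaces survive as single array elements instead of being re-split or collapsed.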
```go
// writeConfig writes the /config.yaml file to the image when c.Config is set.
func writeConfig(c *config.InferenceConfig, base llb.State, s llb.State, platform specs.Platform) (llb.State, llb.State) {
	savedState := s
	if c.Config != "" {
		s = s.Run(utils.Shf("mkdir -p /configuration && echo -n \"%s\" > /config.yaml", c.Config),
			llb.WithCustomName(fmt.Sprintf("Creating config for platform %s/%s", platform.OS, platform.Architecture))).Root()
	}
	diff := llb.Diff(savedState, s)
	merge := llb.Merge([]llb.State{base, diff})
	return s, merge
```
writeConfig builds a shell command with echo -n "%s" using the raw YAML string. This will break for configs containing quotes/newlines and also allows shell injection during the image build. Prefer writing the file via llb.Mkfile (or a here-doc that avoids interpolation) so the config content is treated as data, not shell syntax.
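The hazard is straightforward to show in plain shell. This sketch mimics the interpolation with an invented config string; note the PR's actual fix is `llb.Mkfile` on the Go side, and `printf` here only illustrates the data-vs-syntax distinction:

```shell
# An invented config value containing shell metacharacters.
cfg='name: "demo" # value with $(hostname) and quotes'
tmp=$(mktemp)

# Interpolated into a command string: $(hostname) executes and the
# embedded quotes are consumed by the inner shell.
sh -c "echo -n \"$cfg\" > $tmp"
grep -q '$(hostname)' "$tmp" || echo "interpolated write mangled the config"

# Passed as data (an argument, never re-parsed): content survives intact.
printf '%s' "$cfg" > "$tmp"
grep -q '$(hostname)' "$tmp" && echo "literal write preserved the config"
rm -f "$tmp"
```

The same distinction is why a content-as-data mechanism like `llb.Mkfile` is the right tool during image build: the YAML never passes through a shell parser at all.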
```shell
echo $result

choices=$(echo "$result" | jq '.choices')
if [ -z "$choices" ]; then
  exit 1
fi
```
This workflow’s validation can false-pass: jq '.choices' returns the string null on error responses, which is non-empty so [ -z "$choices" ] won’t fail. Use jq -e to assert a non-null, non-empty array (and/or explicitly fail if .error is present) so CI reliably detects runner failures.
```diff
-echo $result
-choices=$(echo "$result" | jq '.choices')
-if [ -z "$choices" ]; then
-  exit 1
-fi
+echo "$result"
+echo "$result" | jq -e '
+  if (.error? != null) then
+    error("error field present in response")
+  elif (.choices | type != "array" or (.choices | length) == 0) then
+    error("choices must be a non-empty array")
+  else
+    .
+  end
+' > /dev/null
```
```shell
choices=$(echo "$result" | jq '.choices')
if [ -z "$choices" ]; then
  exit 1
fi
```
The JSON assertion here can false-pass: jq '.choices' yields null (non-empty string) if the API returns an error object, so the [ -z "$choices" ] check won’t catch failures. Consider jq -e '.choices and (.choices|type=="array") and (.choices|length>0)' (and fail on .error) to make the GPU runner test meaningful.
This issue also appears on line 89 of the same file.
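A combined assertion along those lines can be wrapped in a tiny helper. The sample payloads below are invented for illustration:

```shell
ok='{"choices":[{"message":{"content":"hi"}}]}'     # invented success payload
bad='{"error":{"message":"model failed to load"}}'  # invented failure payload

# Exit 0 only when there is no error field and choices is a non-empty array.
valid_chat_response() {
  echo "$1" | jq -e '(.error? == null)
    and (.choices | type == "array" and length > 0)' > /dev/null
}

valid_chat_response "$ok"  && echo "success payload accepted"
valid_chat_response "$bad" || echo "error payload rejected"
```

Folding both checks into one `jq -e` expression keeps the workflow step a single command whose exit status the CI runner evaluates directly.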
The previous version (v2.1.6) was built with Go 1.24, which is lower than the project's Go 1.26 target, causing lint CI to fail with "can't load config" error.
- Fix pip install fallback grouping: wrap the two pip attempts in parentheses so apt-get failures are not masked by the `||` branch
- Disallow runner mode on Apple Silicon in validation since the Fedora-based base image lacks apt-get
- Fix EXTRA_ARGS appending: use a conditional length check instead of nested parameter expansion to correctly handle multiple args
- Use llb.Mkfile for writeConfig instead of shell echo to avoid shell injection and handle quotes/newlines safely
- Use `jq -e` with explicit error/choices validation in CI workflows to prevent false passes on null/error API responses
Publishes runner images to ghcr.io/kaito-project/aikit/runners/ on tag push (matching the main release workflow trigger). Builds all four runner aikitfiles in a matrix, pushes with semver + latest tags, and signs with cosign via GitHub OIDC.

Images published:
- ghcr.io/kaito-project/aikit/runners/llama-cpp-cpu (amd64 + arm64)
- ghcr.io/kaito-project/aikit/runners/llama-cpp-cuda (amd64)
- ghcr.io/kaito-project/aikit/runners/diffusers-cuda (amd64)
- ghcr.io/kaito-project/aikit/runners/vllm-cuda (amd64)
LocalAI v3.12.1 introduced an init() function in capabilities.go that checks for /usr/local/cuda-12 directory existence before checking GPU vendor presence. Since aikit installs cuda-cudart-12-5 (creating that directory), LocalAI incorrectly selects the cuda12-llama-cpp backend on CPU-only hosts, which crashes with missing libcuda.so.1. Add a lightweight entrypoint wrapper script for CUDA images that detects actual NVIDIA GPU presence via lspci at container startup. When no GPU is found, it sets LOCALAI_FORCE_META_BACKEND_CAPABILITY=default to force LocalAI to use the cpu-llama-cpp backend instead. This is a workaround for an upstream LocalAI regression that broke the fix from mudler/LocalAI#6149.
- Gate wrapper entrypoint on amd64 only (matching install gating) to avoid a missing binary on CUDA arm64 images
- Respect an existing LOCALAI_FORCE_META_BACKEND_CAPABILITY env var by exiting early when it is already set
- Improve GPU detection: check /dev/nvidiactl first, then nvidia-smi, then lspci as a final fallback (avoids false negatives when lspci cannot see PCI devices inside containers)
- Add image_test.go with a test matrix covering amd64/arm64, runner/standard mode, and wrapper script content assertions
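The detection order described above might look like the following sketch. The paths, messages, and hand-off are assumptions about the shipped wrapper, and the final `exec local-ai` is commented out so the sketch runs standalone:

```shell
#!/bin/bash
# Sketch of the CUDA wrapper entrypoint; the script aikit actually
# installs may differ. The local-ai hand-off is commented out.
set -euo pipefail

gpu_present() {
  # 1) device node created by the NVIDIA container runtime
  [ -e /dev/nvidiactl ] && return 0
  # 2) a working nvidia-smi
  command -v nvidia-smi > /dev/null 2>&1 && nvidia-smi > /dev/null 2>&1 && return 0
  # 3) last resort: scan the PCI bus
  command -v lspci > /dev/null 2>&1 && lspci 2> /dev/null | grep -qi nvidia && return 0
  return 1
}

# Respect an operator-provided override and skip detection entirely.
if [ -z "${LOCALAI_FORCE_META_BACKEND_CAPABILITY:-}" ] && ! gpu_present; then
  # No GPU visible: force LocalAI onto the CPU backend capability.
  export LOCALAI_FORCE_META_BACKEND_CAPABILITY=default
  echo "no NVIDIA GPU detected; forcing default (CPU) backend capability"
fi

# exec local-ai "$@"
```

Checking the device node before the tools avoids the container false negatives noted above, since `/dev/nvidiactl` is injected by the runtime regardless of what userland utilities the image carries.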
Summary
Closes #719
- Runner images contain only the inference runtime; a model reference is passed at `docker run` time and it is downloaded at container startup
- Runner mode is detected when an aikitfile specifies `backends` but no `models`
- Refactor `copyModels()` to extract `writeConfig()` for reuse in the runner path
- Add runner build logic (`runner.go`): `isRunnerMode()`, entrypoint script generation with backend-specific download logic (GGUF for llama-cpp, HF model config for diffusers/vllm), dependency installation (curl, huggingface-cli)
- Update `NewImageConfig()` to set the `/usr/local/bin/aikit-runner` entrypoint and add runner labels
- Add runner aikitfile definitions: `llama-cpp-cpu`, `llama-cpp-cuda`, `diffusers-cuda`, `vllm-cuda`
- Add `docker-test-runner` workflow (runs on push/PR, tests CPU runner) and `docker-test-runner-gpu` workflow (manual trigger, tests all GPU runners on self-hosted GPU)

Usage:

    docker run -p 8080:8080 <runner-image> unsloth/gemma-3-1b-it-GGUF
    docker run --gpus all -p 8080:8080 <runner-image> Qwen/Qwen2.5-0.5B-Instruct
Test plan
- `go test ./...`: all existing + new unit tests pass
- `golangci-lint run ./...`: 0 lint issues
- `docker-test-runner` CI workflow builds the CPU runner and validates chat completions
- `docker-test-runner-gpu` CI workflow builds the GPU runners and validates inference (manual trigger)
- Manual build: `docker buildx build -t runner-test:latest -f runners/llama-cpp-cpu.yaml .`
- Manual run: `docker run -p 8080:8080 runner-test:latest unsloth/gemma-3-1b-it-GGUF`
- Cached run: `docker run -p 8080:8080 -v model-cache:/models runner-test:latest unsloth/gemma-3-1b-it-GGUF` (skips download on the second run)