
Add runtime images without models#747

Open
sozercan wants to merge 6 commits into main from feat/runtime-images-without-models

Conversation

@sozercan
Member

Summary

Closes #719

  • Add "runner" images that contain only the inference runtime (LocalAI + backends) without model weights — users pass a model reference at docker run time and it is downloaded at container startup
  • Runner mode detected implicitly when an aikitfile has backends but no models
  • Refactor copyModels() to extract writeConfig() for reuse in runner path
  • Add runner build logic (runner.go): isRunnerMode(), entrypoint script generation with backend-specific download logic (GGUF for llama-cpp, HF model config for diffusers/vllm), dependency installation (curl, huggingface-cli)
  • Update NewImageConfig() to set /usr/local/bin/aikit-runner entrypoint and add runner labels
  • Add 4 runner aikitfile definitions: llama-cpp-cpu, llama-cpp-cuda, diffusers-cuda, vllm-cuda
  • Add CI: docker-test-runner workflow (runs on push/PR, tests CPU runner) and docker-test-runner-gpu workflow (manual trigger, tests all GPU runners on self-hosted GPU)
  • Add unit tests for runner mode detection and script generation
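
For illustration, a minimal runner aikitfile might look like the sketch below. The schema details (syntax line, apiVersion, backend names) are assumptions based on existing aikitfiles and the summary above, not copied from this diff:

```yaml
#syntax=ghcr.io/kaito-project/aikit/aikit:latest
# Hypothetical runner aikitfile: backends are listed but no models,
# which (per this PR) implicitly enables runner mode.
apiVersion: v1alpha1
backends:
  - llama-cpp
```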

Usage:

docker run -p 8080:8080 <runner-image> unsloth/gemma-3-1b-it-GGUF
docker run --gpus all -p 8080:8080 <cuda-runner-image> Qwen/Qwen2.5-0.5B-Instruct

Test plan

  • go test ./... — all existing + new unit tests pass
  • golangci-lint run ./... — 0 lint issues
  • docker-test-runner CI workflow builds CPU runner and validates chat completions
  • docker-test-runner-gpu CI workflow builds GPU runners and validates inference (manual trigger)
  • Build a runner image locally: docker buildx build -t runner-test:latest -f runners/llama-cpp-cpu.yaml .
  • Run with a small model: docker run -p 8080:8080 runner-test:latest unsloth/gemma-3-1b-it-GGUF
  • Verify volume caching: docker run -p 8080:8080 -v model-cache:/models runner-test:latest unsloth/gemma-3-1b-it-GGUF (skips download on second run)

Add "runner" images that contain only the inference runtime (LocalAI +
backends) without model weights baked in. Users pass a model reference
at `docker run` time, and the model is downloaded at container startup.

Runner mode is detected implicitly: when an aikitfile specifies
`backends` but no `models`, the build pipeline skips model downloads,
installs runtime download dependencies (curl, huggingface-cli), and
injects an entrypoint script that handles model download + LocalAI
startup.
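
The detection rule reduces to a single predicate. The sketch below models it with a stand-in struct; the type and field names are illustrative assumptions, not aikit's real config API:

```go
package main

import "fmt"

// InferenceConfig is a stand-in for aikit's config type; only the
// two fields the runner-mode rule cares about are modeled here.
type InferenceConfig struct {
	Backends []string
	Models   []string
}

// isRunnerMode mirrors the rule above: backends specified, no models.
func isRunnerMode(c InferenceConfig) bool {
	return len(c.Backends) > 0 && len(c.Models) == 0
}

func main() {
	runner := InferenceConfig{Backends: []string{"llama-cpp"}}
	baked := InferenceConfig{Backends: []string{"llama-cpp"}, Models: []string{"gemma-3-1b"}}
	fmt.Println(isRunnerMode(runner), isRunnerMode(baked)) // true false
}
```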

Supported runners:
- llama-cpp-cpu: CPU-only llama-cpp backend
- llama-cpp-cuda: CUDA-accelerated llama-cpp backend
- diffusers-cuda: CUDA diffusers backend
- vllm-cuda: CUDA vLLM backend

Usage:
  docker run -p 8080:8080 <runner-image> unsloth/gemma-3-1b-it-GGUF
  docker run --gpus all -p 8080:8080 <runner-image> Qwen/Qwen2.5-0.5B-Instruct

Changes:
- Refactor copyModels() to extract writeConfig() for reuse
- Add runner build logic (runner.go): isRunnerMode(), entrypoint script
  generation, dependency installation
- Update NewImageConfig() for runner entrypoint and labels
- Add runner aikitfile definitions (runners/)
- Add CI workflows: docker-test-runner (push/PR) and
  docker-test-runner-gpu (manual, self-hosted GPU)
- Add unit tests for runner mode detection and script generation
Copilot AI review requested due to automatic review settings March 10, 2026 03:34

Copilot AI left a comment


Pull request overview

Adds a new “runner image” mode to build runtime-only AIKit images (LocalAI + backend dependencies) without bundling model weights, so users can pass a model reference at docker run time and have it downloaded on container startup.

Changes:

  • Introduces runner-mode detection (backends present, models absent) and a generated /usr/local/bin/aikit-runner entrypoint script that downloads the model and writes its configuration at runtime.
  • Refactors config writing out of copyModels() into writeConfig() and wires runner-mode build flow to skip model copying while still installing runner dependencies.
  • Adds runner aikitfile definitions plus CPU/GPU GitHub Actions workflows to build and validate runner images, along with unit tests.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file

  • runners/llama-cpp-cpu.yaml: New runner aikitfile for CPU llama-cpp runtime-only image.
  • runners/llama-cpp-cuda.yaml: New runner aikitfile for CUDA llama-cpp runtime-only image.
  • runners/diffusers-cuda.yaml: New runner aikitfile for CUDA diffusers runtime-only image.
  • runners/vllm-cuda.yaml: New runner aikitfile for CUDA vLLM runtime-only image.
  • pkg/build/build_test.go: Extends inference config validation tests to accept runner-mode configs (backends without models).
  • pkg/aikit2llb/inference/runner.go: Implements runner-mode detection, dependency installation, entrypoint install, and script generation for runtime downloads/config.
  • pkg/aikit2llb/inference/runner_test.go: Adds unit tests for runner-mode detection and generated script content for supported backends.
  • pkg/aikit2llb/inference/image.go: Sets runner entrypoint and labels in image config when runner mode is active.
  • pkg/aikit2llb/inference/convert.go: Adds runner-mode build path and factors config creation into reusable writeConfig().
  • .github/workflows/test-docker-runner.yaml: New CI workflow to build/test the CPU runner image and validate chat completions.
  • .github/workflows/test-docker-runner-gpu.yaml: New manual CI workflow to build/test GPU runner images on self-hosted GPU runners.
Comments suppressed due to low confidence (1)

.github/workflows/test-docker-runner-gpu.yaml:92

  • Similarly, jq '.data[0].url' will output null on failures and the -z check won’t catch it. Use jq -e to assert .data[0].url is a non-null string (and/or validate .data|length>0) so the diffusers runner test fails when image generation fails.
          url=$(echo "$result" | jq '.data[0].url')
          if [ -z "$url" ]; then
            exit 1
          fi
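
The false pass the reviewer describes can be reproduced in isolation; the commands below are an illustration, not lines from the workflow:

```shell
# An error response with no .data field at all:
result='{"error":"image generation failed"}'
url=$(echo "$result" | jq '.data[0].url')
echo "url=$url"   # jq prints the string "null", which is non-empty
[ -z "$url" ] && echo "caught" || echo "missed the failure"

# jq -e exits non-zero when the filter result is null or false:
echo "$result" | jq -e '.data[0].url | type == "string"' >/dev/null \
  && echo "url ok" || echo "caught with -e"
```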

// Some backends (diffusers/vllm) already install python, but llama-cpp does not,
// so we always install the minimal set here.
s = s.Run(
	utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && apt-get clean"),

Copilot AI Mar 10, 2026


The dependency install shell chain mixes && and || without grouping. As written, an apt-get failure can be masked because the || pip install ... branch may run and the overall command can still exit 0 after apt-get clean. Consider grouping the pip fallback only (e.g., wrap the two pip install attempts in parentheses) so apt-get failures still fail the build.

Suggested change
- utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && apt-get clean"),
+ utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && (pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli]) && apt-get clean"),
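
The masking behavior is easy to verify in isolation; `false` stands in for a failing apt-get and the echoes for the later steps (illustrative only):

```shell
# Ungrouped: || catches the failure of everything to its left, so the
# chain still "succeeds" (exit 0) even though the stand-in apt-get failed.
sh -c 'false && echo install || echo fallback && echo clean' \
  && echo "ungrouped exit: 0"

# Grouped: only the first pip attempt may fail over to the second;
# a failure before the group fails the whole chain (exit 1).
sh -c 'false && (echo install || echo fallback) && echo clean' \
  || echo "grouped exit: 1"
```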

Comment on lines +28 to +29
s = s.Run(
utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && apt-get clean"),

Copilot AI Mar 10, 2026


The runner dependency install step unconditionally uses apt-get, but utils.AppleSiliconBase is Fedora-based (no apt). Runner mode with runtime: applesilicon will fail at build time. Consider branching on runtime (or detecting available package manager) and using dnf on Apple Silicon, or explicitly disallowing runner mode for that runtime in validation with a clear error.

Suggested change
- s = s.Run(
- utils.Sh("apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && apt-get clean"),
+ installerCmd := `if command -v apt-get >/dev/null 2>&1; then \
+ apt-get update && apt-get install --no-install-recommends -y curl python3 python3-pip && \
+ pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && \
+ apt-get clean; \
+ elif command -v dnf >/dev/null 2>&1; then \
+ dnf install -y curl python3 python3-pip && \
+ pip install --break-system-packages huggingface-hub[cli] 2>/dev/null || pip install huggingface-hub[cli] && \
+ dnf clean all; \
+ else \
+ echo "No supported package manager (apt-get or dnf) found; cannot install runner dependencies." >&2; \
+ exit 1; \
+ fi`
+ s = s.Run(
+ 	utils.Sh(installerCmd),

`)
}

sb.WriteString(`LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]+"${EXTRA_ARGS[@]}"}")

Copilot AI Mar 10, 2026


LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]+"${EXTRA_ARGS[@]}"}") does not append extra args correctly: the nested quotes become literal text, so multiple extra args can collapse into a single element containing quotes/spaces. With set -u, it’s safer to append conditionally based on array length (e.g., only append when ${#EXTRA_ARGS[@]} > 0) and then use LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]}").

Suggested change
- sb.WriteString(`LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]+"${EXTRA_ARGS[@]}"}")
+ sb.WriteString(`if ((${#EXTRA_ARGS[@]})); then
+ LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]}")
+ fi
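
The replacement pattern behaves correctly under `set -u` for both empty and populated arrays; a standalone bash illustration (argument values are made up):

```shell
#!/usr/bin/env bash
set -u

LOCAL_AI_ARGS=(--address :8080)
EXTRA_ARGS=()   # may legitimately be empty at runtime

# ${#EXTRA_ARGS[@]} (the length) is safe under set -u even when the
# array is empty, so the guard avoids expanding an empty array.
if ((${#EXTRA_ARGS[@]})); then
  LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]}")
fi
echo "after empty append: ${#LOCAL_AI_ARGS[@]} args"   # still 2

EXTRA_ARGS=(--debug --context-size 4096)
if ((${#EXTRA_ARGS[@]})); then
  LOCAL_AI_ARGS+=("${EXTRA_ARGS[@]}")
fi
echo "after real append: ${#LOCAL_AI_ARGS[@]} args"    # now 5
```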

Comment on lines +74 to +83

// writeConfig writes the /config.yaml file to the image when c.Config is set.
func writeConfig(c *config.InferenceConfig, base llb.State, s llb.State, platform specs.Platform) (llb.State, llb.State) {
	savedState := s
	if c.Config != "" {
		s = s.Run(utils.Shf("mkdir -p /configuration && echo -n \"%s\" > /config.yaml", c.Config),
			llb.WithCustomName(fmt.Sprintf("Creating config for platform %s/%s", platform.OS, platform.Architecture))).Root()
	}
	diff := llb.Diff(savedState, s)
	merge := llb.Merge([]llb.State{base, diff})
	return s, merge

Copilot AI Mar 10, 2026


writeConfig builds a shell command with echo -n "%s" using the raw YAML string. This will break for configs containing quotes/newlines and also allows shell injection during the image build. Prefer writing the file via llb.Mkfile (or a here-doc that avoids interpolation) so the config content is treated as data, not shell syntax.
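
The same hazard can be shown at the shell level; the config string and /tmp paths below are made up for the demonstration:

```shell
rm -f /tmp/pwned
cfg='greeting: "hi" $(touch /tmp/pwned)'

# Unsafe: the config text is spliced into shell syntax, so the
# embedded command substitution runs in the child shell.
sh -c "echo -n \"$cfg\" > /tmp/cfg-unsafe"
[ -f /tmp/pwned ] && echo "injection executed"

# Safe: the content is passed as data and never re-parsed.
printf '%s' "$cfg" > /tmp/cfg-safe
cat /tmp/cfg-safe
```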

Comment on lines +93 to +99
echo $result

choices=$(echo "$result" | jq '.choices')
if [ -z "$choices" ]; then
exit 1
fi


Copilot AI Mar 10, 2026


This workflow’s validation can false-pass: jq '.choices' returns the string null on error responses, which is non-empty so [ -z "$choices" ] won’t fail. Use jq -e to assert a non-null, non-empty array (and/or explicitly fail if .error is present) so CI reliably detects runner failures.

Suggested change
- echo $result
- choices=$(echo "$result" | jq '.choices')
- if [ -z "$choices" ]; then
-   exit 1
- fi
+ echo "$result"
+ echo "$result" | jq -e '
+   if (.error? != null) then
+     error("error field present in response")
+   elif (.choices | type != "array" or (.choices | length) == 0) then
+     error("choices must be a non-empty array")
+   else
+     .
+   end
+ ' > /dev/null

Comment on lines +74 to +77
choices=$(echo "$result" | jq '.choices')
if [ -z "$choices" ]; then
exit 1
fi

Copilot AI Mar 10, 2026


The JSON assertion here can false-pass: jq '.choices' yields null (non-empty string) if the API returns an error object, so the [ -z "$choices" ] check won’t catch failures. Consider jq -e '.choices and (.choices|type=="array") and (.choices|length>0)' (and fail on .error) to make the GPU runner test meaningful.

This issue also appears on line 89 of the same file.

sozercan added 5 commits March 9, 2026 23:33
The previous version (v2.1.6) was built with Go 1.24, which is lower
than the project's Go 1.26 target, causing lint CI to fail with
"can't load config" error.
- Fix pip install fallback grouping: wrap the two pip attempts in
  parentheses so apt-get failures are not masked by the || branch
- Disallow runner mode on Apple Silicon in validation since the
  Fedora-based base image lacks apt-get
- Fix EXTRA_ARGS appending: use conditional length check instead of
  nested parameter expansion to correctly handle multiple args
- Use llb.Mkfile for writeConfig instead of shell echo to avoid
  shell injection and handle quotes/newlines safely
- Use jq -e with explicit error/choices validation in CI workflows
  to prevent false-pass on null/error API responses
Publishes runner images to ghcr.io/kaito-project/aikit/runners/ on
tag push (matching the main release workflow trigger). Builds all four
runner aikitfiles in a matrix, pushes with semver + latest tags, and
signs with cosign via GitHub OIDC.

Images published:
- ghcr.io/kaito-project/aikit/runners/llama-cpp-cpu (amd64 + arm64)
- ghcr.io/kaito-project/aikit/runners/llama-cpp-cuda (amd64)
- ghcr.io/kaito-project/aikit/runners/diffusers-cuda (amd64)
- ghcr.io/kaito-project/aikit/runners/vllm-cuda (amd64)
LocalAI v3.12.1 introduced an init() function in capabilities.go that
checks for /usr/local/cuda-12 directory existence before checking GPU
vendor presence. Since aikit installs cuda-cudart-12-5 (creating that
directory), LocalAI incorrectly selects the cuda12-llama-cpp backend
on CPU-only hosts, which crashes with missing libcuda.so.1.

Add a lightweight entrypoint wrapper script for CUDA images that
detects actual NVIDIA GPU presence via lspci at container startup.
When no GPU is found, it sets LOCALAI_FORCE_META_BACKEND_CAPABILITY=default
to force LocalAI to use the cpu-llama-cpp backend instead.

This is a workaround for an upstream LocalAI regression that broke
the fix from mudler/LocalAI#6149.
- Gate wrapper entrypoint on amd64 only (matching install gating) to
  avoid missing binary on CUDA arm64 images
- Respect existing LOCALAI_FORCE_META_BACKEND_CAPABILITY env var by
  exiting early when it is already set
- Improve GPU detection: check /dev/nvidiactl first, then nvidia-smi,
  then lspci as final fallback (avoids false negatives when lspci
  cannot see PCI devices inside containers)
- Add image_test.go with test matrix covering amd64/arm64, runner/
  standard mode, and wrapper script content assertions
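
The detection order described above might be sketched as follows; the helper name, the echo, and the final exec comment are reconstructions from this commit message, not the shipped wrapper script:

```shell
#!/bin/sh
# Check the most reliable signals first: the device node, then the
# driver tool, then lspci as a last resort.
has_nvidia_gpu() {
  [ -e /dev/nvidiactl ] && return 0
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1 && return 0
  command -v lspci >/dev/null 2>&1 && lspci 2>/dev/null | grep -qi nvidia && return 0
  return 1
}

# Respect an operator-supplied override; otherwise force the CPU
# llama-cpp backend when no GPU is visible.
if [ -z "${LOCALAI_FORCE_META_BACKEND_CAPABILITY:-}" ] && ! has_nvidia_gpu; then
  export LOCALAI_FORCE_META_BACKEND_CAPABILITY=default
fi
echo "capability override: ${LOCALAI_FORCE_META_BACKEND_CAPABILITY:-none}"
# A real wrapper would end with: exec local-ai "$@"
```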

Development

Successfully merging this pull request may close these issues.

[REQ] add runtime images without models
