
Refactor image build, create multi-arch images, drop Builder usage #347

Merged
sairon merged 24 commits into master from use-no-builder on Mar 17, 2026

Conversation

@sairon (Member) commented Mar 3, 2026

This PR drops the Builder container in favor of native BuildKit builds using the reusable actions and workflow introduced in home-assistant/builder#273.

Key changes in this PR:

  • Replaced the old builder.yml workflow with calls to the reusable workflow and composite actions from home-assistant/builder
  • Dockerfiles are simplified - labels are hardcoded where appropriate (io.hass.type, io.hass.base.name) rather than injected externally by the builder
  • Multi-arch images are built in parallel and combined into manifest lists
  • The builder workflow now also runs on push to master, leveraging GHA cache and keeping it warm for release builds

See home-assistant/builder#273 for details on the reusable workflow itself (caching strategy, cosign verification, zstd compression, runner selection, etc.).

sairon added 2 commits March 3, 2026 18:52
This PR fundamentally changes how our images are built. The usage of the
Builder container is dropped in favor of "native" build using BuildKit with
docker/build-push-action.

Dockerfiles are now the single source of truth for all labels and build
arguments - the build metadata (version, date, architecture, repository) is
passed via --build-arg and consumed directly in the Dockerfile's LABEL
instruction, removing the need for external label injection.
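As a sketch, the resulting pattern looks roughly like this (arg and label names are drawn from this thread; the exact Dockerfiles differ):

```dockerfile
ARG BUILD_FROM
FROM ${BUILD_FROM}

# Re-declare the args inside the build stage so LABEL can consume them
ARG BUILD_ARCH
ARG BUILD_VERSION
ARG BUILD_DATE
ARG BUILD_REPOSITORY

# Labels live in the Dockerfile itself instead of being injected by the Builder
LABEL \
    io.hass.arch="${BUILD_ARCH}" \
    io.hass.version="${BUILD_VERSION}" \
    org.opencontainers.image.created="${BUILD_DATE}" \
    org.opencontainers.image.version="${BUILD_VERSION}" \
    org.opencontainers.image.source="${BUILD_REPOSITORY}"
```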

Build caching uses GitHub Actions cache as the primary backend, with inline
cache metadata embedded in pushed images as a fallback for cache reuse across
git refs (since GHA cache is scoped per branch/tag). Registry images are
verified with cosign before being used as cache sources.

Images are compressed with zstd (level 9) instead of gzip, reducing image size
and improving pull times on registries and runtimes that support it.
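Concretely, the caching and compression setup maps onto docker/build-push-action inputs roughly as follows (a sketch with illustrative image names; the authoritative version is the reusable workflow in home-assistant/builder):

```yaml
- name: Build and push image
  uses: docker/build-push-action@v6
  with:
    context: .
    # GHA cache is the primary backend; a previously pushed image (with inline
    # cache metadata) serves as a fallback cache source across git refs
    cache-from: |
      type=gha
      type=registry,ref=ghcr.io/example/amd64-base:latest
    cache-to: type=gha,mode=max
    # zstd-compressed layers instead of the default gzip
    outputs: type=image,name=ghcr.io/example/amd64-base,push=true,compression=zstd,compression-level=9,force-compression=true
```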

Multi-arch support is handled by building per-architecture images in parallel
on native runners (amd64 on ubuntu-24.04, aarch64 on ubuntu-24.04-arm), then
combining them into a single manifest list using docker buildx imagetools.
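The combining step is essentially the following (image names are illustrative; the commands need buildx and push access to a registry, so no output is shown):

```shell
# Stitch the per-arch images into a single multi-arch tag (manifest list)
docker buildx imagetools create \
    --tag ghcr.io/example/base:3.23 \
    ghcr.io/example/amd64-base:3.23 \
    ghcr.io/example/aarch64-base:3.23

# Verify: the new tag should resolve to an index listing both platforms
docker buildx imagetools inspect ghcr.io/example/base:3.23
```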

The reusable builder workflow (.github/workflows/reuseable-builder.yml) and the
build-image composite action (.github/actions/build-image/) are designed to be
generic enough to be extracted to the original home-assistant/builder repo,
replacing the current docker-in-docker approach with a simpler, more cacheable
workflow.

Thanks to the caching, the builder workflow now also runs on push to the master
branch, keeping the GHA cache warm for release builds without adding
significant CI cost.
@sairon (Member, Author) commented Mar 3, 2026

The build failures for Python are expected - it's a chicken-and-egg problem. Without having ghcr.io/home-assistant/base in the registry, we can't use it as the base image. We had similar problems for PR builds in the past when bumping Alpine versions while updating the base image matrix for Python as well.

The builds were tested in my fork, so I'd say the CI can be ignored here - after merge, the base image should be published before the Python builds and everything should pass.

@sairon sairon requested review from agners, edenhaus and frenck March 3, 2026 18:03
sairon added a commit to home-assistant/builder that referenced this pull request Mar 4, 2026
(Commit message identical to the one above, with one addition: "A reference implementation is in home-assistant/docker-base#347.")
@agners (Member) left a comment

Looks quite good to me.

I wonder if it will feel easier to follow what exactly is happening. The old build.yaml was a kinda nice summary of all parameters. We now have some in the builder workflow, and some in the Dockerfile. But tradeoffs... We'll see.

@sairon sairon requested a review from agners March 4, 2026 11:32
@sairon (Member, Author) commented Mar 4, 2026

> I wonder if it will feel easier to follow what exactly is happening. The old build.yaml was a kinda nice summary of all parameters. We now have some in the builder workflow, and some in the Dockerfile. But tradeoffs... We'll see.

The builder workflow now essentially supplies only the build date, version, and source repository, which were (or should have been) dynamically generated anyway. For example, for the Python images, the builder injects these args:

```
BUILD_VERSION=2026.03.31
BUILD_ARCH=amd64
BUILD_DATE=2026-03-03 17:13:27+00:00
BUILD_REPOSITORY=https://github.com/sairon/ha-docker-base
BASE_IMAGE=ghcr.io/sairon/base
BASE_VERSION=3.21
```

The BASE_* args are special here because of the matrix; most images will have a single static BUILD_FROM.

Where we make a small trade-off is the dependency versions, which were nicely in a single place before, but that's nothing git blame/git log can't help with.
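For illustration, the matrixed BASE_* args would be consumed in the Dockerfile's FROM line roughly like this (a sketch; in the non-matrix case the pair is replaced by a single static BUILD_FROM arg):

```dockerfile
# Args declared before FROM are available to the FROM instruction itself
ARG BASE_IMAGE
ARG BASE_VERSION
FROM ${BASE_IMAGE}:${BASE_VERSION}
```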

@sairon (Member, Author) commented Mar 4, 2026

FTR, home-assistant/builder#273 needs to be merged first and references to the gha-builder branch updated here.

sairon added 3 commits March 5, 2026 12:02
Because the Cosign subject is derived from the running workflow, we need to run
the action using Cosign in a local workflow instead of calling a reusable
workflow from another repo.
sairon added a commit to home-assistant/builder that referenced this pull request Mar 16, 2026
This PR provides a set of reusable composite actions that replace the Builder
container with "native" BuildKit builds using docker/build-push-action.

`actions/build-image` builds and optionally pushes and signs a
single-architecture image. Build metadata (`BUILD_ARCH`, `BUILD_VERSION`) is
passed to the Dockerfile via `--build-arg`, while OCI and Home Assistant labels
(`io.hass.arch`, `io.hass.version`, `org.opencontainers.image.*`) are applied
directly by the action through docker/build-push-action's label support.
Additional build args and labels can be passed through the `build-args` and
`labels` inputs.

Images are compressed with zstd (level 9) instead of gzip, reducing image size
and improving pull times on registries and runtimes that support it. Build
caching uses GitHub Actions cache as the primary backend, with inline cache
metadata embedded in pushed images as a fallback for cache reuse across git
refs (since GHA cache is scoped per branch/tag).

Pushed images are signed with Cosign, with retry and exponential backoff. Base
and cache images can optionally be verified before the build starts.

`actions/cosign-verify` verifies the Cosign signature of a container image
against a certificate identity and OIDC issuer, with retry logic and an
optional allow-failure mode.
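The retry-with-exponential-backoff behaviour can be sketched in plain shell. This is an illustration only: the real action shells out to cosign, while here a stand-in `flaky` command fails twice and then succeeds.

```shell
#!/usr/bin/env bash
# Sketch of retry-with-exponential-backoff, as used for the Cosign steps.
# retry_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_backoff() {
    local max="$1" delay="$2"; shift 2
    local attempt=1
    while ! "$@"; do
        if (( attempt >= max )); then
            echo "giving up after ${attempt} attempts" >&2
            return 1
        fi
        sleep "$delay"
        delay=$(( delay * 2 ))       # double the delay each round
        attempt=$(( attempt + 1 ))
    done
}

# Stand-in for e.g. a cosign invocation: succeeds on the third call.
tries=0
flaky() { tries=$(( tries + 1 )); (( tries >= 3 )); }

retry_backoff 5 0 flaky && echo "succeeded after ${tries} attempts"
# -> succeeded after 3 attempts
```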

`actions/prepare-multi-arch-matrix` validates the requested architectures
(amd64, aarch64) and outputs a JSON matrix mapping each to a native runner
(ubuntu-24.04, ubuntu-24.04-arm) and a registry image name, ready to be
consumed by a build matrix job.
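The arch-to-runner mapping done by `actions/prepare-multi-arch-matrix` boils down to something like the sketch below; the output shape is an assumption, so see the action itself for the real format.

```shell
#!/usr/bin/env bash
# Sketch: validate requested architectures and emit a JSON build matrix.
set -euo pipefail

prepare_matrix() {
    local entries=() arch runner
    for arch in "$@"; do
        case "$arch" in
            amd64)   runner="ubuntu-24.04" ;;
            aarch64) runner="ubuntu-24.04-arm" ;;
            *)       echo "unsupported arch: $arch" >&2; return 1 ;;
        esac
        entries+=("{\"arch\":\"${arch}\",\"runner\":\"${runner}\"}")
    done
    local IFS=,
    echo "{\"include\":[${entries[*]}]}"
}

# The two architectures the action accepts:
prepare_matrix amd64 aarch64
# -> {"include":[{"arch":"amd64","runner":"ubuntu-24.04"},{"arch":"aarch64","runner":"ubuntu-24.04-arm"}]}
```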

`actions/publish-multi-arch-manifest` combines per-architecture images into a
single manifest list using `docker buildx imagetools create`, applies all
requested tags, and signs the resulting manifest with Cosign.

Together, these actions support a workflow where per-architecture images are
built in parallel on native runners, then combined into a multi-arch manifest.
Thanks to the caching, the build can also run on push to the master branch to
keep the GHA cache warm for release builds without adding significant CI cost.

A reference implementation is in
home-assistant/docker-base#347.
@sairon sairon requested a review from agners March 17, 2026 16:55
@sairon (Member, Author) commented Mar 17, 2026

Should be ready for final review now. It builds fine, caches are correctly used, and the images look correct and the "same" from a metadata perspective (verified with the vibe-coded helper script shared below).

(screenshot: comparison script output)
```bash
#!/usr/bin/env bash
# Compare container image manifests between official HA and fork builds.
# Usage: ./compare-images.sh [official_version] [fork_version]
#   e.g. ./compare-images.sh 2026.02.0 2026.03.11

set -euo pipefail

OFFICIAL_VERSION="${1:-2026.02.0}"
FORK_VERSION="${2:-2026.03.43}"

OFFICIAL_REPO="ghcr.io/home-assistant"
FORK_REPO="ghcr.io/sairon"

ALPINE_VERSIONS=("3.23")
PYTHON_VERSIONS=("3.13")
ARCHES=("amd64" "aarch64")

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
BOLD='\033[1m'
NC='\033[0m'

pass() { echo -e "  ${GREEN}PASS${NC} $1"; }
fail() { echo -e "  ${RED}FAIL${NC} $1"; ERRORS=$((ERRORS + 1)); }
warn() { echo -e "  ${YELLOW}WARN${NC} $1"; }
ERRORS=0

# Expected labels (keys only — values differ per build)
EXPECTED_LABELS=(
    "io.hass.type"
    "io.hass.base.name"
    "io.hass.base.version"
    "io.hass.base.arch"
    "io.hass.base.image"
    "io.hass.arch"
    "io.hass.version"
    "org.opencontainers.image.created"
    "org.opencontainers.image.version"
    "org.opencontainers.image.source"
)

inspect_image() {
    local ref="$1"
    local arch="${2:-}"
    if [[ -n "$arch" ]]; then
        skopeo inspect --override-arch "$arch" "docker://${ref}" 2>/dev/null
    else
        skopeo inspect "docker://${ref}" 2>/dev/null
    fi
}

inspect_raw() {
    local ref="$1"
    skopeo inspect --raw "docker://${ref}" 2>/dev/null
}

compare_per_arch() {
    local image_name="$1"
    local image_tag="$2"
    local arch="$3"
    local official_ref="${OFFICIAL_REPO}/${arch}-${image_name}:${image_tag}-${OFFICIAL_VERSION}"
    local fork_ref="${FORK_REPO}/${arch}-${image_name}:${image_tag}-${FORK_VERSION}"

    # Map HA arch names to OCI arch names for skopeo --override-arch
    local oci_arch
    case "$arch" in
        amd64)   oci_arch="amd64" ;;
        aarch64) oci_arch="arm64" ;;
        *)       oci_arch="$arch" ;;
    esac

    echo -e "\n${BOLD}=== ${arch}-${image_name}:${image_tag} ===${NC}"
    echo "  Official: ${official_ref}"
    echo "  Fork:     ${fork_ref}"

    # Fetch both manifests (use --override-arch for OCI index images)
    local official_json fork_json
    official_json=$(inspect_image "${official_ref}" "${oci_arch}") || { fail "Cannot inspect official image"; return; }
    fork_json=$(inspect_image "${fork_ref}" "${oci_arch}") || { fail "Cannot inspect fork image"; return; }

    # --- Labels ---
    echo -e "\n  ${BOLD}Labels:${NC}"
    local official_labels fork_labels
    official_labels=$(echo "$official_json" | jq -r '.Labels // {}')
    fork_labels=$(echo "$fork_json" | jq -r '.Labels // {}')

    for label in "${EXPECTED_LABELS[@]}"; do
        local off_val fork_val
        off_val=$(echo "$official_labels" | jq -r --arg k "$label" '.[$k] // empty')
        fork_val=$(echo "$fork_labels" | jq -r --arg k "$label" '.[$k] // empty')

        if [[ -z "$fork_val" ]]; then
            fail "${label}: missing in fork"
        elif [[ -z "$off_val" ]]; then
            warn "${label}: missing in official, fork has '${fork_val}'"
        else
            pass "${label}: official='${off_val}' fork='${fork_val}'"
        fi
    done

    # Check for extra labels in fork not in official
    local extra_labels
    extra_labels=$(jq -r --argjson off "$official_labels" 'to_entries[] | select(.key as $k | $off | has($k) | not) | .key' <<< "$fork_labels")
    if [[ -n "$extra_labels" ]]; then
        while IFS= read -r label; do
            warn "Extra label in fork: ${label}=$(echo "$fork_labels" | jq -r --arg k "$label" '.[$k]')"
        done <<< "$extra_labels"
    fi

    # --- Architecture & OS ---
    echo -e "\n  ${BOLD}Platform:${NC}"
    local off_arch fork_arch off_os fork_os
    off_arch=$(echo "$official_json" | jq -r '.Architecture')
    fork_arch=$(echo "$fork_json" | jq -r '.Architecture')
    off_os=$(echo "$official_json" | jq -r '.Os')
    fork_os=$(echo "$fork_json" | jq -r '.Os')

    [[ "$off_arch" == "$fork_arch" ]] && pass "Architecture: ${fork_arch}" || fail "Architecture: official='${off_arch}' fork='${fork_arch}'"
    [[ "$off_os" == "$fork_os" ]] && pass "OS: ${fork_os}" || fail "OS: official='${off_os}' fork='${fork_os}'"

    # --- Layer count & media types ---
    echo -e "\n  ${BOLD}Layers:${NC}"
    local off_count fork_count
    off_count=$(echo "$official_json" | jq '.Layers | length')
    fork_count=$(echo "$fork_json" | jq '.Layers | length')
    [[ "$off_count" == "$fork_count" ]] && pass "Layer count: ${fork_count}" || fail "Layer count: official=${off_count} fork=${fork_count}"

    local off_mime fork_mime
    off_mime=$(echo "$official_json" | jq -r '.LayersData[0].MIMEType // "unknown"')
    fork_mime=$(echo "$fork_json" | jq -r '.LayersData[0].MIMEType // "unknown"')
    pass "Layer MIME: official='${off_mime}' fork='${fork_mime}'"

    # --- Cosign signature ---
    echo -e "\n  ${BOLD}Cosign signature:${NC}"
    if cosign verify \
        --certificate-identity-regexp "https://github.com/sairon/ha-docker-base/.*" \
        --certificate-oidc-issuer-regexp "https://token.actions.githubusercontent.com" \
        "${fork_ref}" >/dev/null 2>&1; then
        pass "Signature valid for ${fork_ref}"
    else
        fail "Signature verification failed for ${fork_ref}"
    fi
}

compare_multiarch() {
    local image_name="$1"
    local image_tag="$2"
    local fork_ref="${FORK_REPO}/${image_name}:${image_tag}-${FORK_VERSION}"

    echo -e "\n${BOLD}=== Multi-arch ${image_name}:${image_tag} ===${NC}"
    echo "  Fork: ${fork_ref}"

    local raw_manifest
    raw_manifest=$(inspect_raw "${fork_ref}") || { fail "Cannot inspect multi-arch manifest"; return; }

    local media_type
    media_type=$(echo "$raw_manifest" | jq -r '.mediaType // .schemaVersion')

    echo -e "\n  ${BOLD}Manifest:${NC}"
    if echo "$raw_manifest" | jq -e '.manifests' >/dev/null 2>&1; then
        pass "Is a manifest list/index"
        local manifest_count
        manifest_count=$(echo "$raw_manifest" | jq '.manifests | length')
        pass "Contains ${manifest_count} platform(s)"

        echo "$raw_manifest" | jq -r '.manifests[] | "    \(.platform.os)/\(.platform.architecture) → \(.digest[:24])..."'
    else
        fail "Not a manifest list — single-platform image"
    fi

    # --- Cosign signature on multi-arch tag ---
    echo -e "\n  ${BOLD}Cosign signature:${NC}"
    if cosign verify \
        --certificate-identity-regexp "https://github.com/sairon/ha-docker-base/.*" \
        --certificate-oidc-issuer-regexp "https://token.actions.githubusercontent.com" \
        "${fork_ref}" >/dev/null 2>&1; then
        pass "Signature valid for ${fork_ref}"
    else
        fail "Signature verification failed for ${fork_ref}"
    fi

    # Also verify the unversioned tag
    local fork_unversioned="${FORK_REPO}/${image_name}:${image_tag}"
    echo -e "\n  ${BOLD}Cosign signature (unversioned tag):${NC}"
    if cosign verify \
        --certificate-identity-regexp "https://github.com/sairon/ha-docker-base/.*" \
        --certificate-oidc-issuer-regexp "https://token.actions.githubusercontent.com" \
        "${fork_unversioned}" >/dev/null 2>&1; then
        pass "Signature valid for ${fork_unversioned}"
    else
        fail "Signature verification failed for ${fork_unversioned}"
    fi
}

echo -e "${BOLD}Comparing official (${OFFICIAL_VERSION}) vs fork (${FORK_VERSION})${NC}"

for alpine_ver in "${ALPINE_VERSIONS[@]}"; do
    for arch in "${ARCHES[@]}"; do
        compare_per_arch "base" "$alpine_ver" "$arch"
    done
    compare_multiarch "base" "$alpine_ver"
done

for alpine_ver in "${ALPINE_VERSIONS[@]}"; do
    for python_ver in "${PYTHON_VERSIONS[@]}"; do
        for arch in "${ARCHES[@]}"; do
            compare_per_arch "base-python" "${python_ver}-alpine${alpine_ver}" "$arch"
        done
        compare_multiarch "base-python" "${python_ver}-alpine${alpine_ver}"
    done
done

echo -e "\n${BOLD}=== Summary ===${NC}"
if [[ $ERRORS -eq 0 ]]; then
    echo -e "${GREEN}All checks passed.${NC}"
else
    echo -e "${RED}${ERRORS} check(s) failed.${NC}"
    exit 1
fi
```

@agners (Member) left a comment

Nice work, LGTM.


Images are built for all platforms officially supported by Home Assistant, which are `amd64` and `arm64`.

Beginning with the 2026.03.1 release, all images are published as multi-arch images for these platforms. The old architecture-prefixed images (`aarch64-*`, `amd64-*`) are still available but preferably the multi-arch images should be used.

Reads a bit confusing as it's about multi-arch, then the prefixed images, then multi-arch again. Maybe:

Suggested change
Beginning with the 2026.03.1 release, all images are published as multi-arch images for these platforms. The old architecture-prefixed images (`aarch64-*`, `amd64-*`) are still available but preferably the multi-arch images should be used.
Beginning with the 2026.03.1 release, all images are published as multi-arch images under a single image name for all supported platforms. These image names should be used whenever possible. The old architecture-prefixed images (`aarch64-*`, `amd64-*`) are still available for compatibility.

@sairon sairon merged commit c31b6d6 into master Mar 17, 2026
48 of 66 checks passed
@sairon sairon deleted the use-no-builder branch March 17, 2026 17:27