build-rootfs: add com.coreos.inputhash label

jlebon · jlebon · commit 52f2a02c9062 · 2025-09-19T16:35:59.000-04:00
Right now, there is no change detection in our builds. We always do a new build, which is wasteful.

Let's re-implement a concept similar to rpm-ostree's inputhash, which hashes all the relevant inputs. We then add a label to the final image with that hash. On the cosa side, we can compare the hash to that of the previous build to decide whether to no-op.

One major difference from rpm-ostree's inputhash is that rpm-ostree could know the hash right after doing the depsolve (once it has the final list of RPMs) and thus avoid a full build. It's possible to do this here too, though it'd require bootc-base-imagectl and rpm-ostree changes. We'd also need to define an API for having a `podman build` actually signal "no-op"; e.g. writing a file in the build context and erroring the build... awkward.

Much more interesting and related to this is reproducible builds; if our builds were reproducible, we wouldn't have to worry about this because we'd just build the exact same artifact every time. There's some work required to get there though, and likely we'd have to rework how we calculate our versions, since that's a dynamic value which affects the rootfs and OCI labels (e.g. always have the same version for the same set of inputs; see also discussions in coreos/fedora-coreos-tracker#2015).
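
For illustration only, the cosa-side comparison described above could look roughly like the sketch below. It is not part of this commit: the helper name `should_skip_build` is hypothetical, and it assumes the caller has already obtained the label dictionaries of the previous and the new image by whatever means cosa uses to inspect images.

```python
# Hypothetical sketch; not cosa's actual implementation.
INPUTHASH_LABEL = "com.coreos.inputhash"


def should_skip_build(prev_labels, new_labels):
    """Return True if both images carry the same inputhash label.

    Both arguments are plain dicts mapping label names to values.
    A missing label on either side is treated as "inputs changed",
    so images built before this label existed never cause a no-op.
    """
    prev = prev_labels.get(INPUTHASH_LABEL)
    new = new_labels.get(INPUTHASH_LABEL)
    return prev is not None and prev == new
```

Treating a missing label as a change is the conservative choice: it can cause an unnecessary build, but never skips one that was needed.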
diff --git a/Containerfile b/Containerfile
@@ -45,7 +45,8 @@ RUN --mount=type=cache,rw,id=coreos-build-cache,target=/cache \
 RUN --mount=type=bind,target=/run/src,rw \
       rpm-ostree experimental compose build-chunked-oci \
         --bootc --format-version=1 --rootfs /target-rootfs \
-        --output oci-archive:/run/src/out.ociarchive
+        --output oci-archive:/run/src/out.ociarchive \
+        --label com.coreos.inputhash=$(cat /run/inputhash)
 
 FROM oci-archive:./out.ociarchive
 ARG VERSION
diff --git a/build-rootfs b/build-rootfs
@@ -8,6 +8,7 @@
 # 5. It runs the postprocess scripts defined in the manifest.
 
 import glob
+import hashlib
 import json
 import os
 import shutil
@@ -19,6 +20,7 @@ import yaml
 
 ARCH = os.uname().machine
 SRCDIR = '/src'
+INPUTHASH = '/run/inputhash'
 
 
 def main():
@@ -66,6 +68,7 @@ def main():
     inject_content_manifest(target_rootfs, manifest)
 
     if version != "":
+        overlays.remove(dracut_tmpd.name)
         cleanup_dracut_version(target_rootfs, dracut_tmpd)
         inject_version_info(target_rootfs, manifest['mutate-os-release'], version)
 
@@ -75,6 +78,8 @@ def main():
     run_postprocess_scripts(target_rootfs, manifest)
     cleanup_extraneous_files(target_rootfs)
 
+    calculate_inputhash(target_rootfs, overlays, manifest)
+
 
 def get_treefile(manifest_path):
     with tempfile.NamedTemporaryFile(suffix='.json', mode='w') as tmp_manifest:
@@ -434,6 +439,35 @@ def cleanup_extraneous_files(rootfs):
     unlink_optional('usr/share/rpm/.rpm.lock')
 
 
+def calculate_inputhash(rootfs, overlays, manifest):
+    h = hashlib.sha256()
+
+    # rpms
+    rpms = bwrap(rootfs, ['rpm', '-qa', '--qf', '%{NEVRA}\n'], capture=True)
+    rpms = sorted(rpms.splitlines())
+    h.update(''.join(rpms).encode('utf-8'))
+
+    # overlays
+    for overlay in overlays:
+        all_files = []
+        for root, _, files in os.walk(overlay):
+            for file in files:
+                all_files.append(os.path.join(root, file))
+        all_files = sorted(all_files)
+        for file in all_files:
+            with open(file, 'rb') as f:
+                h.update(hashlib.file_digest(f, 'sha256').digest())
+                has_x_bit = os.stat(f.fileno()).st_mode & 0o111 != 0
+                h.update(bytes([has_x_bit]))
+
+    # postprocess
+    for script in manifest.get('postprocess', []):
+        h.update(script.encode('utf-8'))
+
+    with open(INPUTHASH, 'w', encoding='utf-8') as f:
+        f.write(h.hexdigest())
+
+
 # Imported from cosa
 # Merge two lists, avoiding duplicates. Exact duplicate kargs could be valid
 # but we have no use case for them right now in our official images.