---
title: Reproducing the reproducible images
date: 2026-03-02
---

The Dangerzone project has to be a bit distrustful: It distrusts the documents it processes in its container, and it also distrusts the registries that serve its container image. That’s why we sign the image and ensure that it’s bit-for-bit reproducible.

The reproducibility of our container image is one of our core defenses against supply chain attacks. In this post, we talk more broadly about this often-overlooked subject. We will also introduce [a collection of tools and CI helpers for reproducible images](https://github.com/freedomofpress/repro-build), which should be generic enough to apply to your project as well.

For those new to what “reproducible” means, here’s a helpful definition from the [Reproducible Builds](https://reproducible-builds.org/) project:

> “A build is **reproducible** if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.”

When talking about reproducible **container** images, the two main container managers, Docker (BuildKit) and Podman (Buildah), suggest specifying both `SOURCE_DATE_EPOCH` and an option that rewrites the file timestamps in the image layers:

* For BuildKit, specify `SOURCE_DATE_EPOCH` in your Dockerfile (or environment) and pass the `rewrite-timestamp=true` exporter option ([source](https://github.com/moby/buildkit/blob/master/docs/build-repro.md)).
* For Buildah, specify `SOURCE_DATE_EPOCH` in your Dockerfile (or environment, or use the CLI option `--source-date-epoch`) and pass the `--rewrite-timestamp` option ([source](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html/building_running_and_managing_containers/introduction-to-reproducible-container-builds)).
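
`SOURCE_DATE_EPOCH` itself is just a UNIX timestamp: the number of seconds since 1970-01-01 00:00:00 UTC. As a minimal sketch, assuming GNU coreutils `date` (BSD and macOS `date` take different flags), converting between it and a human-readable UTC date looks like this. `to_epoch` and `from_epoch` are hypothetical helper names, not part of any tool mentioned in this post.

```shell
#!/usr/bin/env bash
# Sketch: convert between UTC dates and SOURCE_DATE_EPOCH values.
# Assumes GNU date (coreutils); BSD/macOS date uses different flags.
set -euo pipefail

# Hypothetical helper: "YYYY-MM-DD HH:MM:SS" (interpreted as UTC) -> epoch seconds.
to_epoch() { date -u -d "$1" +%s; }

# Hypothetical helper: epoch seconds -> "YYYY-MM-DD HH:MM:SS" in UTC.
from_epoch() { date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'; }

to_epoch '2023-02-28 21:21:00'   # prints 1677619260
from_epoch 1677619260            # prints 2023-02-28 21:21:00
```

For example, `export SOURCE_DATE_EPOCH="$(to_epoch '2023-02-28 21:21:00')"` sets the same value that the builds below use.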
| 18 | + |
| 19 | +## Let’s create a reproducible image |
| 20 | + |
| 21 | +Reproducibility is actually much more than setting a timestamp. The image itself must be free of any sources of nondeterminism. Let’s consider a Dockerfile, where we install `gcc` in a Debian image that was created on 2023-09-04 and has remained the same ever since: |
| 22 | + |
| 23 | +```Dockerfile |
| 24 | +FROM debian:bookworm-20230904-slim |
| 25 | +RUN apt-get update && apt-get install -y gcc |
| 26 | +``` |
| 27 | + |
| 28 | +Is this container image reproducible? Let’s find out: |
| 29 | + |
| 30 | +```shell |
| 31 | +$ export SOURCE_DATE_EPOCH=1677619260 # Just a random UNIX epoch |
| 32 | +$ docker buildx --no-cache --output type=docker,dest=image.tar,rewrite-timestamp=true . |
| 33 | +[...] |
| 34 | + => => rewriting layers with source-date-epoch 1677619260 (2023-02-28 21:21:00 +0000 UTC) 9.7s |
| 35 | + => => exporting manifest sha256:81ebf01e608e298f2827d3da4e546e531e6b8b19f2f793df10b058f16b85545a |
| 36 | +``` |
| 37 | + |
| 38 | +Once more, with feeling: |
| 39 | + |
| 40 | +```shell |
| 41 | +$ docker buildx --no-cache --output type=docker,dest=image.tar,rewrite-timestamp=true . |
| 42 | + [...] |
| 43 | + => => rewriting layers with source-date-epoch 1677619260 (2023-02-28 21:21:00 +0000 UTC) 9.7s |
| 44 | + => => exporting manifest sha256:231a1e36dae728a3f8bc3bdfa20a856c3bb78f0a3759ad98a1ea13d2d4920614 0.0s |
| 45 | +``` |
| 46 | + |
| 47 | +That’s interesting, the digests differ! |
| 48 | + |
| 49 | +So, the Dockerfile is not reproducible, but why? The [Reproducible Containers](https://github.com/reproducible-containers) project by [Akihiro Suda](https://github.com/AkihiroSuda) fills an important gap in the understanding and tooling around reproducible container images. Among other utilities, it offers a tool to diff OCI images: `diffoci`. |
| 50 | + |
| 51 | +So, let’s check why those images differ with [`diffoci`](https://github.com/reproducible-containers/diffoci). We’ll use the `--semantic` flag here, to check only file differences, and the `--report-dir diffs/` option to write the differing files into a `diffs/` directory: |
| 52 | + |
| 53 | +```shell |
| 54 | +$ diffoci diff --semantic --report-dir diffs/ <image1> <image2> |
| 55 | +TYPE NAME INPUT-0 INPUT-1 |
| 56 | +File var/log/apt/term.log f78e67afe7aca12045cd74a73d6bdb38fd2ef3621f64b178feb0b82fad9d26a2 e3b6ff496d4253810b97f432eec285bc0a4f85ae87d667cc7ef6f4c7c2eaef1f |
| 57 | +File var/cache/ldconfig/aux-cache c0bd5b8e12012d153bf527509e6ef0d125bafbd998e33443d8bca12c9352abb4 37f21a5282b0bc13af266cbab5c6819d7045c346f28c782a5920d52044f2720f |
| 58 | +File var/log/dpkg.log fe4e8edc30b2b0a8d936b9930ece042c9affa8b2b567f74d160a680d57874a73 530173cae986e3607bac1eef1b1739d1a3cfbfa539f9aebfaa7397b397f31712 |
| 59 | +File var/log/alternatives.log abb1f05b7ff0598fcbe7374f513898a0acf6f23260661420ebb1e732bd09b192 da346849248fbcd01371339956f2a90a9e28ace7c54e596ed1dc95df06919111 |
| 60 | +File var/log/apt/history.log 99e8510b67d705a84899115a25c15206fa9affd52f75af1e6ef6ecd06e679552 766b51931e938fee62973d04c495e38d9a36be7ca4b2efafb76883ac502672f2 |
| 61 | +``` |
| 62 | + |
| 63 | +It seems that some APT-related files are different. Let’s check them out: |
| 64 | + |
| 65 | +```shell |
| 66 | +$ diff diffs/input-{0,1}/layers-1/var/log/apt/term.log |
| 67 | +2c2 |
| 68 | +< Log started: 2026-01-22 09:58:37 |
| 69 | +--- |
| 70 | +> Log started: 2026-01-22 09:59:38 |
| 71 | +355c355 |
| 72 | +< Log ended: 2026-01-22 09:58:47 |
| 73 | +--- |
| 74 | +> Log ended: 2026-01-22 09:59:48 |
| 75 | +``` |
| 76 | + |
| 77 | +Makes sense, the fact that we define `SOURCE_DATE_EPOCH` does not alter the time within the container image and what will be written in the logs. A simple way to solve this problem is to remove the logs after `apt-get` is done. |
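
To see why rewriting timestamps is not enough on its own, here is a minimal sketch that mimics the situation with GNU tar instead of BuildKit (assuming GNU tar 1.29 or later and GNU coreutils). Two fake “layers” contain the same binary but differently timestamped logs. Clamping every file timestamp to `SOURCE_DATE_EPOCH` (roughly what `rewrite-timestamp` does to the layer tarballs) and sorting entries still yields different tarballs, because the time lives inside the log’s contents; deleting the logs makes them identical. The `make_tree` and `layer_digest` helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: clamped mtimes do not help when a file's *contents* embed the time.
# Assumes GNU tar >= 1.29 (--sort, --clamp-mtime) and GNU coreutils.
set -euo pipefail

SOURCE_DATE_EPOCH=1677619260
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Hypothetical helper: a fake layer with a "binary" and an APT-style log
# whose contents record when the build ran.
make_tree() {
  local dir="$1" build_time="$2"
  mkdir -p "$dir/usr/bin" "$dir/var/log/apt"
  printf 'pretend this is gcc\n' > "$dir/usr/bin/gcc"
  printf 'Log started: %s\n' "$build_time" > "$dir/var/log/apt/term.log"
}

# Hypothetical helper: tar a tree with sorted entries, clamped mtimes and
# fixed ownership, then print the SHA-256 of the resulting tarball.
layer_digest() {
  tar -C "$1" --format=ustar --sort=name \
    --mtime="@$SOURCE_DATE_EPOCH" --clamp-mtime \
    --owner=0 --group=0 --numeric-owner -cf - . | sha256sum | cut -d' ' -f1
}

make_tree "$work/build-1" "2026-01-22 09:58:37"
make_tree "$work/build-2" "2026-01-22 09:59:38"
with_logs_1="$(layer_digest "$work/build-1")"
with_logs_2="$(layer_digest "$work/build-2")"

# The fix from the text: drop the logs before the layer is sealed.
rm -rf "$work/build-1/var/log/"* "$work/build-2/var/log/"*
without_logs_1="$(layer_digest "$work/build-1")"
without_logs_2="$(layer_digest "$work/build-2")"

echo "with logs:    $with_logs_1 vs $with_logs_2"
echo "without logs: $without_logs_1 vs $without_logs_2"
```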
| 78 | + |
| 79 | +But there’s more to that: |
| 80 | + |
| 81 | +```shell |
| 82 | +$ docker run --rm <image> apt-cache policy gcc |
| 83 | +gcc: |
| 84 | + Installed: 4:12.2.0-3 |
| 85 | + Candidate: 4:12.2.0-3 |
| 86 | + Version table: |
| 87 | + *** 4:12.2.0-3 500 |
| 88 | + 500 http://deb.debian.org/debian bookworm/main amd64 Packages |
| 89 | + 100 /var/lib/dpkg/status |
| 90 | +``` |
| 91 | + |
| 92 | +Turns out that the `gcc` package does not come from Debian’s archives (https://snapshot.debian.org/), but from the main bookworm one (see `http://deb.debian.org/debian bookworm/main)`. Granted, Bookworm is `oldstable` by now, and `gcc` does not change often, but that doesn’t mean that one of its dependencies can’t. |
| 93 | + |
| 94 | +The proper solution here is another project by Reproducible Containers, [`repro-sources-list.sh`](https://github.com/reproducible-containers/repro-sources-list.sh/blob/master/repro-sources-list.sh). This script configures /etc/apt/sources.list and similar files for installing packages from a snapshot, using [https://snapshot.debian.org](https://snapshot.debian.org) in place of the default APT source. |
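
To make “installing from a snapshot” concrete, here is a minimal sketch that turns a `SOURCE_DATE_EPOCH` value into a snapshot.debian.org URL and a matching APT source line. It is not the actual `repro-sources-list.sh` code, and `snapshot_source_line` is a hypothetical name; it assumes GNU `date`. The `check-valid-until=no` option disables APT’s `Valid-Until` check, which can otherwise reject the old `Release` files served by a snapshot.

```shell
#!/usr/bin/env bash
# Sketch: map SOURCE_DATE_EPOCH to a Debian snapshot source line.
# Assumes GNU date. This is NOT the actual repro-sources-list.sh code.
set -euo pipefail

# Hypothetical helper: print an APT "deb" line for the snapshot at a given time.
snapshot_source_line() {
  local epoch="$1" suite="$2" stamp
  # snapshot.debian.org timestamps look like 20230228T212100Z (always UTC).
  stamp="$(date -u -d "@$epoch" +%Y%m%dT%H%M%SZ)"
  echo "deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/$stamp $suite main"
}

line="$(snapshot_source_line 1677619260 bookworm)"
echo "$line"
# prints: deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20230228T212100Z bookworm main
```

Because the URL is derived only from the timestamp, every rebuild with the same `SOURCE_DATE_EPOCH` sees the same package versions, for as long as the snapshot service keeps serving them.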
| 95 | + |
| 96 | +Here’s a better Dockerfile using this script: |
| 97 | + |
| 98 | +```Dockerfile |
| 99 | +FROM debian:bookworm-20230904-slim |
| 100 | +ENV DEBIAN_FRONTEND=noninteractive |
| 101 | +RUN \ |
| 102 | + --mount=type=cache,target=/var/cache/apt,sharing=locked \ |
| 103 | + --mount=type=cache,target=/var/lib/apt,sharing=locked \ |
| 104 | + --mount=type=bind,source=./repro-sources-list.sh,target=/usr/local/bin/repro-sources-list.sh \ |
| 105 | + repro-sources-list.sh && \ |
| 106 | + apt-get update && \ |
| 107 | + apt-get install -y gcc && \ |
| 108 | + : "Clean up for improving reproducibility (optional)" && \ |
| 109 | + rm -rf /var/log/* /var/cache/ldconfig/aux-cache |
| 110 | +``` |
| 111 | + |
| 112 | +This is now a bit-for-bit reproducible container image. As long as the Debian snapshot servers work, its digest is guaranteed to be `sha256:b0088ba0110c2acfe757eaf41967ac09fe16e96a8775b998577f86d90b3dbe53`. |
| 113 | + |
| 114 | +Because we want to make sure that what we’re building now can be reproduced in a month or a year from now, we actually went ahead and created a nightly CI job that constantly builds this image and verifies its digest is the expected one, and it worked. |
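
The core check of such a job fits in a few lines. An image digest is the SHA-256 of the image’s manifest (or index) bytes, so, assuming the raw manifest has already been saved to a file (for a pushed image, for example, via `docker buildx imagetools inspect --raw`), the comparison can be sketched as below. `verify_digest` is a hypothetical helper, not part of `repro-build` or any CI tool mentioned here.

```shell
#!/usr/bin/env bash
# Sketch: fail if a manifest file does not hash to the expected image digest.
# Assumes GNU coreutils (sha256sum). Not part of repro-build or any CI tool.
set -euo pipefail

# Hypothetical helper: compare "sha256:<hex>" against the SHA-256 of a file
# holding the raw manifest (or index) bytes. Returns non-zero on mismatch.
verify_digest() {
  local manifest="$1" expected="$2" actual
  actual="sha256:$(sha256sum "$manifest" | cut -d' ' -f1)"
  if [ "$actual" != "$expected" ]; then
    echo "MISMATCH: expected $expected, got $actual" >&2
    return 1
  fi
  echo "OK: $actual"
}

# Example usage for a pushed image ($IMAGE is a hypothetical reference):
#   docker buildx imagetools inspect --raw "$IMAGE" > manifest.json
#   verify_digest manifest.json "sha256:b0088ba0110c2acfe757eaf41967ac09fe16e96a8775b998577f86d90b3dbe53"
```

Because the script runs with `set -e`, a mismatch makes it exit with a non-zero status, which is all a CI runner needs to mark the nightly job as failed.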
| 115 | + |
| 116 | +Well, until Feb. 20, 2025. But more on that below — we’re getting ahead of ourselves. |
| 117 | + |
| 118 | +## On build environments |
| 119 | + |
| 120 | +Reminder: A requirement for reproducible builds is to have the same **build environment** and **build instructions**. For container images, the build environment is the base container image, and the build instructions are the Dockerfile. But that’s not the whole story. |
| 121 | + |
| 122 | +What if we build the above image with Podman? |
| 123 | + |
| 124 | +```shell |
| 125 | +$ podman build --no-cache --source-date-epoch 1677619260 --rewrite-timestamp |
| 126 | +[...] |
| 127 | +4eb5ec336a90d4fb2ab7449782c3efdbfac8dcd11037b89213bf90ef2faec977 |
| 128 | +``` |
| 129 | + |
| 130 | +**The digest is different**. Let’s check how many differences `diffoci` reports: |
| 131 | + |
| 132 | +```shell |
| 133 | +$ diffoci diff <image1> <image2> | wc -l |
| 134 | +847 |
| 135 | +``` |
| 136 | + |
| 137 | +Welp, that’s a lot of stuff there. The majority of those are filename reorders within the layer tarballs, some others are about `.wh.*` files, and some are about the image format itself (OCI vs. Docker). And yet, that does not mean that the container image is not reproducible. Quoting again from Reproducible Builds: |

> Reproducible builds does not mandate that a given piece of source code is turned into the same bytes in all situations. This would be unfeasible. The output of a compiler is likely to be different from one version to another as better optimizations are integrated all the time.
>
> Instead, reproducible builds happen in the context of a build environment. It usually comprises the set of tools, required versions, and other assumptions about the operating system and its configuration. A description of this environment should typically be recorded and provided alongside any distributed binary package.

The point here is that the image builder, its version, and its arguments are part of the build environment and the build instructions. That’s why projects like Tor and Bitcoin build with their own pinned [static toolchain](https://reproducible-builds.org/docs/virtual-machine-drivers/), possibly within a virtual machine or container. Beyond that, operating systems like Debian run [CI tests](https://tests.reproducible-builds.org/debian/reproducible.html) that continuously verify that packages remain reproducible and catch regressions.
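
Many of the 847 differences reported above, such as reordered entries, break a byte-for-byte comparison without changing what ends up on disk. As a rough illustration, far cruder than `diffoci --semantic`, the sketch below archives the same two files in two different orders, shows that the tarballs’ SHA-256 digests differ, and then compares a content fingerprint (sorted paths plus per-file hashes) that matches. It assumes GNU tar, findutils, and coreutils; `tar_fingerprint` is a hypothetical helper, and it deliberately ignores permissions, ownership, timestamps, and whiteouts.

```shell
#!/usr/bin/env bash
# Sketch: same files in a different tar order -> different bytes, same content.
# Assumes GNU tar, findutils and coreutils. Far cruder than diffoci --semantic:
# it ignores permissions, ownership, timestamps and whiteouts entirely.
set -euo pipefail
export LC_ALL=C   # make `sort` order independent of the locale

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

mkdir "$work/src"
printf 'hello\n' > "$work/src/one.txt"
printf 'world\n' > "$work/src/two.txt"

# The same two files, archived in two different orders.
tar -C "$work/src" -cf "$work/a.tar" one.txt two.txt
tar -C "$work/src" -cf "$work/b.tar" two.txt one.txt

# Hypothetical helper: print "<sha256>  <path>" for every regular file in a
# tarball, sorted by path, so that entry order no longer matters.
tar_fingerprint() {
  local out
  out="$(mktemp -d)"
  tar -C "$out" -xf "$1"
  (cd "$out" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
  rm -rf "$out"
}

bytes_a="$(sha256sum "$work/a.tar" | cut -d' ' -f1)"
bytes_b="$(sha256sum "$work/b.tar" | cut -d' ' -f1)"
fp_a="$(tar_fingerprint "$work/a.tar")"
fp_b="$(tar_fingerprint "$work/b.tar")"

echo "tarball digests equal?      $([ "$bytes_a" = "$bytes_b" ] && echo yes || echo no)"
echo "content fingerprints equal? $([ "$fp_a" = "$fp_b" ] && echo yes || echo no)"
```

A digest, however, is computed over bytes, which is why a different builder, or a different version of the same builder, counts as a different build environment.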

So, what happened on Feb. 20, 2025? BuildKit `v0.20.0` was released, and our CI tests picked it up. This release had a small regression that added an extra field to the image config:

```diff
   "rootfs": {
     "type": "layers",
     "diff_ids": ["sha256:341de903..."]
-  }
+  },
+  "variant": "v8"
```

This was enough to change the image digest. But because we had these CI tests in place, we detected it immediately and opened [moby/buildkit#5774](https://github.com/moby/buildkit/issues/5774). The regression has been fixed, and the digest has remained unchanged ever since.

## Introducing repro-build, a collection of helpers for reproducible images

We strongly believe that reproducible container images that are built and verified only once are prone to rot. If we want more people to adopt them, people need a toolchain to work with and an easy way to continuously reproduce images as part of their CI tests.

With this in mind, we created [https://github.com/freedomofpress/repro-build](https://github.com/freedomofpress/repro-build), which holds:

* A script that reproducibly builds container images using a static build environment.
* Two GitHub Actions:
  * One that reproducibly builds and pushes container images (and can be used in place of `docker/build-push-action`).
  * One that rebuilds a container image and compares the digests.

Let’s look at each of these in more detail.

### The Python script

[`repro-build`](https://github.com/freedomofpress/repro-build?tab=readme-ov-file#build-a-container-image-locally):

```shell
$ ./repro-build build --source-date-epoch 0 .
2025-02-24 09:17:48 - INFO - Build environment:
- Container runtime: docker
- BuildKit image: moby/buildkit:v0.19.0@sha256:14aa1b4dd92ea0a4cd03a54d0c6079046ea98cd0c0ae6176bdd7036ba370cbbe
- Rootless support: False
- Caching enabled: True
- Build context: ./repro-build
- Dockerfile: (not provided)
- Output: ./repro-build/image.tar

Build parameters:
- SOURCE_DATE_EPOCH: 0
- Build args: (not provided)
- Tag: (not provided)
- Platform: (default)

Podman-only arguments:
- BuildKit arguments: (not provided)

Docker-only arguments:
- Docker Buildx arguments: (not provided)

[...]
```

This is a simple script that:

* Works with Docker and Podman, and ensures that BuildKit is used under the hood.
* Pins BuildKit to a specific version.
* Requires a source date epoch or a human-friendly timestamp.
* Removes some common sources of nondeterminism by rewriting timestamps and removing build provenance.

It’s our tested version of a static toolchain.
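
Note that the BuildKit image in the output above is referenced by both a tag and a `@sha256:` digest. A tag can be moved to point at a different image, while a digest cannot. As a minimal sketch of a check a CI job could run on such references, the hypothetical `is_digest_pinned` helper below accepts only references ending in `@sha256:` followed by 64 lowercase hex characters. It checks syntax only, not whether the digest exists in a registry.

```shell
#!/usr/bin/env bash
# Sketch: check that an image reference is pinned by digest, not just by tag.
# Syntax check only; it does not contact any registry.
set -euo pipefail

# Hypothetical helper: succeed only for references ending in @sha256:<64 hex chars>.
is_digest_pinned() {
  [[ "$1" =~ @sha256:[0-9a-f]{64}$ ]]
}

for ref in \
  "moby/buildkit:v0.19.0" \
  "moby/buildkit:v0.19.0@sha256:14aa1b4dd92ea0a4cd03a54d0c6079046ea98cd0c0ae6176bdd7036ba370cbbe"; do
  if is_digest_pinned "$ref"; then echo "pinned:   $ref"; else echo "unpinned: $ref"; fi
done
```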

### A replacement for the [`docker/build-push-action`](https://github.com/docker/build-push-action) GitHub action that can reproducibly build container images

[`freedomofpress/repro-build@v1`](https://github.com/freedomofpress/repro-build?tab=readme-ov-file#reproducible-build-action-freedomofpressrepro-buildv1):

```yaml
- name: Reproducibly build and push image
  uses: freedomofpress/repro-build@v1
  with:
    tags: ghcr.io/my-org/my-image:latest
    file: Dockerfile
    platforms: linux/amd64,linux/arm64
    source_date_epoch: 1677619260
    push: true
```

For simple image builds, you can consider this a drop-in replacement for `docker/build-push-action` that does the same job as the script above. We know this because we check that the action and the script create bit-for-bit identical images.

### A GitHub action that rebuilds a container image and compares the digests

[`freedomofpress/repro-build/verify@v1`](https://github.com/freedomofpress/repro-build?tab=readme-ov-file#reproduce-and-verify-action-freedomofpressrepro-buildverifyv1):

```yaml
- name: Verify image reproducibility
  uses: freedomofpress/repro-build/verify@v1
  with:
    target_image: ghcr.io/my-org/my-image:latest
    file: Dockerfile
    platforms: linux/amd64
    source_date_epoch: 1677619260
    runtime: podman
```

### Reproducible container images

Our [repro-build](https://github.com/freedomofpress/repro-build) repo has a CI job that builds [Debian images](https://github.com/freedomofpress/repro-build/pkgs/container/repro-build%2Fdebian) from snapshot repositories every night and immediately rebuilds them to make sure that they are reproducible. This CI job reuses the helpers mentioned above and produces container images that you can independently reproduce and verify.

It also has a CI job that still reproduces `sha256:b0088ba0110c2acfe757eaf41967ac09fe16e96a8775b998577f86d90b3dbe53` every night, across container runtimes, BuildKit versions, and host images. You are more than welcome to copy our workflow and do the same for your images.

## Future work

All this is exciting, but there is still room for improvement:

1. Include the build environment and instructions within the container image.
2. Implement a CI system where we can enroll reproducible images and continuously build and verify them, as Debian does for its packages.
3. Improve ergonomics for multi-arch images.

We want to make it easy for folks to create their first reproducible container image, which is why we are offering the tools we use ourselves. We believe that with increased adoption of these practices and a systematic way to verify that they work, the container ecosystem will become more robust against supply chain attacks, which is something we deeply care about. So try them out and give us your feedback!

For more, [view a 20-minute talk](https://fosdem.org/2026/schedule/event/RYM8SF-repro-build/) we gave on this subject at FOSDEM 2026.