Commit 1ca260b

Merge pull request #107 from freedomofpress/2026-01-repro-build
Add article for reproducible builds
2 parents 74c4389 + ef14728 commit 1ca260b

1 file changed

+258
-0
lines changed

src/news/2026-03-02-repro-build.md

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
1+
---
2+
title: Reproducing the reproducible images
3+
date: 2026-03-02
4+
---
5+
6+
The Dangerzone project has to be a bit distrustful: It distrusts the document that is processed in its container, and it also distrusts the registries that serve its image, which is why we sign it and ensure it’s bit-for-bit reproducible.
7+
8+
The reproducibility of our container image is one of our core defenses against supply chain attacks, and in this post, we talk a bit more broadly about this often-overlooked subject. We will also introduce [a collection of tools and CI helpers for reproducible images](https://github.com/freedomofpress/repro-build), which should be generic enough to apply to your project as well.
9+
10+
For those new to what “reproducible” means, here’s a helpful definition by the [Reproducible Builds](https://reproducible-builds.org/) project:
11+
12+
> “A build is **reproducible** if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.”
13+
14+
When talking about reproducible **container** images, the two main container managers, Docker (BuildKit) and Podman (Buildah), suggest specifying both `SOURCE_DATE_EPOCH` and an argument to rewrite the image layers:
15+
16+
* For BuildKit, specify `SOURCE_DATE_EPOCH` in your Dockerfile (or environment) and pass the `rewrite-timestamp=true` option ([source](https://github.com/moby/buildkit/blob/master/docs/build-repro.md)).
17+
* For Buildah, specify `SOURCE_DATE_EPOCH` in your Dockerfile (or environment or use the CLI option `--source-date-epoch`) and pass the `--rewrite-timestamp` option ([source](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html/building_running_and_managing_containers/introduction-to-reproducible-container-builds)).
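
As a minimal sketch of how one might pick and sanity-check a `SOURCE_DATE_EPOCH` value before handing it to either builder: the helper name `epoch_to_utc` is hypothetical, and the snippet assumes GNU `date`.

```shell
#!/bin/sh
# Hypothetical helper: render a UNIX epoch as a human-readable UTC timestamp.
# Assumes GNU date (the `-d @<epoch>` syntax is not portable to BSD date).
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Pin the epoch explicitly; a fixed value is what makes rebuilds comparable.
# In a Git checkout, one common choice is the time of the last commit:
#   SOURCE_DATE_EPOCH=$(git log -1 --format=%ct)
SOURCE_DATE_EPOCH=1677619260
export SOURCE_DATE_EPOCH

# Print the timestamp that the builder will stamp onto the image layers.
epoch_to_utc "$SOURCE_DATE_EPOCH"
```

Exporting the variable once keeps every later build command in the same shell on the same timestamp.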
18+
19+
## Let’s create a reproducible image
20+
21+
Reproducibility is actually much more than setting a timestamp. The image itself must be free of any sources of nondeterminism. Let’s consider a Dockerfile, where we install `gcc` in a Debian image that was created on 2023-09-04 and has remained the same ever since:
22+
23+
```Dockerfile
24+
FROM debian:bookworm-20230904-slim
25+
RUN apt-get update && apt-get install -y gcc
26+
```
27+
28+
Is this container image reproducible? Let’s find out:
29+
30+
```shell
31+
$ export SOURCE_DATE_EPOCH=1677619260 # Just a random UNIX epoch
32+
$ docker buildx build --no-cache --output type=docker,dest=image.tar,rewrite-timestamp=true .
33+
[...]
34+
=> => rewriting layers with source-date-epoch 1677619260 (2023-02-28 21:21:00 +0000 UTC) 9.7s
35+
=> => exporting manifest sha256:81ebf01e608e298f2827d3da4e546e531e6b8b19f2f793df10b058f16b85545a
36+
```
37+
38+
Once more, with feeling:
39+
40+
```shell
41+
$ docker buildx build --no-cache --output type=docker,dest=image.tar,rewrite-timestamp=true .
42+
[...]
43+
=> => rewriting layers with source-date-epoch 1677619260 (2023-02-28 21:21:00 +0000 UTC) 9.7s
44+
=> => exporting manifest sha256:231a1e36dae728a3f8bc3bdfa20a856c3bb78f0a3759ad98a1ea13d2d4920614 0.0s
45+
```
46+
47+
That’s interesting, the digests differ!
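
Comparing digests by eye gets old quickly. Here is a small sketch that extracts the manifest digest from a build log and compares two runs. The helper name `manifest_digest` is hypothetical, and it assumes the log contains an `exporting manifest sha256:<hex>` line like the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: pull the manifest digest out of a BuildKit progress log
# read from stdin. It relies on the "exporting manifest sha256:<hex>" line.
manifest_digest() {
    grep -o 'exporting manifest sha256:[0-9a-f]\{64\}' | head -n 1 | cut -d' ' -f3
}

# Sample log lines copied from the two builds above. In a real run, you would
# pipe the build output in instead (BuildKit typically prints progress on
# stderr, so something like `... 2>&1 | manifest_digest` may be needed).
first='=> => exporting manifest sha256:81ebf01e608e298f2827d3da4e546e531e6b8b19f2f793df10b058f16b85545a'
second='=> => exporting manifest sha256:231a1e36dae728a3f8bc3bdfa20a856c3bb78f0a3759ad98a1ea13d2d4920614 0.0s'

d1=$(printf '%s\n' "$first" | manifest_digest)
d2=$(printf '%s\n' "$second" | manifest_digest)

if [ "$d1" = "$d2" ]; then
    echo "reproducible: $d1"
else
    echo "digests differ: $d1 vs $d2"
fi
```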
48+
49+
So, the Dockerfile is not reproducible, but why? The [Reproducible Containers](https://github.com/reproducible-containers) project by [Akihiro Suda](https://github.com/AkihiroSuda) fills an important gap in the understanding and tooling around reproducible container images. Among other utilities, it offers a tool to diff OCI images: `diffoci`.
50+
51+
So, let’s check why those images differ with [`diffoci`](https://github.com/reproducible-containers/diffoci). We’ll use the `--semantic` flag here, to check only file differences, and the `--report-dir diffs/` option to write the differing files into a `diffs/` directory:
52+
53+
```shell
54+
$ diffoci diff --semantic --report-dir diffs/ <image1> <image2>
55+
TYPE NAME INPUT-0 INPUT-1
56+
File var/log/apt/term.log f78e67afe7aca12045cd74a73d6bdb38fd2ef3621f64b178feb0b82fad9d26a2 e3b6ff496d4253810b97f432eec285bc0a4f85ae87d667cc7ef6f4c7c2eaef1f
57+
File var/cache/ldconfig/aux-cache c0bd5b8e12012d153bf527509e6ef0d125bafbd998e33443d8bca12c9352abb4 37f21a5282b0bc13af266cbab5c6819d7045c346f28c782a5920d52044f2720f
58+
File var/log/dpkg.log fe4e8edc30b2b0a8d936b9930ece042c9affa8b2b567f74d160a680d57874a73 530173cae986e3607bac1eef1b1739d1a3cfbfa539f9aebfaa7397b397f31712
59+
File var/log/alternatives.log abb1f05b7ff0598fcbe7374f513898a0acf6f23260661420ebb1e732bd09b192 da346849248fbcd01371339956f2a90a9e28ace7c54e596ed1dc95df06919111
60+
File var/log/apt/history.log 99e8510b67d705a84899115a25c15206fa9affd52f75af1e6ef6ecd06e679552 766b51931e938fee62973d04c495e38d9a36be7ca4b2efafb76883ac502672f2
61+
```
62+
63+
It seems that some APT-related files are different. Let’s check them out:
64+
65+
```shell
66+
$ diff diffs/input-{0,1}/layers-1/var/log/apt/term.log
67+
2c2
68+
< Log started: 2026-01-22 09:58:37
69+
---
70+
> Log started: 2026-01-22 09:59:38
71+
355c355
72+
< Log ended: 2026-01-22 09:58:47
73+
---
74+
> Log ended: 2026-01-22 09:59:48
75+
```
76+
77+
That makes sense: defining `SOURCE_DATE_EPOCH` does not change the clock inside the build container, so the actual build time still ends up in the logs. A simple way to solve this problem is to remove the logs after `apt-get` is done.
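
One rough heuristic for spotting such files is to list everything modified after a reference timestamp, since files written during the build carry the current time. The sketch below uses a scratch directory as a stand-in for the image's root filesystem; the helper name `files_newer_than` is hypothetical, and it assumes GNU `touch`. Modification time is only a hint here: rewriting timestamps later fixes the metadata, not the file contents.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory whose mtime is
# later than a given epoch. Assumes GNU touch (`-d @<epoch>`).
files_newer_than() {
    dir=$1
    epoch=$2
    ref=$(mktemp)
    touch -d "@$epoch" "$ref"
    find "$dir" -type f -newer "$ref" | sort
    rm -f "$ref"
}

# Demo on a scratch directory standing in for the image's root filesystem.
root=$(mktemp -d)
mkdir -p "$root/usr/bin" "$root/var/log"

# Pretend this file came from the base image: give it an old mtime.
: > "$root/usr/bin/tool"
touch -d '@1000000000' "$root/usr/bin/tool"

# Pretend this file was written during the build: it gets the current time.
echo "Log started: $(date)" > "$root/var/log/dpkg.log"

# Only the log file is reported as newer than the reference epoch.
files_newer_than "$root" 1677619260
```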
78+
79+
But there’s more to it:
80+
81+
```shell
82+
$ docker run --rm <image> apt-cache policy gcc
83+
gcc:
84+
Installed: 4:12.2.0-3
85+
Candidate: 4:12.2.0-3
86+
Version table:
87+
*** 4:12.2.0-3 500
88+
500 http://deb.debian.org/debian bookworm/main amd64 Packages
89+
100 /var/lib/dpkg/status
90+
```
91+
92+
Turns out that the `gcc` package does not come from Debian’s snapshot archive (https://snapshot.debian.org/), but from the main bookworm repository (see `http://deb.debian.org/debian bookworm/main`). Granted, Bookworm is `oldstable` by now, and `gcc` does not change often, but that doesn’t mean that one of its dependencies can’t.
93+
94+
The proper solution here is another project by Reproducible Containers, [`repro-sources-list.sh`](https://github.com/reproducible-containers/repro-sources-list.sh/blob/master/repro-sources-list.sh). This script configures `/etc/apt/sources.list` and similar files for installing packages from a snapshot, using [https://snapshot.debian.org](https://snapshot.debian.org) in place of the default APT source.
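
To make the idea concrete, here is a simplified illustration of how a snapshot source can be derived from `SOURCE_DATE_EPOCH`. This is not the actual script, which handles more distributions, suites, and file layouts. The function name `snapshot_sources_line` is hypothetical, and the snippet assumes GNU `date` and the timestamped archive URL layout of snapshot.debian.org.

```shell
#!/bin/sh
# Simplified illustration only; not the real repro-sources-list.sh.
# Builds an APT source line that points at a timestamped Debian snapshot.
snapshot_sources_line() {
    epoch=$1
    suite=$2
    # snapshot.debian.org addresses archives by a UTC timestamp in the URL.
    stamp=$(date -u -d "@$epoch" '+%Y%m%dT%H%M%SZ')
    # check-valid-until=no: old snapshots have expired Release files.
    echo "deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/$stamp $suite main"
}

snapshot_sources_line 1677619260 bookworm
```

Because the URL is a pure function of the epoch, every rebuild with the same `SOURCE_DATE_EPOCH` asks for the same package set.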
95+
96+
Here’s a better Dockerfile using this script:
97+
98+
```Dockerfile
99+
FROM debian:bookworm-20230904-slim
100+
ENV DEBIAN_FRONTEND=noninteractive
101+
RUN \
102+
--mount=type=cache,target=/var/cache/apt,sharing=locked \
103+
--mount=type=cache,target=/var/lib/apt,sharing=locked \
104+
--mount=type=bind,source=./repro-sources-list.sh,target=/usr/local/bin/repro-sources-list.sh \
105+
repro-sources-list.sh && \
106+
apt-get update && \
107+
apt-get install -y gcc && \
108+
: "Clean up for improving reproducibility (optional)" && \
109+
rm -rf /var/log/* /var/cache/ldconfig/aux-cache
110+
```
111+
112+
This is now a bit-for-bit reproducible container image. As long as the Debian snapshot servers work, its digest is guaranteed to be `sha256:b0088ba0110c2acfe757eaf41967ac09fe16e96a8775b998577f86d90b3dbe53`.
113+
114+
Because we want what we build now to be reproducible a month or a year from now, we created a nightly CI job that rebuilds this image and verifies that its digest is the expected one. And it worked.
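
The core of such a check can be sketched in a few lines of shell. The function name `verify_digest` is hypothetical and this is not the job's actual code; in a real job, the second argument would come from the fresh rebuild.

```shell
#!/bin/sh
# Hypothetical CI check: fail the job when a rebuilt image's digest drifts.
EXPECTED_DIGEST='sha256:b0088ba0110c2acfe757eaf41967ac09fe16e96a8775b998577f86d90b3dbe53'

verify_digest() {
    expected=$1
    actual=$2
    if [ "$actual" = "$expected" ]; then
        echo "OK: digest matches $expected"
        return 0
    fi
    # Report on stderr and return non-zero so that the CI step fails.
    echo "MISMATCH: expected $expected, got $actual" >&2
    return 1
}

# Placeholder call: compares the expected digest with itself.
verify_digest "$EXPECTED_DIGEST" "$EXPECTED_DIGEST"
```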
115+
116+
Well, until Feb. 20, 2025. But more on that below — we’re getting ahead of ourselves.
117+
118+
## On build environments
119+
120+
Reminder: A requirement for reproducible builds is to have the same **build environment** and **build instructions**. For container images, the build environment is the base container image, and the build instructions are the Dockerfile. But that’s not the whole story.
121+
122+
What if we build the above image with Podman?
123+
124+
```shell
125+
$ podman build --no-cache --source-date-epoch 1677619260 --rewrite-timestamp
126+
[...]
127+
4eb5ec336a90d4fb2ab7449782c3efdbfac8dcd11037b89213bf90ef2faec977
128+
```
129+
130+
**The digest is different**. Let’s check how many differences `diffoci` reports:
131+
132+
```shell
133+
$ diffoci diff <image1> <image2> | wc -l
134+
847
135+
```
136+
137+
Welp, that’s a lot of stuff there. The majority of those are filename reorders within the layer tarballs, some others are about `.wh.*` files, and some are about the image format itself (OCI vs. Docker). And yet, that does not mean that the container image is not reproducible. Quoting again from Reproducible Builds:
138+
139+
> Reproducible builds does not mandate that a given piece of source code is turned into the same bytes in all situations. This would be unfeasible. The output of a compiler is likely to be different from one version to another as better optimizations are integrated all the time.
140+
>
141+
> Instead, reproducible builds happen in the context of a build environment. It usually comprises the set of tools, required versions, and other assumptions about the operating system and its configuration. A description of this environment should typically be recorded and provided alongside any distributed binary package.
142+
143+
The point here is that the image builder, its version, and its arguments are part of the build environment and the build instructions. That’s why projects like Tor and Bitcoin use their own version of a [static toolchain](https://reproducible-builds.org/docs/virtual-machine-drivers/), possibly within a virtual machine or container. Beyond that, OSes like Debian have [CI tests](https://tests.reproducible-builds.org/debian/reproducible.html) that verify packages remain reproducible and don’t have regressions.
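
To see how something as small as file order can change a layer's hash, consider this toy example. It is a sketch that assumes GNU `tar` and `sha256sum`, and the `pack` helper is hypothetical. It archives the same two files in different orders, then shows that sorting the entries and fixing the metadata makes the result stable.

```shell
#!/bin/sh
set -eu
# Assumes GNU tar (for --sort, --mtime, --owner, --group) and sha256sum.
work=$(mktemp -d)
mkdir "$work/rootfs"
printf 'a\n' > "$work/rootfs/a.txt"
printf 'b\n' > "$work/rootfs/b.txt"

# Hypothetical helper: create an archive with normalized metadata and print
# its SHA-256. Remaining arguments are passed to tar as options and members.
pack() {
    out=$1
    shift
    tar -C "$work/rootfs" --mtime='@0' --owner=0 --group=0 --numeric-owner \
        -cf "$out" "$@"
    sha256sum "$out" | cut -d' ' -f1
}

# Same files, same metadata, different member order: the hashes differ.
hash_ab=$(pack "$work/ab.tar" a.txt b.txt)
hash_ba=$(pack "$work/ba.tar" b.txt a.txt)

# Sorted entries plus a clamped mtime: the hash survives a touched file.
hash_sorted_1=$(pack "$work/s1.tar" --sort=name .)
touch "$work/rootfs/a.txt"
hash_sorted_2=$(pack "$work/s2.tar" --sort=name .)

echo "a,b order: $hash_ab"
echo "b,a order: $hash_ba"
echo "sorted #1: $hash_sorted_1"
echo "sorted #2: $hash_sorted_2"
```

Two builders that walk the filesystem in different orders can therefore produce different layer digests from identical contents, which is one reason to pin the builder itself.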
144+
145+
So, what happened on Feb. 20, 2025? BuildKit `v0.20.0` was released, and our CI tests picked it up. This release had a small regression and added an extra field in the image config:
146+
147+
```diff
148+
"rootfs": {
149+
"type": "layers",
150+
"diff_ids": ["sha256:341de903..."]
151+
- }
152+
+ },
153+
+ "variant": "v8"
154+
```
155+
156+
This was enough to affect the digest of the image. But because we had these CI tests in place, we detected it immediately and opened [moby/buildkit#5774](https://github.com/moby/buildkit/issues/5774). The regression has been fixed, and the hash has remained unchanged ever since.
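
The reason a single extra field matters is that the image manifest references the config by the hash of its exact bytes. The toy example below uses two short JSON strings as stand-ins for an image config; they are not real configs, and the `digest_of` helper is hypothetical.

```shell
#!/bin/sh
# Toy stand-ins for an image config before and after the extra field.
# These are not real image configs; the truncated diff_id is kept as-is.
config_before='{"rootfs":{"type":"layers","diff_ids":["sha256:341de903..."]}}'
config_after='{"rootfs":{"type":"layers","diff_ids":["sha256:341de903..."]},"variant":"v8"}'

# Hypothetical helper: SHA-256 over the exact bytes of its argument.
digest_of() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

before=$(digest_of "$config_before")
after=$(digest_of "$config_after")

echo "before: sha256:$before"
echo "after:  sha256:$after"
```

Since the manifest embeds the config digest, a change in the config propagates to the manifest digest as well.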
157+
158+
## Introducing repro-build, a collection of helpers for reproducible images
159+
160+
We strongly believe that reproducible containers that are built and verified only once are prone to rot. If we want more people to engage with them, they need a toolchain to work with and an easy way to continuously reproduce them as part of their CI tests.
161+
162+
With this in mind, we created [https://github.com/freedomofpress/repro-build](https://github.com/freedomofpress/repro-build), which holds:
163+
164+
* A script that reproducibly builds container images using a static build environment.
165+
* Two GitHub Actions:
166+
* One that reproducibly builds and pushes container images (and can be used in place of `docker/build-push-action`).
167+
* One that rebuilds a container image and compares the digests.
168+
169+
Let’s see that in more detail:
170+
171+
### The Python script
172+
173+
[`repro-build`](https://github.com/freedomofpress/repro-build?tab=readme-ov-file#build-a-container-image-locally):
174+
175+
```
176+
$ ./repro-build build --source-date-epoch 0 .
177+
2025-02-24 09:17:48 - INFO - Build environment:
178+
- Container runtime: docker
179+
- BuildKit image: moby/buildkit:v0.19.0@sha256:14aa1b4dd92ea0a4cd03a54d0c6079046ea98cd0c0ae6176bdd7036ba370cbbe
180+
- Rootless support: False
181+
- Caching enabled: True
182+
- Build context: ./repro-build
183+
- Dockerfile: (not provided)
184+
- Output: ./repro-build/image.tar
185+
186+
Build parameters:
187+
- SOURCE_DATE_EPOCH: 0
188+
- Build args: (not provided)
189+
- Tag: (not provided)
190+
- Platform: (default)
191+
192+
Podman-only arguments:
193+
- BuildKit arguments: (not provided)
194+
195+
Docker-only arguments:
196+
- Docker Buildx arguments: (not provided)
197+
198+
[...]
199+
```
200+
201+
This is a simple script that:
202+
203+
* Works with Docker and Podman, and ensures that BuildKit is used under the hood.
204+
* Pins BuildKit to a specific version.
205+
* Enforces the usage of a source date epoch or a human-friendly timestamp.
206+
* Removes some common sources of nondeterminism by rewriting timestamps and removing build provenance.
207+
208+
It’s our tested version of a static toolchain.
209+
210+
### A replacement for the [`docker/build-push-action`](https://github.com/docker/build-push-action) GitHub action that can reproducibly build container images
211+
212+
[`freedomofpress/repro-build@v1`](https://github.com/freedomofpress/repro-build?tab=readme-ov-file#reproducible-build-action-freedomofpressrepro-buildv1):
213+
214+
```yaml
215+
- name: Reproducibly build and push image
216+
uses: freedomofpress/repro-build@v1
217+
with:
218+
tags: ghcr.io/my-org/my-image:latest
219+
file: Dockerfile
220+
platforms: linux/amd64,linux/arm64
221+
source_date_epoch: 1677619260
222+
push: true
223+
```
224+
225+
For simple image builds, you can consider this a drop-in replacement for `docker/build-push-action`, doing a similar job to the above script. We know this because we verify that both this action and the above script create bit-for-bit identical images.
226+
227+
### A GitHub action that rebuilds a container image and compares the digests
228+
229+
[`freedomofpress/repro-build/verify@v1`](https://github.com/freedomofpress/repro-build?tab=readme-ov-file#reproduce-and-verify-action-freedomofpressrepro-buildverifyv1):
230+
231+
```yaml
232+
- name: Verify image reproducibility
233+
uses: freedomofpress/repro-build/verify@v1
234+
with:
235+
target_image: ghcr.io/my-org/my-image:latest
236+
file: Dockerfile
237+
platforms: linux/amd64
238+
source_date_epoch: 1677619260
239+
runtime: podman
240+
```
241+
242+
### Reproducible container images
243+
244+
Our [repro-build](https://github.com/freedomofpress/repro-build) repo has a CI job that builds [Debian images](https://github.com/freedomofpress/repro-build/pkgs/container/repro-build%2Fdebian) from snapshot repos nightly and then rebuilds them immediately, to make sure that they are reproducible. This CI job reuses the helpers we mentioned above, and produces container images that you can independently reproduce and verify.
245+
246+
Also, it has a CI job that still reproduces `sha256:b0088ba0110c2acfe757eaf41967ac09fe16e96a8775b998577f86d90b3dbe53` every night, across container runtimes, BuildKit versions, and host images. You are more than welcome to copy our workflow and do the same for your images.
247+
248+
## Future work
249+
250+
All this is exciting, but there is still room for improvement:
251+
252+
1. Include the build environment and instructions within the container image.
253+
2. Implement a CI system where we can enroll reproducible images and continuously build and verify them, same as Debian has for their packages.
254+
3. Improve ergonomics for multi-arch images.
255+
256+
We want to make it easy for folks to create their first reproducible container image, which is why we are offering the tools we use ourselves. We believe that with increased adoption of these practices and a systematic way to verify that they work, the container ecosystem will become more robust against supply chain attacks, which is something we deeply care about. So try them out and give us your feedback!
257+
258+
For more, [view a 20-minute talk](https://fosdem.org/2026/schedule/event/RYM8SF-repro-build/) we gave on this subject at FOSDEM 2026.
