Commit 93f8d15

containertool: Use same gzip headers on Linux and macOS

Motivation
----------

Packaging the same binary using the same version of `containertool` produces different application image layers on macOS and Linux:

```
linux% swift run containertool --verbose --repository registry.test:5000/hello hello-world --from scratch
...
Uploading application layer
application layer: sha256:54a282d5cd082320d2d4976e7d9a952da46e3bc4bab3ce1e0b3931ccf945b849 (80394382 bytes)
image configuration: sha256:fdcb887ef6e27a09456419b03b1d8353b15d68d088b8ea023f38af892fca69be (462 bytes)
...

macos% swift run containertool --verbose --repository registry.test:5000/hello hello-world --from scratch
...
Uploading application layer
application layer: sha256:08a21093e79423c17b58325decc48d7196481ed55276c2d168de23a75d38727e (80394382 bytes)
image configuration: sha256:2648cd8cca1cad7ec5b386e8433e36ca77a40e31859e5994260b2ef1d07f0753 (462 bytes)
...
```

The `application layer` hashes are different, even though the layers contain the same binary. The `image configuration` metadata blob hashes also differ, but these blobs contain timestamps, so they will continue to differ even after this PR is merged. A future change could make these timestamps default to the epoch, allowing identical metadata blobs to be created on Linux and macOS as well.

The image layer is a gzipped TAR archive containing the executable. Saving the intermediate steps shows that the TAR archives are identical and the gzipped streams are different, but only by one byte:

```
% diff <(hexdump -X linux-image.tar.gz) <(hexdump -X darwin-image.tar.gz)
1c1
< 0000000 1f 8b 08 00 00 00 00 00 00 03 ed 57 eb 6e 1c b7
---
> 0000000 1f 8b 08 00 00 00 00 00 00 13 ed 57 eb 6e 1c b7
```

The difference is in the 10th byte of the gzip header: the [OS field](https://datatracker.ietf.org/doc/html/rfc1952#page-5). RFC 1952 defines a list of [known operating systems](https://datatracker.ietf.org/doc/html/rfc1952#page-8): `0x03` is the OS code for Unix; however, the RFC was written in 1996, so `Macintosh` refers to the classic MacOS. Zlib uses an updated operating system list (madler/zlib@ce12c5c), which defines `19` / `0x13` as the OS code for Darwin.

Interestingly, using `gzip` to compress a file directly produces identical results on macOS and Linux (`-n` is needed to prevent `gzip` from including the current timestamp on macOS):

```
linux% cat hello-world | gzip -n | md5sum
ef64adbee9e89e78114000442a804e0e -

macos% cat hello-world | gzip -n | md5sum
ef64adbee9e89e78114000442a804e0e -
```

Modifications
-------------

By default, Zlib uses the value of `OS_CODE` [set at compile time](https://github.com/madler/zlib/blob/ef24c4c7502169f016dcd2a26923dbaf3216748c/deflate.c#L1054). This commit uses [deflateSetHeader()](https://github.com/madler/zlib/blob/ef24c4c7502169f016dcd2a26923dbaf3216748c/deflate.c#L705) to override the default gzip header, forcing the OS code to be `255` (`Unknown`) on both Linux and macOS.

Result
------

After this change, image layers containing the same binary will use identical gzip headers and should have the same hash whether they are built on Linux or macOS. It is still possible that different versions of Zlib might produce different compressed data, causing the overall hashes to change.

Test Plan
---------

Tested manually on macOS and Linux, verifying that image layers containing identical binaries have identical hashes.

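
To make the header difference concrete, the following is a minimal sketch (not part of this commit) that reads the OS field from the fixed 10-byte gzip header. The `GzipOS` enum and the `gzipOSField` function are hypothetical names introduced for illustration; the sample bytes are the first 16 bytes of each stream as shown in the hexdump in the commit message.

```swift
/// OS codes relevant to this commit. 3 and 255 come from RFC 1952;
/// 19 is the value from zlib's updated list. (Hypothetical enum.)
enum GzipOS: UInt8 {
    case unix = 3
    case darwin = 19
    case unknown = 255
}

/// Returns the raw OS byte of a gzip member, or nil if `bytes` is too
/// short or does not start with the gzip magic number and deflate method.
func gzipOSField(_ bytes: [UInt8]) -> UInt8? {
    // Fixed header layout: ID1 ID2 CM FLG MTIME(4 bytes) XFL OS.
    guard bytes.count >= 10,
          bytes[0] == 0x1f, bytes[1] == 0x8b,  // ID1, ID2
          bytes[2] == 0x08                     // CM = deflate
    else { return nil }
    return bytes[9]                            // OS is the 10th byte
}

// First 16 bytes of each stream, copied from the hexdump output.
let linuxHeader: [UInt8] = [0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00,
                            0x00, 0x03, 0xed, 0x57, 0xeb, 0x6e, 0x1c, 0xb7]
let macosHeader: [UInt8] = [0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00,
                            0x00, 0x13, 0xed, 0x57, 0xeb, 0x6e, 0x1c, 0xb7]

for (name, header) in [("linux", linuxHeader), ("macos", macosHeader)] {
    if let os = gzipOSField(header) {
        // Print the platform, the raw code, and its name if it is one we model.
        let label = GzipOS(rawValue: os).map { "\($0)" } ?? "other"
        print(name, os, label)
    }
}
```

Because the two samples agree in every byte except the OS field, a check like this is enough to explain why the layer digests diverge even though the compressed payload is the same.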
1 parent 7c8ada9 commit 93f8d15

1 file changed, +26 -0 lines changed

Sources/containertool/gzip.swift

Lines changed: 26 additions & 0 deletions
@@ -35,10 +35,36 @@ func gzip(_ bytes: [UInt8]) -> [UInt8] {
     stream.zfree = nil
     stream.opaque = nil
 
+    // Force identical gzip headers to be created on Linux and macOS.
+    //
+    // RFC1952 defines operating system codes which can be embedded in the gzip header.
+    //
+    // * Initially, zlib generated a default gzip header with the
+    //   OS field set to `Unknown` (255).
+    // * https://github.com/madler/zlib/commit/0484693e1723bbab791c56f95597bd7dbe867d03
+    //   changed the default to `Unix` (3).
+    // * https://github.com/madler/zlib/commit/ce12c5cd00628bf8f680c98123a369974d32df15
+    //   changed the default to use a value based on the OS detected
+    //   at compile time. After this, zlib on Linux continued to
+    //   use `Unix` (3) whereas macOS started to use `Apple` (19).
+    //
+    // According to RFC1952 Section 2.3.1.2. (Compliance), `Unknown`
+    // 255 should be used by default where the OS on which the file
+    // was created is not known.
+    //
+    // Different versions of zlib might still produce different
+    // compressed output for the same input, but using the same default
+    // value removes one source of differences between platforms.
+
+    let gz_os_unknown = Int32(255)
+    var header = gz_header()
+    header.os = gz_os_unknown
+
     let windowBits: Int32 = 15 + 16
     let level = Z_DEFAULT_COMPRESSION
     let memLevel: Int32 = 8
     let rc = CNIOExtrasZlib_deflateInit2(&stream, level, Z_DEFLATED, windowBits, memLevel, Z_DEFAULT_STRATEGY)
+    deflateSetHeader(&stream, &header)
 
     precondition(rc == Z_OK, "Unexpected return from zlib init: \(rc)")
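
One way the manual check could be turned into an automated one is to compare the start of the `gzip(_:)` output against the fixed header layout from RFC 1952. This is only a sketch: `expectedGzipHeader` and `hasExpectedHeader` are hypothetical helpers, not part of this commit, and it assumes that zlib writes zero flags, a zero MTIME and a zero XFL for a zero-initialised `gz_header` at the default compression level. That assumption matches the hexdump in the commit message but is not established by this diff. The `sample` bytes are illustrative and were not captured from a real run.

```swift
/// Builds the 10-byte fixed gzip member header from RFC 1952 with no
/// optional fields, MTIME = 0 and XFL = 0. (Hypothetical helper.)
func expectedGzipHeader(os: UInt8) -> [UInt8] {
    return [
        0x1f, 0x8b,              // ID1, ID2: gzip magic number
        0x08,                    // CM: deflate
        0x00,                    // FLG: no optional fields
        0x00, 0x00, 0x00, 0x00,  // MTIME: no timestamp
        0x00,                    // XFL: assumed zero at default level
        os,                      // OS: the field this commit pins
    ]
}

/// Returns true if `compressed` begins with the expected fixed header.
func hasExpectedHeader(_ compressed: [UInt8], os: UInt8) -> Bool {
    return compressed.starts(with: expectedGzipHeader(os: os))
}

// Illustrative stand-in for the first bytes of `gzip(_:)` output after
// this change; in a real test this would be the function's return value.
let sample: [UInt8] = [0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00,
                       0x00, 0xff, 0xed, 0x57]

print(hasExpectedHeader(sample, os: 255))
```

Checking only the header keeps such a test independent of the zlib version, since the compressed payload may still vary between versions as the code comment notes.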
