Commit 403e0fe

containertool: Use same gzip headers on Linux and macOS (#37)
Motivation
----------

Packaging the same binary using the same version of `containertool` produces different application image layers on macOS and Linux:

```
linux% swift run containertool --verbose --repository registry.test:5000/hello hello-world --from scratch
...
Uploading application layer
application layer: sha256:54a282d5cd082320d2d4976e7d9a952da46e3bc4bab3ce1e0b3931ccf945b849 (80394382 bytes)
image configuration: sha256:fdcb887ef6e27a09456419b03b1d8353b15d68d088b8ea023f38af892fca69be (462 bytes)
...

macos% swift run containertool --verbose --repository registry.test:5000/hello hello-world --from scratch
...
Uploading application layer
application layer: sha256:08a21093e79423c17b58325decc48d7196481ed55276c2d168de23a75d38727e (80394382 bytes)
image configuration: sha256:2648cd8cca1cad7ec5b386e8433e36ca77a40e31859e5994260b2ef1d07f0753 (462 bytes)
...
```

The `application layer` hashes are different, even though they contain the same binary. The `image configuration` metadata blob hashes also differ, but they contain timestamps so this will continue to happen even after this PR is merged. A future change could make these timestamps default to the epoch, allowing identical metadata blobs to be created on Linux and macOS as well.

The image layer is a gzipped TAR archive containing the executable. Saving the intermediate steps shows that the TAR archives are identical and the gzipped streams are different, but only by one byte:

```
% diff <(hexdump -X linux-image.tar.gz) <(hexdump -X darwin-image.tar.gz)
1c1
< 0000000 1f 8b 08 00 00 00 00 00 00 03 ed 57 eb 6e 1c b7
---
> 0000000 1f 8b 08 00 00 00 00 00 00 13 ed 57 eb 6e 1c b7
```

The difference is in the 10th byte of the gzip header: the [OS field](https://datatracker.ietf.org/doc/html/rfc1952#page-5).

RFC 1952 defines a list of [known operating systems](https://datatracker.ietf.org/doc/html/rfc1952#page-8): `0x03` is the OS code for Unix, however the RFC was written in 1996 so `Macintosh` refers to the classic MacOS. Zlib uses an updated operating system list madler/zlib@ce12c5c which defines `19` / `0x13` as the OS code for Darwin.

Interestingly, using `gzip` to compress a file directly produces identical results on macOS and Linux (`-n` is needed to prevent `gzip` from including the current timestamp on macOS):

```
linux% cat hello-world | gzip -n | md5sum
ef64adbee9e89e78114000442a804e0e -

macos% cat hello-world | gzip -n | md5sum
ef64adbee9e89e78114000442a804e0e -
```

Modifications
-------------

By default, Zlib uses the value of `OS_CODE` [set at compile time](https://github.com/madler/zlib/blob/ef24c4c7502169f016dcd2a26923dbaf3216748c/deflate.c#L1054). This commit uses [deflateSetHeader()](https://github.com/madler/zlib/blob/ef24c4c7502169f016dcd2a26923dbaf3216748c/deflate.c#L705) to override the default gzip header, forcing the OS code to be 0x03 (Unix) on both Linux and macOS.

Result
------

After this change, image layers containing the same binary will use identical gzip headers and should have the same hash whether they are built on Linux or macOS. It is still possible that different versions of Zlib might produce different compressed data, causing the overall hashes to change.

Test Plan
---------

Tested manually on macOS and Linux, verifying that image layers containing identical binaries have identical hashes. Added a test for `containertool`'s `gzip` function.
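The one-byte difference described under Motivation can be read directly from the fixed 10-byte gzip header defined in RFC 1952 (ID1, ID2, CM, FLG, MTIME, XFL, OS). The following is a minimal sketch, not code from this commit: `GzipFixedHeader` and `parseGzipFixedHeader` are hypothetical names introduced for illustration, and the sample bytes are copied from the hexdump above.

```swift
// Hypothetical helper, not part of containertool: decode the fixed
// 10-byte gzip member header described in RFC 1952.
struct GzipFixedHeader: Equatable {
    var compressionMethod: UInt8  // CM, 8 = deflate
    var flags: UInt8              // FLG
    var mtime: UInt32             // MTIME, little-endian, 0 = no timestamp
    var extraFlags: UInt8         // XFL
    var os: UInt8                 // OS, e.g. 3 = Unix, 255 = unknown
}

func parseGzipFixedHeader(_ bytes: [UInt8]) -> GzipFixedHeader? {
    // The fixed part is 10 bytes long and starts with the magic 0x1f 0x8b.
    guard bytes.count >= 10, bytes[0] == 0x1f, bytes[1] == 0x8b else { return nil }

    // MTIME occupies bytes 4...7, least significant byte first.
    var mtime: UInt32 = 0
    for (i, byte) in bytes[4..<8].enumerated() {
        mtime |= UInt32(byte) << (8 * UInt32(i))
    }

    return GzipFixedHeader(
        compressionMethod: bytes[2],
        flags: bytes[3],
        mtime: mtime,
        extraFlags: bytes[8],
        os: bytes[9]
    )
}

// First 16 bytes of each archive, copied from the hexdump in the commit message.
let linuxPrefix: [UInt8] = [
    0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x03, 0xed, 0x57, 0xeb, 0x6e, 0x1c, 0xb7,
]
let darwinPrefix: [UInt8] = [
    0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x13, 0xed, 0x57, 0xeb, 0x6e, 0x1c, 0xb7,
]

if let linux = parseGzipFixedHeader(linuxPrefix),
   let darwin = parseGzipFixedHeader(darwinPrefix) {
    print("linux OS code:", linux.os)    // prints "linux OS code: 3"
    print("darwin OS code:", darwin.os)  // prints "darwin OS code: 19"
}
```

Apart from the OS byte, the two decoded headers are identical, which matches the `diff` output above.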
1 parent 7c8ada9 commit 403e0fe

3 files changed: +58 -1 lines changed

Package.swift

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ let package = Package(
             name: "ContainerRegistryTests",
             dependencies: [.target(name: "ContainerRegistry")],
             resources: [.process("Resources")]
-        ),
+        ), .testTarget(name: "containertoolTests", dependencies: [.target(name: "containertool")]),
     ],
     swiftLanguageModes: [.v6]
 )

Sources/containertool/gzip.swift

Lines changed: 26 additions & 0 deletions
@@ -35,10 +35,36 @@ func gzip(_ bytes: [UInt8]) -> [UInt8] {
     stream.zfree = nil
     stream.opaque = nil
 
+    // Force identical gzip headers to be created on Linux and macOS.
+    //
+    // RFC1952 defines operating system codes which can be embedded in the gzip header.
+    //
+    // * Initially, zlib generated a default gzip header with the
+    //   OS field set to `Unknown` (255).
+    // * https://github.com/madler/zlib/commit/0484693e1723bbab791c56f95597bd7dbe867d03
+    //   changed the default to `Unix` (3).
+    // * https://github.com/madler/zlib/commit/ce12c5cd00628bf8f680c98123a369974d32df15
+    //   changed the default to use a value based on the OS detected
+    //   at compile time. After this, zlib on Linux continued to
+    //   use `Unix` (3) whereas macOS started to use `Apple` (19).
+    //
+    // According to RFC1952 Section 2.3.1.2. (Compliance), `Unknown`
+    // 255 should be used by default where the OS on which the file
+    // was created is not known.
+    //
+    // Different versions of zlib might still produce different
+    // compressed output for the same input, but using the same default
+    // value removes one source of differences between platforms.
+
+    let gz_os_unknown = Int32(255)
+    var header = gz_header()
+    header.os = gz_os_unknown
+
     let windowBits: Int32 = 15 + 16
     let level = Z_DEFAULT_COMPRESSION
     let memLevel: Int32 = 8
     let rc = CNIOExtrasZlib_deflateInit2(&stream, level, Z_DEFLATED, windowBits, memLevel, Z_DEFAULT_STRATEGY)
+    deflateSetHeader(&stream, &header)
 
     precondition(rc == Z_OK, "Unexpected return from zlib init: \(rc)")
 
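For comparison only, the same OS byte could in principle be normalized after compression instead of through `deflateSetHeader()`. This is not what the commit does. The sketch below is a pure-Swift illustration with hypothetical names (`GzipPatchError`, `settingGzipOSCode`); it assumes a single-member gzip stream and refuses to patch when the FHCRC flag is set, because RFC 1952 defines that flag as a CRC16 over the header, which a patched OS byte would invalidate.

```swift
// Hypothetical post-processing helper, NOT what this commit does:
// overwrite the OS byte of an existing gzip stream.
enum GzipPatchError: Error {
    case notGzip
    case headerChecksumPresent
}

func settingGzipOSCode(_ bytes: [UInt8], to osCode: UInt8) throws -> [UInt8] {
    // Require the 10-byte fixed header and the gzip magic number.
    guard bytes.count >= 10, bytes[0] == 0x1f, bytes[1] == 0x8b else {
        throw GzipPatchError.notGzip
    }

    // FLG bit 1 (FHCRC) means a CRC16 of the header is stored in the stream.
    // Changing the OS byte would make it stale, so refuse instead.
    let fhcrc: UInt8 = 0x02
    guard bytes[3] & fhcrc == 0 else {
        throw GzipPatchError.headerChecksumPresent
    }

    var patched = bytes
    patched[9] = osCode  // OS is the 10th byte of the header
    return patched
}

// Header bytes as produced on macOS before this change (OS = 0x13).
let darwinHeader: [UInt8] = [0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x13]
let normalized = try settingGzipOSCode(darwinHeader, to: 255)
print(normalized[9])  // prints "255"
```

Setting the field through zlib, as the diff above does, avoids a second pass over the compressed data and leaves any header bookkeeping to the library.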
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+//===----------------------------------------------------------------------===//
+//
+// This source file is part of the SwiftContainerPlugin open source project
+//
+// Copyright (c) 2024 Apple Inc. and the SwiftContainerPlugin project authors
+// Licensed under Apache License v2.0
+//
+// See LICENSE.txt for license information
+// See CONTRIBUTORS.txt for the list of SwiftContainerPlugin project authors
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+//===----------------------------------------------------------------------===//
+
+import Foundation
+@testable import containertool
+import Crypto
+import XCTest
+
+class ZlibTests: XCTestCase, @unchecked Sendable {
+    // Check that compressing the same data on macOS and Linux produces the same output.
+    func testGzipHeader() async throws {
+        let data = "test"
+
+        let result = gzip([UInt8](data.utf8))
+        XCTAssertEqual(
+            "\(SHA256.hash(data: result))",
+            "SHA256 digest: 7dff8d09129482017247cb373e8138772e852a1a02f097d1440387055d2be69c"
+        )
+    }
+}
