containertool: Use same gzip headers on Linux and macOS (#37)

Motivation
----------
Packaging the same binary using the same version of `containertool`
produces different application image layers on macOS and Linux:
```
linux% swift run containertool --verbose --repository registry.test:5000/hello hello-world --from scratch
...
Uploading application layer
application layer: sha256:54a282d5cd082320d2d4976e7d9a952da46e3bc4bab3ce1e0b3931ccf945b849 (80394382 bytes)
image configuration: sha256:fdcb887ef6e27a09456419b03b1d8353b15d68d088b8ea023f38af892fca69be (462 bytes)
...
macos% swift run containertool --verbose --repository registry.test:5000/hello hello-world --from scratch
...
Uploading application layer
application layer: sha256:08a21093e79423c17b58325decc48d7196481ed55276c2d168de23a75d38727e (80394382 bytes)
image configuration: sha256:2648cd8cca1cad7ec5b386e8433e36ca77a40e31859e5994260b2ef1d07f0753 (462 bytes)
...
```
The `application layer` hashes differ, even though both layers
contain the same binary. The `image configuration` blob hashes also
differ, but the configuration blobs contain timestamps, so they will
continue to differ even after this PR is merged. A future change
could default these timestamps to the epoch, allowing identical
configuration blobs to be created on Linux and macOS as well.

The image layer is a gzipped TAR archive containing the executable.
Saving the intermediate files shows that the TAR archives are
identical, while the gzipped streams differ by only one byte:
```
% diff <(hexdump -X linux-image.tar.gz) <(hexdump -X darwin-image.tar.gz)
1c1
< 0000000 1f 8b 08 00 00 00 00 00 00 03 ed 57 eb 6e 1c b7
---
> 0000000 1f 8b 08 00 00 00 00 00 00 13 ed 57 eb 6e 1c b7
```
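
The same comparison can be done without `hexdump`. The sketch below is a hypothetical helper, `differing_offsets()`, which is not part of `containertool`; it lists every offset at which two byte strings differ. Applied to the first 16 bytes shown in each hexdump line, it reports offset 9, which is the 10th byte.

```python
def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Return every offset at which a and b differ.

    If one input is longer, its extra bytes also count as differences.
    """
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    offsets.extend(range(min(len(a), len(b)), max(len(a), len(b))))
    return offsets


# First 16 bytes of each gzipped layer, copied from the hexdump above.
linux = bytes.fromhex("1f 8b 08 00 00 00 00 00 00 03 ed 57 eb 6e 1c b7")
darwin = bytes.fromhex("1f 8b 08 00 00 00 00 00 00 13 ed 57 eb 6e 1c b7")
print(differing_offsets(linux, darwin))  # [9]
```
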
The difference is in the 10th byte of the gzip header: the [OS
field](https://datatracker.ietf.org/doc/html/rfc1952#page-5). RFC
1952 defines a list of [known operating
systems](https://datatracker.ietf.org/doc/html/rfc1952#page-8):
`0x03` is the OS code for Unix. However, the RFC was written in 1996,
so its `Macintosh` entry refers to the classic Mac OS. The zlib
library uses an updated list of operating systems
(madler/zlib@ce12c5c), which defines `19` (`0x13`) as the OS code
for Darwin.

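
To make the header layout concrete, here is a minimal sketch that decodes the fixed 10-byte gzip member header described in RFC 1952 (section 2.3) and names the OS byte. `GzipHeader`, `parse_gzip_header()` and `OS_NAMES` are hypothetical names for illustration; the `0x07` entry is RFC 1952's classic `Macintosh` code, and `0x13` is zlib's later Darwin code.

```python
import struct
from typing import NamedTuple

# OS codes relevant here: 0x03, 0x07 and 0xFF come from RFC 1952;
# 0x13 (19) is the code zlib later added for Darwin.
OS_NAMES = {0x03: "Unix", 0x07: "Macintosh (classic)", 0x13: "Darwin", 0xFF: "unknown"}


class GzipHeader(NamedTuple):
    method: int  # CM: 8 means deflate
    flags: int   # FLG: which optional fields (FNAME, FEXTRA, ...) follow
    mtime: int   # MTIME: modification time, 0 if not available
    xfl: int     # XFL: extra flags (compression level hint)
    os: int      # OS: operating system the stream was created on


def parse_gzip_header(data: bytes) -> GzipHeader:
    """Decode the fixed 10-byte gzip member header."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Little-endian, no padding: CM, FLG (1 byte each), MTIME (4), XFL, OS.
    method, flags, mtime, xfl, os_code = struct.unpack_from("<BBIBB", data, 2)
    return GzipHeader(method, flags, mtime, xfl, os_code)


# First 16 bytes of each layer, copied from the hexdump above.
linux = bytes.fromhex("1f 8b 08 00 00 00 00 00 00 03 ed 57 eb 6e 1c b7")
darwin = bytes.fromhex("1f 8b 08 00 00 00 00 00 00 13 ed 57 eb 6e 1c b7")
for name, data in (("linux", linux), ("darwin", darwin)):
    header = parse_gzip_header(data)
    print(name, hex(header.os), OS_NAMES.get(header.os, "other"))
# linux 0x3 Unix
# darwin 0x13 Darwin
```

Both headers have `MTIME` set to 0 and `XFL` set to 0; only the OS byte tells them apart.
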
Interestingly, using `gzip` to compress a file directly produces
identical results on macOS and Linux (`-n` is needed to prevent `gzip`
from including the current timestamp on macOS):
```
linux% cat hello-world | gzip -n | md5sum
ef64adbee9e89e78114000442a804e0e -
macos% cat hello-world | gzip -n | md5sum
ef64adbee9e89e78114000442a804e0e -
```
Modifications
-------------
By default, zlib uses the value of `OS_CODE` [set at compile
time](https://github.com/madler/zlib/blob/ef24c4c7502169f016dcd2a26923dbaf3216748c/deflate.c#L1054).
This commit uses
[deflateSetHeader()](https://github.com/madler/zlib/blob/ef24c4c7502169f016dcd2a26923dbaf3216748c/deflate.c#L705)
to override the default gzip header, forcing the OS code to be `0x03`
(Unix) on both Linux and macOS.

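
Python's `zlib` module does not expose `deflateSetHeader()`, so the sketch below swaps in a different technique with the same effect on the header: it asks zlib for a raw deflate stream (negative `wbits`, no wrapper) and writes the 10-byte gzip header and 8-byte trailer itself, with `MTIME` 0 and OS `0x03`. This is not the `containertool` implementation, and `zlib_default_gzip()` and `fixed_header_gzip()` are hypothetical names. `zlib_default_gzip()` only shows which header the local zlib writes on its own, so its OS byte depends on how that zlib was compiled.

```python
import struct
import zlib

OS_UNIX = 0x03  # RFC 1952 OS code for Unix


def zlib_default_gzip(data: bytes, level: int = 6) -> bytes:
    """Let zlib write its own gzip wrapper (wbits=31 means 16 + 15).

    The OS byte in this header is zlib's compile-time OS_CODE.
    """
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(data) + compressor.flush()


def fixed_header_gzip(data: bytes, level: int = 6) -> bytes:
    """Gzip data with a header that does not depend on the platform."""
    # Raw deflate (wbits=-15): zlib emits no header and no trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    body = compressor.compress(data) + compressor.flush()
    # ID1 ID2 CM FLG | MTIME (4 bytes) | XFL OS, little-endian, no padding.
    # zlib derives XFL from the level; this sketch always writes 0.
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0, 0, 0, OS_UNIX)
    # Trailer: CRC-32 of the input, then input length modulo 2**32.
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + body + trailer


payload = b"hello world\n" * 100
print("zlib default OS byte:", hex(zlib_default_gzip(payload)[9]))
print("fixed header OS byte:", hex(fixed_header_gzip(payload)[9]))
```

At level 6 both functions run the same deflate code on the same machine, so their outputs should differ at most in the OS byte. Fixing the header does not pin the compressed body: a different zlib version may still compress the same input differently.
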
Result
------
After this change, image layers containing the same binary will use
identical gzip headers and should have the same hash whether they
are built on Linux or macOS. Different versions of zlib may still
produce different compressed data, which would change the layer
hashes.

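
The layer hash is a content digest over the compressed bytes, so one differing header byte is enough to change it, even though both layers in the output above have the same size. A minimal sketch, assuming the `sha256:<hex>` form shown in that output; `layer_digest()` is a hypothetical helper, and the blobs are a real gzip header followed by placeholder bytes rather than real layers.

```python
import hashlib


def layer_digest(blob: bytes) -> str:
    """Return a content digest in the `sha256:<hex>` form used for layers."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# A gzip header with OS = 0x03 (Unix), followed by placeholder bytes.
unix_blob = bytes.fromhex("1f8b0800000000000003") + b"same payload"
# Identical except for the OS byte at offset 9, set to 0x13 (Darwin).
darwin_blob = unix_blob[:9] + b"\x13" + unix_blob[10:]

print(len(unix_blob) == len(darwin_blob))  # True
print(layer_digest(unix_blob) == layer_digest(darwin_blob))  # False
```
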
Test Plan
---------
Tested manually on macOS and Linux, verifying that image layers
containing identical binaries have identical hashes.
Added a test for `containertool`'s `gzip` function.