Respect SOURCE_DATE_EPOCH for better reproducibility of builds #109
Thanks @agriyakhetarpal! I think it is a good improvement. Also adding cross-ref for the previous discussion: pyodide/pyodide#4286
Could we have a test for the new behavior?
Thanks for the review, @ryanking13 – I have added a bunch of tests, and also manually confirmed with the integration tests that the change works towards their reproducibility. This should resolve most of the conversation in pyodide/pyodide#4286. I'll leave a comment there about it, which should help, as issues/PRs cross-linked from discussions don't show up in the UI. I had a couple of questions about the downstream integration of this feature based on that discussion:
|
I'm not sure, but both of these sound like good ideas to me. I know that there is a bit of nondeterminism in the binaries generated by emcc; I'm not sure about shared libraries. Maybe we should open an issue on emscripten asking what the status of reproducible builds is. |
What feels to me like it would make the most sense would be to set the date based on when the latest commit occurred. |
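A minimal sketch of deriving the date from the latest commit, assuming the build runs inside a Git checkout. The helper names `latest_commit_epoch` and `build_env` are hypothetical and not part of this PR; `git log -1 --format=%ct` prints the committer timestamp of the most recent commit as a Unix epoch.

```python
import os
import subprocess


def latest_commit_epoch(repo_dir=".", run=subprocess.check_output):
    """Return the committer timestamp of HEAD as a Unix epoch (hypothetical helper)."""
    out = run(["git", "log", "-1", "--format=%ct"], cwd=repo_dir, text=True)
    return int(out.strip())


def build_env(repo_dir=".", environ=None, run=subprocess.check_output):
    """Copy the environment, filling in SOURCE_DATE_EPOCH only when it is unset."""
    env = dict(os.environ if environ is None else environ)
    if "SOURCE_DATE_EPOCH" not in env:
        env["SOURCE_DATE_EPOCH"] = str(latest_commit_epoch(repo_dir, run))
    return env
```

An explicitly exported SOURCE_DATE_EPOCH takes precedence here, so a CI job or a downstream packager can still pin the date.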
I found emscripten-core/emscripten#7714, though from the conversation I feel that it was closed as "won't fix" for an unrelated reason. I don't want to comment there asking for an update as it is quite old, but if the premise is that LLVM guarantees reproducibility as per https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html, we should be fine with level 1 determinism even if we can't have level 4 determinism right now. One thing that could be nice is to switch the Pyodide runtime's versioning scheme to be based on Git tags. That could require a bit of effort because of all the moving parts everywhere, though (especially downstream). |
I'll mark it as a draft for now, as I notice that the |
We filter environment variables. Is the problem that you haven't updated that logic to let |
Ah, I didn't know that. Could you please share a pointer where we do so in the |
I've tried to pass it to the |
You need to add it here: Any environment variable not in |
Co-Authored-By: Hood Chatham <[email protected]>
Done, but I'm not too sure if it works – as the checksum for However, running |
Make the recipe echo the environment variable in the build script to check if it made it through. |
Yes, it makes it through:
That makes me think, maybe it is |
Answering my own question: yes, I think this is what is happening, as
returns the same output on repeated runs. |
That means we should be good to go for this PR, but we'll need to update |
🤔 Building NumPy repeatedly outside of the lockfile generation displays different checksums every time for its wheels. Still, upon opening the wheel and the test archives, I see that However, both the So, in general, I wonder if we're checking for the right things here by trying to validate checksums. |
The difference in the checksums comes from the fact that we build with isolation: the path of the temporary build environment differs between builds and ends up in the wheel.
diff -r integration_tests/recipes/numpy/dist/numpy-2.2.3-cp312-cp312-pyodide_2024_0_wasm32/numpy/__config__.py integration_tests/recipes/numpy/dist1/numpy-2.2.3-cp312-cp312-pyodide_2024_0_wasm32/numpy/__config__.py
96c96
< "path": r"/private/var/folders/b3/2bq1m1_50bs4c7305j8vxcqr0000gn/T/build-env-eecbpwc2/bin/python",
---
> "path": r"/private/var/folders/b3/2bq1m1_50bs4c7305j8vxcqr0000gn/T/build-env-mtjzvpw6/bin/python",
diff -r integration_tests/recipes/numpy/dist/numpy-2.2.3-cp312-cp312-pyodide_2024_0_wasm32/numpy-2.2.3.dist-info/RECORD integration_tests/recipes/numpy/dist1/numpy-2.2.3-cp312-cp312-pyodide_2024_0_wasm32/numpy-2.2.3.dist-info/RECORD
1c1
< numpy/__config__.py,sha256=pcS5KlVZFsJDU9lAXUg_JGvcAceIBPbP_1bcQBdjZlM,5015
---
> numpy/__config__.py,sha256=J9aPth6duJdP4BjknvJiCVOg1JIQa4FInIP4SqrFoSw,5015
Subsequently, as the wheel differs, |
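For narrowing down which files make two wheels differ without unpacking them by hand, something like the following could help. It is only a sketch using the standard library; `member_digests` and `differing_members` are hypothetical names, not functions from this repository.

```python
import hashlib
import zipfile


def member_digests(wheel_path):
    """Map each file in the archive to the SHA-256 of its uncompressed bytes."""
    with zipfile.ZipFile(wheel_path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def differing_members(wheel_a, wheel_b):
    """Return the sorted names of files that are missing or differ between two wheels."""
    a, b = member_digests(wheel_a), member_digests(wheel_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Because this hashes the uncompressed contents, it ignores differences in ZIP metadata such as timestamps; a mismatch in the whole-file checksum with an empty result here would point at the archive metadata rather than the files.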
I think I'm running into the same problem I faced in pyodide/pyodide#5031. Is there a way to add |
TODO:
|
I believe this is not true. If that is happening, it is a bug. We pass through all host environment variables to the builder. |
Thanks for the investigation. Supporting Anyway, we've received requests to support no-isolation mode a few times, so it's probably worth discussing in a separate issue. |
Interesting. Probably |
Description
This PR adds support for the SOURCE_DATE_EPOCH environment variable when creating wheels from recipes to enable reproducible builds. The implementation follows the Python example from the https://reproducible-builds.org specification.

I can't say that this change makes builds completely reproducible, as we are responsible only for these utility functions and our usage of pypa/build as a build frontend, which doesn't guarantee reproducibility at all (see pypa/build#385); that responsibility rests with the build backend, which we are agnostic to. It's a start towards more reproducibility, however.

I've modified the _make_whlfile function to set the timestamps based on SOURCE_DATE_EPOCH. When SOURCE_DATE_EPOCH is provided, all files inside the wheel will have their timestamps set to the provided date. "January 1st 1980 00:00:00 UTC" (Unix timestamp 315532800) is used as it is the minimum for ZIP file compatibility. A reproducible_filter has been added for tarfile creation that acts similarly to the "data" filter but with added SOURCE_DATE_EPOCH consideration.

The goal is to make static/shared library packages, wheels, and the tests unvendored from wheels as reproducible as possible when SOURCE_DATE_EPOCH is set.
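
To illustrate the idea, here is a rough standard-library sketch; it is not the code from this PR. The names `source_date_epoch`, `write_reproducible_zip` and `reproducible_tar_filter` are hypothetical, and clamping values below 1980 to the ZIP minimum, the fixed 0o644 permissions, and the reset of owner fields are assumptions made for the example.

```python
import os
import time
import zipfile

ZIP_EPOCH_MIN = 315532800  # 1980-01-01 00:00:00 UTC, the earliest date ZIP can store


def source_date_epoch(default=ZIP_EPOCH_MIN):
    """Read SOURCE_DATE_EPOCH, clamped to the ZIP minimum (assumed behaviour)."""
    raw = os.environ.get("SOURCE_DATE_EPOCH")
    value = int(raw) if raw else default
    return max(value, ZIP_EPOCH_MIN)


def write_reproducible_zip(path, files):
    """Write a mapping of archive name -> bytes with fixed order and timestamps."""
    date_time = time.gmtime(source_date_epoch())[:6]
    with zipfile.ZipFile(path, "w") as zf:
        for name in sorted(files):  # stable member order
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed permissions (assumption)
            zf.writestr(info, files[name])


def reproducible_tar_filter(tarinfo):
    """Filter for TarFile.add() that normalizes mtime and ownership metadata."""
    tarinfo.mtime = source_date_epoch()
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = ""
    return tarinfo
```

The tar filter would be passed as `tar.add(path, filter=reproducible_tar_filter)`. Normalizing timestamps only removes one source of variation; file contents produced by the build backend can still differ between runs.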