Description
While PRs such as #109 will help us make builds more reproducible through code, some aspects of the Pyodide conveyor belt and build factory do not enforce reproducibility properly. For example, #170 covers non-reproducibility in our tests, which we've left as a TODO to resolve.
Similarly, there are other places where we break reproducibility: we currently point the raw GitHub URL in `DEFAULT_CROSS_BUILD_ENV_METADATA_URL` at the `main` branch. This has several drawbacks:
- raw GitHub URLs that don't point to a specific commit or tag are susceptible to drift: the file's contents can change underneath us, and the link can eventually rot
- if this file is ever removed, whether by manual error or a cybersecurity attack (hopefully not :P), out-of-tree CI builds for roughly 60 packages that do not cache our cross-build environment will break (I don't think any of them cache it)
- downstream clients such as `cibuildwheel` get unreliable, non-reproducible builds if we change something retroactively.
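To make the first drawback concrete, here is a minimal sketch contrasting a branch-based raw URL with a commit-pinned one. It assumes the metadata file is `pyodide-cross-build-environments.json` at the root of `pyodide/pyodide`; the helper names and the all-hex SHA are hypothetical placeholders, for illustration only.

```python
import re

# Raw GitHub URLs have the form
#   https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
# where <ref> can be a branch, a tag, or a commit SHA.
RAW_BASE = "https://raw.githubusercontent.com"

# Assumed location of the metadata file (an assumption for this sketch).
OWNER, REPO, PATH = "pyodide", "pyodide", "pyodide-cross-build-environments.json"

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def raw_url(ref: str) -> str:
    """Build a raw GitHub URL for the metadata file at the given ref."""
    return f"{RAW_BASE}/{OWNER}/{REPO}/{ref}/{PATH}"


def is_pinned(url: str) -> bool:
    """Return True only if the URL's ref segment is a full 40-char commit SHA.

    A branch such as "main" can move at any time, and tags can be re-pointed;
    a full SHA always names the same file contents.
    """
    prefix = f"{RAW_BASE}/{OWNER}/{REPO}/"
    if not url.startswith(prefix):
        return False
    ref = url[len(prefix):].split("/", 1)[0]
    return _FULL_SHA.fullmatch(ref) is not None


floating = raw_url("main")
pinned = raw_url("0123456789abcdef0123456789abcdef01234567")  # placeholder SHA
print(is_pinned(floating))  # False
print(is_pinned(pinned))    # True
```

The check is deliberately strict: only a full SHA counts as pinned, since short SHAs can become ambiguous and tags can be moved.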
Coupled with this, the only way to customise the xbuildenv outside of the `pyodide xbuildenv search` interface is to use `pyodide xbuildenv install --url` and point it to a specific URL; we don't yet support an environment variable or a `pyodide config` option to configure the xbuildenv. See also #68 and pyodide/pyodide-build-environment-nightly#14 for some related discussions. Even then, we don't handle the case where nightly build environments aren't compatible with `pyodide-build`, which forces users to pass the `--force` parameter explicitly.
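If we add an environment variable or config option, one possible precedence order is sketched below: the most explicit source wins. The variable name `PYODIDE_XBUILDENV_URL`, the config key `xbuildenv_url`, and the default URL are all hypothetical; none of them exist in `pyodide-build` today.

```python
from typing import Mapping, Optional

# Hypothetical names, used only to illustrate the proposal.
ENV_VAR = "PYODIDE_XBUILDENV_URL"
CONFIG_KEY = "xbuildenv_url"
DEFAULT_URL = "https://example.invalid/pinned/xbuildenv-metadata.json"  # placeholder


def resolve_xbuildenv_url(
    cli_url: Optional[str],
    environ: Mapping[str, str],
    config: Mapping[str, str],
    default: str = DEFAULT_URL,
) -> str:
    """Pick the URL to use, most explicit source first.

    Order: command-line flag > environment variable > config file > default.
    Empty strings are treated as "not set".
    """
    for candidate in (cli_url, environ.get(ENV_VAR), config.get(CONFIG_KEY)):
        if candidate:
            return candidate
    return default


# Usage: the environment variable overrides the config file, but not the flag.
env = {ENV_VAR: "https://example.invalid/from-env.json"}
cfg = {CONFIG_KEY: "https://example.invalid/from-config.json"}
print(resolve_xbuildenv_url(None, env, cfg))  # the env var's URL
```

Passing the environment and config as plain mappings (rather than reading `os.environ` directly) keeps the precedence logic easy to test.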
I have a few solutions in mind to address reproducibility, each with a different degree of breakage (in no particular order):
1. point the metadata URL to a specific commit hash:
   i. a particular release of `pyodide-build` will use a pinned version of the URL, versioned alongside `pyodide-build` releases;
   ii. when the metadata needs an update, pyodide-ci-bot will send out patches that point to the most recent commit hash;
   iii. if we break something in a release, we apply a fix, backport it, and release it as soon as possible (despite the extra maintenance cost, such a scenario is rare).
2. version the cross-build metadata file outside the Pyodide repository for future versions?
3. a combination of options 1 and 2, so that we won't need to couple the metadata with Pyodide's versions, and a pinned commit hash that produces broken builds can be corrected much more easily.
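Option 1.ii could be automated along the lines of the sketch below, which rewrites a pinned commit in source text the way a bot-generated patch might. It assumes a hypothetical module-level constant `METADATA_COMMIT`; the real layout of `pyodide-build` may differ.

```python
import re

# Matches a line such as: METADATA_COMMIT = "<40 hex chars>"
# (METADATA_COMMIT is a hypothetical constant name for this sketch.)
_PIN_LINE = re.compile(r'^(METADATA_COMMIT\s*=\s*")([0-9a-f]{40})(")', re.MULTILINE)
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def bump_pinned_commit(source: str, new_sha: str) -> str:
    """Return `source` with the pinned commit replaced by `new_sha`.

    Refuses anything that is not a full SHA (e.g. a branch name like "main"),
    and requires exactly one pin so the patch is unambiguous.
    """
    if not _FULL_SHA.fullmatch(new_sha):
        raise ValueError(f"not a full commit SHA: {new_sha!r}")
    updated, count = _PIN_LINE.subn(rf"\g<1>{new_sha}\g<3>", source)
    if count != 1:
        raise ValueError(f"expected exactly one pinned commit, found {count}")
    return updated


old_source = 'METADATA_COMMIT = "' + "a" * 40 + '"\n'
new_source = bump_pinned_commit(old_source, "b" * 40)
print(new_source)
```

Rejecting branch names at patch time keeps the pinned URL from silently drifting back to a floating ref.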
I think option 3 is the best of these, since it does not require a release of the Pyodide repository.
Some cons of this approach:
- we need to decide how to handle the migration from the current non-reproducible setup to a more reproducible one
  - idea: we could use the `pyodide-build` version for this. If it's older than a particular version $X$, we use the "legacy" cross-build environment URL (the current one); if it's $X$ or later, we use the new pinned URLs, wherever they end up living.
- switching to the new approach carries a slightly higher maintenance cost for all of us
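The version-based migration idea can be sketched as a simple cutoff check. The cutoff $X$ is left as a parameter because it hasn't been decided; the legacy URL assumes the current `main`-based location of `pyodide-cross-build-environments.json`, and the pinned URL is a placeholder.

```python
from typing import Tuple

# Assumed current (legacy) location and a placeholder for the future pinned one.
LEGACY_URL = (
    "https://raw.githubusercontent.com/pyodide/pyodide/main/"
    "pyodide-cross-build-environments.json"
)
PINNED_URL = "https://example.invalid/pinned/pyodide-cross-build-environments.json"


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain release version, e.g. "0.30.0" -> (0, 30).

    Trailing zeros are dropped so "0.30" and "0.30.0" compare equal.
    Simplified: pre-release/dev suffixes are not handled; a real
    implementation would more likely use packaging.version.Version.
    """
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def metadata_url_for(build_version: str, cutoff: str) -> str:
    """Versions older than `cutoff` (the X above) keep the legacy URL."""
    if parse_release(build_version) < parse_release(cutoff):
        return LEGACY_URL
    return PINNED_URL
```

Comparing tuples of integers avoids the classic string-comparison bug where "0.9" would sort after "0.10".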