RFC: a specific commit hash for DEFAULT_CROSS_BUILD_ENV_METADATA_URL for better reproducibility #177

@agriyakhetarpal

While PRs such as #109 will help us make builds more reproducible through code, some aspects of the Pyodide conveyor belt and build factory do not enforce reproducibility properly. For example, #170 tracks non-reproducibility in our tests, which we've left as a TODO to resolve.

Reproducibility also breaks elsewhere: DEFAULT_CROSS_BUILD_ENV_METADATA_URL currently uses a raw GitHub URL that points to the main branch. This has several drawbacks:

  • raw GitHub URLs that don't point to a specific blob or a tag can be susceptible to file drift, eventually leading to link rot
  • if this file is ever removed, whether by manual error or a cybersecurity attack (hopefully not :P), out-of-tree CI builds for roughly 60 packages that do not cache our cross-build environment will break (I don't think any of them cache it)
  • downstream clients such as cibuildwheel get unreliable and non-reproducible builds down the line if we change something retroactively.
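To make the difference between a branch-based URL and a pinned one concrete, here is a minimal Python sketch that checks whether a raw GitHub URL refers to a full 40-character commit SHA rather than a branch such as main. The helper name is hypothetical, and it assumes the usual https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path> layout. It is stricter than the first bullet above: it also treats tags as unpinned, since a tag can be re-pointed.

```python
import re
from urllib.parse import urlparse

# Full-length git SHA-1 commit hashes are 40 lowercase hexadecimal characters.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_raw_url(url: str) -> bool:
    """Return True if a raw GitHub URL refers to a full commit SHA.

    Hypothetical helper. Assumes the layout
    https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>.
    Branch names (e.g. "main") and tags count as unpinned.
    """
    parsed = urlparse(url)
    if parsed.netloc != "raw.githubusercontent.com":
        return False
    parts = parsed.path.strip("/").split("/")
    # Need at least owner, repo, ref and one path component.
    if len(parts) < 4:
        return False
    ref = parts[2]
    return _COMMIT_SHA.fullmatch(ref) is not None
```

A URL ending in /main/pyodide-cross-build-environments.json is reported as unpinned, while the same path under a 40-character commit hash is reported as pinned.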

Coupled with this, the only way to customise the xbuildenv outside of the pyodide xbuildenv search interface is to run pyodide xbuildenv install --url and point it to a specific URL; we don't yet support an environment variable or a pyodide config option to configure the xbuildenv. See also #68 and pyodide/pyodide-build-environment-nightly#14 for briefly related discussions. Even then, we don't handle the case where nightly build environments are incompatible with pyodide-build, which forces users to pass the --force parameter explicitly.

I have a few solutions in mind to address reproducibility, each involving a different extent of breakage, in no particular order:

  1. point to a specific commit hash for the metadata URL
    i. a particular release of pyodide-build will use a pinned version of the URL, and will be versioned alongside the pyodide-build releases
    ii. when there's an update needed to the metadata, pyodide-ci-bot will send out patches and point to the most recent commit hash.
    iii. if we break something in a release, we apply a fix, backport it, and release it as soon as possible (this costs extra maintenance, but such a scenario should be rare).
  2. version the cross-build metadata file outside the Pyodide repository for future versions?
  3. a combination of points 1 and 2, so that the metadata isn't coupled to Pyodide's versions, and a pinned commit hash that produces broken builds can be corrected much more easily
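Options 1 and 3 can be sketched roughly as follows: a release carries one pinned commit hash, the metadata URL is built from it, and an automated update (standing in for a pyodide-ci-bot patch) rewrites that single constant. METADATA_COMMIT, build_metadata_url, bump_pinned_commit, and the default repository and file path are all assumptions for illustration; under option 3 the repository would be whichever one ends up hosting the metadata file.

```python
import re

_SHA = r"[0-9a-f]{40}"

# Hypothetical pin; each pyodide-build release would ship one fixed value.
METADATA_COMMIT = "0123456789abcdef0123456789abcdef01234567"


def build_metadata_url(
    commit: str,
    repo: str = "pyodide/pyodide",
    path: str = "pyodide-cross-build-environments.json",
) -> str:
    """Build a raw GitHub URL pinned to a full commit SHA (not a branch)."""
    if not re.fullmatch(_SHA, commit):
        raise ValueError(f"expected a full 40-character commit SHA, got {commit!r}")
    return f"https://raw.githubusercontent.com/{repo}/{commit}/{path}"


# Matches a source line of the form: METADATA_COMMIT = "<40 hex chars>"
_PIN_LINE = re.compile(r'^(METADATA_COMMIT = ")' + _SHA + r'(")$', re.MULTILINE)


def bump_pinned_commit(source: str, new_commit: str) -> str:
    """Return `source` with the pinned commit replaced, as a bot patch would."""
    if not re.fullmatch(_SHA, new_commit):
        raise ValueError(f"not a full commit SHA: {new_commit!r}")
    updated, count = _PIN_LINE.subn(rf"\g<1>{new_commit}\g<2>", source)
    if count != 1:
        raise ValueError(f"expected exactly one pin line, found {count}")
    return updated
```

Rejecting anything that is not a full SHA keeps a branch name from slipping back in, and requiring exactly one match makes a bot patch fail loudly instead of silently doing nothing.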

I think option 3 is the best of these, given that it does not require a release of the Pyodide repository.

Some cons of this approach:

  • how we handle the migration from the current non-reproducible setup to a more reproducible one
    • idea: we could use the pyodide-build version for this. If it is older than a particular version $X$, we use the "legacy" cross-build environment URL (the current one); if it is $X$ or later, we use the new pinned URLs, wherever they end up living
  • there is a slightly higher maintenance cost for all of us attached to switching to the new paradigm
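The version-gated migration idea could look like the sketch below. PINNED_SINCE is a made-up stand-in for the unspecified version $X$, and both URLs are placeholders. Version parsing only handles plain numeric versions such as "0.29.3" (padded to three components); a real implementation would more likely use packaging.version.

```python
# Hypothetical stand-in for version $X$, from which pinned URLs are used.
PINNED_SINCE = (0, 30, 0)

LEGACY_METADATA_URL = (
    "https://raw.githubusercontent.com/pyodide/pyodide/main/"
    "pyodide-cross-build-environments.json"
)
# Placeholder pinned URL; the commit hash here is illustrative only.
PINNED_METADATA_URL = (
    "https://raw.githubusercontent.com/pyodide/pyodide/"
    "0123456789abcdef0123456789abcdef01234567/"
    "pyodide-cross-build-environments.json"
)


def parse_version(version: str) -> tuple[int, ...]:
    """Parse "0.29" or "0.29.3" into a 3+-tuple of ints, padding with zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def metadata_url_for(pyodide_build_version: str) -> str:
    """Legacy (branch-based) URL before $X$, pinned URL from $X$ onwards."""
    if parse_version(pyodide_build_version) < PINNED_SINCE:
        return LEGACY_METADATA_URL
    return PINNED_METADATA_URL
```

Padding to three components means "0.30" compares equal to (0, 30, 0) instead of sorting before it as a shorter tuple would.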
