A discussion on helping downstream packaging #1791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Commits
96058cc Start a discussion on helping downstream packaging (mgorny)
b4cb7d2 Add a section on Internet access (mgorny)
df7081c Also suggest the option of splitting test data into separate archive (mgorny)
cff76df Add a section on system dependencies (mgorny)
be242a8 Add a section on downstream testing (mgorny)
9f9af53 Expand on downstream testing, and add emphasis for readability (mgorny)
00f39e5 Reorganize into why/how sections, and add emphasis (mgorny)
08c70e8 Elaborate a bit more on why it's good to help downstreams (mgorny)
24f552e Correct "pytest" capitalization (mgorny)
24b0346 Apply suggestions from code review (mgorny)
16288af Apply more suggestions from code review (mgorny)
b4c7485 Attempt addressing the remaining review comments (mgorny)
df0c91a Add a section on stable channels (mgorny)
fc72a38 Retitle as "Supporting downstream packaging" (mgorny)
24462f4 Add a "not all-or-nothing" sentence (mgorny)
29cc38a Add a note that downstreams can send patches to fix these issues (mgorny)
9d5fbe6 Capitalize Git, per @pawamoy (mgorny)
4d95da2 Fix inconsistent case in bullet points and remove duplicate (mgorny)
6f55709 Apply typo fixes, thanks to @pawamoy (mgorny)
548ab34 Clarify that source distribution needs only package's files (mgorny)
e925da1 Fix inconsistent whitespace between sentences (mgorny)
0eb407c Make the point of reusing source distribution lighter (mgorny)
94743f9 Clarify the Internet part (mgorny)
704d1a5 Apply suggestions from code review (mgorny)
e596609 Remove duplicate paragraph (mgorny)
169281d Clarify source distributions (mgorny)
bb8ac35 Add non-reproducibility argument for changing resources (mgorny)
addf891 Mention removing duplication of patches and inconsistency (mgorny)
8a3a56c Reword installing tests to make it clearer (mgorny)
58eaf85 Give an example of "catastrophic failure" (mgorny)
76aaf79 Indicate that some distributions require building from sources (mgorny)
4f97860 Merge branch 'main' into discussion-downstream (ncoghlan)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,378 @@ | ||
| .. _downstream-packaging: | ||
|
|
||
| ================================= | ||
| Simplifying downstream packaging | ||
| ================================= | ||
|
|
||
| :Page Status: Draft | ||
| :Last Reviewed: 2025-? | ||
|
|
||
| While PyPI and the Python packaging tools such as :ref:`pip` are the primary | ||
| means of distributing Python packages, these packages are also often made available as part | ||
| of other packaging ecosystems. These repackaging efforts are collectively called | ||
| *downstream* packaging (your own efforts are called *upstream* packaging), | ||
| and include such projects as Linux distributions, Conda, Homebrew and MacPorts. | ||
| They generally aim to provide improved support for use cases that cannot be handled | ||
| via Python packaging tools alone, such as native integration with a specific operating | ||
| system, or assured compatibility with specific versions of non-Python software. | ||
|
|
||
| This discussion attempts to explain how downstream packaging is usually done, | ||
| and what additional challenges downstream packagers typically face. It aims | ||
| to provide some optional guidelines that project maintainers may choose to | ||
| follow which help make downstream packaging *significantly* easier | ||
| (without imposing any major maintenance hassles on the upstream project). | ||
|
|
||
| Establishing a good relationship between software maintainers and downstream | ||
| packagers can bring mutual benefits. Downstreams are often willing to share | ||
| their experience, time and hardware to improve your package. They are | ||
| sometimes in a better position to see how your package is used in practice, | ||
| and to provide information about its relationships with other packages that | ||
| would otherwise require significant effort to obtain. | ||
| Packagers can often find bugs before your users hit them in production, | ||
| provide bug reports of good quality, and supply patches whenever they can. | ||
| For example, they are regularly active in ensuring the packages they redistribute | ||
| are updated for any compatibility issues that arise when a new Python version | ||
| is released. | ||
|
|
||
| Please note that downstream builds include not only binary redistribution, | ||
| but also source builds done on user systems (in source-first distributions | ||
| such as Gentoo Linux, for example). | ||
|
|
||
|
|
||
| .. _provide-complete-source-distributions: | ||
|
|
||
| Provide complete source distributions | ||
| ------------------------------------- | ||
|
|
||
| Why? | ||
| ~~~~ | ||
|
|
||
| The vast majority of downstream packagers prefer to build packages from source, | ||
| rather than use the upstream-provided binary packages. This is also true | ||
| of pure Python packages that provide universal wheels. The reasons for using | ||
| source distributions may include: | ||
|
|
||
| - being able to audit the source code of all packages | ||
|
|
||
| - being able to run the test suite and build documentation | ||
|
|
||
| - being able to easily apply patches, including backporting commits | ||
| from the project's repository and sending patches back to the project | ||
|
|
||
| - being able to build on a specific platform that is not covered | ||
| by upstream builds | ||
|
|
||
| - being able to build against specific versions of system libraries | ||
|
|
||
| - having a consistent build process across all Python packages | ||
|
|
||
| While it is usually possible to build packages from a Git repository, there are | ||
| a few important reasons to provide a static archive file instead: | ||
|
|
||
| - Fetching a single file is often more efficient, more reliable and better | ||
| supported than e.g. using a Git clone. This can help users with poor | ||
| Internet connectivity. | ||
|
|
||
| - Downstreams often **use checksums to verify the authenticity** of source files | ||
| on subsequent builds, which requires that they remain bitwise identical over | ||
| time. Automatically generated Git archives, for example, do not guarantee | ||
| this. | ||
|
|
||
| - Archive files can be mirrored, reducing both upstream and downstream | ||
| bandwidth use. The actual builds can afterwards be performed in firewalled | ||
| or offline environments that can only access source files provided | ||
| by the local mirror or redistributed earlier. | ||
|
|
||
| - Explicitly publishing archive files can ensure that any dependencies on version control | ||
| system metadata are resolved when creating the source archive. For example, automatically | ||
| generated git archives omit all of the commit tag information, potentially resulting in | ||
| incorrect version details in the resulting builds. | ||
|
|
||
| How? | ||
| ~~~~ | ||
|
|
||
| Ideally, **a source distribution archive should include all the files** | ||
| necessary to build the package itself, run its test suite, build and install | ||
| its documentation, and any other files that may be useful to end users, such as | ||
| shell completions, editor support files, and so on. | ||
|
|
||
| Some projects are concerned about increasing the size of the source distribution, | ||
| or do not wish Python packaging tools to fall back to source distributions | ||
| automatically. In these cases, a good compromise may be to publish a separate | ||
| source archive for downstream use, for example by attaching it to a GitHub | ||
| release. Alternatively, large files, such as test data, can be split into | ||
| separate archives. | ||
|
|
||
| A good idea is to **use your source distribution in the release workflow**. | ||
| That is, build it first, then unpack it and perform all the remaining steps | ||
| using the unpacked distribution rather than the Git repository: run tests, | ||
| build documentation, build wheels. This ensures that it is well-tested, | ||
| and reduces the risk that some users would hit build failures or install | ||
| an incomplete package. | ||
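As a rough illustration of the "unpack it and work from there" step, the sketch below unpacks a source distribution and reports which expected files are absent. It assumes the archive was already built (for example with ``python -m build --sdist``); the ``REQUIRED`` list, the ``example`` project name and both helper functions are hypothetical and only serve this illustration.

```python
import tarfile
import tempfile
from pathlib import Path

# Hypothetical list of files a complete sdist of "example" should contain,
# relative to the top-level directory inside the archive.
REQUIRED = ["pyproject.toml", "src/example/__init__.py", "tests/test_basic.py"]


def unpack_sdist(sdist: Path, dest: Path) -> Path:
    """Unpack a .tar.gz sdist into dest and return its top-level directory."""
    # Use the safer "data" extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(sdist, "r:gz") as archive:
        archive.extractall(dest, **kwargs)
    # An sdist is expected to contain exactly one top-level directory.
    (top_level,) = [p for p in dest.iterdir() if p.is_dir()]
    return top_level


def find_missing(root: Path, required: list[str]) -> list[str]:
    """Return the required paths that are absent from the unpacked tree."""
    return [name for name in required if not (root / name).exists()]
```

A release script could call ``find_missing()`` on the unpacked tree and abort if the result is non-empty, then run the tests, documentation build and wheel build from that same directory.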
|
|
||
|
|
||
| .. _no-internet-access-in-builds: | ||
|
|
||
| Do not use the Internet during the build process | ||
| ------------------------------------------------ | ||
|
|
||
| Why? | ||
| ~~~~ | ||
|
|
||
| Downstream builds are frequently done in sandboxed environments that cannot | ||
| access the Internet. Even if this is not the case, and assuming that you took | ||
| sufficient care to properly authenticate downloads, using the Internet | ||
| is discouraged for a number of reasons: | ||
|
|
||
| - The Internet connection may be unstable (e.g. due to poor reception) | ||
| or suffer from temporary problems that could cause the process to fail | ||
| or hang. | ||
|
|
||
| - The remote resources may become temporarily or even permanently | ||
| unavailable, making the build no longer possible. This is especially | ||
| problematic when someone needs to build an old package version. | ||
|
|
||
| - Accessing remote servers poses a **privacy** issue and a potential | ||
| **security** issue, as it exposes information about the system building | ||
| the package. | ||
|
|
||
| - The user may be using a service with a limited data plan, in which case | ||
| uncontrolled Internet access may result in additional charges or other | ||
| inconveniences. | ||
|
|
||
| How? | ||
| ~~~~ | ||
|
|
||
| Your source distribution should either include all the files needed | ||
| for the package to build, or allow provisioning them externally. Ideally, | ||
| it should not even attempt to access the Internet at all, unless explicitly | ||
| requested to. If that is not possible to achieve, the next best thing | ||
| is to provide an opt-out switch to disable all Internet access. | ||
|
|
||
| When such a switch is used, the build process should fail if some | ||
| of the required files are missing, rather than try to fetch them automatically. | ||
| This could be done e.g. by checking whether a ``NO_NETWORK`` environment | ||
| variable is set to a non-empty value. Please also remember that if you are | ||
| fetching remote resources, you must **verify their authenticity**, e.g. against | ||
| a checksum, to protect against the file being substituted by a malicious party. | ||
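A minimal sketch of such an opt-out switch, combined with checksum verification, might look like the following. The ``NO_NETWORK`` name comes from the text above; the ``provide_file()`` helper, its signature and the choice of SHA-256 are assumptions made for illustration, not an established convention.

```python
import hashlib
import os
import urllib.request
from pathlib import Path


def network_disabled() -> bool:
    """Return True if NO_NETWORK is set to a non-empty value."""
    return bool(os.environ.get("NO_NETWORK"))


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def provide_file(path: Path, url: str, expected_sha256: str) -> Path:
    """Return a verified local file, downloading it only if that is allowed."""
    if not path.exists():
        if network_disabled():
            # Fail loudly instead of fetching behind the packager's back.
            raise RuntimeError(
                f"{path} is missing and NO_NETWORK is set; "
                f"provide the file manually (upstream URL: {url})"
            )
        urllib.request.urlretrieve(url, path)
    # Verify pre-provisioned and freshly downloaded files alike.
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise RuntimeError(f"checksum mismatch for {path}: got {actual}")
    return path
```

Because the file is looked up locally first, a downstream packager can place it at the expected path before the build and never trigger a download.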
|
|
||
| Since downstreams frequently also run tests and build documentation, the above | ||
| should ideally extend to these processes as well. | ||
|
|
||
|
|
||
| .. _support-system-dependencies-in-builds: | ||
|
|
||
| Support building against system dependencies | ||
| -------------------------------------------- | ||
|
|
||
|
|
||
| Why? | ||
| ~~~~ | ||
|
|
||
|
|
||
| Some Python projects have non-Python dependencies, such as libraries written | ||
| in C or C++. Trying to use the system versions of these dependencies | ||
| in upstream packaging may cause a number of problems for end users: | ||
|
|
||
| - The published wheels require a binary-compatible version of the | ||
| library to be present on the user's system. If the library is missing | ||
| or an incompatible version is installed, the Python package may fail with errors | ||
| that are not clear to inexperienced users, or even misbehave at runtime. | ||
|
|
||
| - Building from a source distribution requires a source-compatible version | ||
| of the dependency to be present, along with its development headers | ||
| and other auxiliary files that some systems package separately | ||
| from the library itself. | ||
|
|
||
| - Even for an experienced user, installing a compatible dependency version | ||
| may be very hard. For example, the Linux distribution in use may not provide | ||
| the required version, or some other package may require an incompatible | ||
| version. | ||
|
|
||
| - The linkage between the Python package and its system dependency is not | ||
| recorded by the packaging system. The next system update may upgrade | ||
| the library to a newer version that breaks binary compatibility with | ||
| the Python package, and requires user intervention to fix. | ||
|
|
||
| For these reasons, you may reasonably decide to either statically link | ||
| your dependencies, or to provide local copies in the installed package. | ||
| You may also vendor the dependency in your source distribution. Sometimes | ||
| these dependencies are also repackaged on PyPI, and can be declared as | ||
| project dependencies like any other Python package. | ||
|
|
||
| However, none of these issues apply to downstream packaging, and downstreams | ||
| have good reasons to prefer dynamically linking to system dependencies. | ||
| In particular: | ||
|
|
||
| - In many cases, reliably sharing dynamic dependencies between components is a large part | ||
| of the *purpose* of a downstream packaging ecosystem. Helping to support that makes it | ||
| easier for users of those systems to access upstream projects in their preferred format. | ||
|
|
||
| - Static linking and vendoring obscure the use of external dependencies, | ||
| making source auditing harder. | ||
|
|
||
| - Dynamic linking makes it possible to quickly and systematically replace the | ||
| libraries across an entire downstream packaging ecosystem, which can be particularly | ||
| important when they turn out to contain a security vulnerability or critical bug. | ||
|
|
||
| - Using system dependencies makes the package benefit from downstream | ||
| customization that can improve the user experience on a particular platform, | ||
| without the downstream maintainers having to consistently patch | ||
| the dependencies vendored in different packages. This can include | ||
| compatibility improvements and security hardening. | ||
|
|
||
| - Static linking and vendoring can result in multiple different versions of the | ||
| same library being loaded in the same process (for example, attempting to | ||
| import two Python packages that link to different versions of the same library). | ||
| This sometimes works without incident, but it can also lead to anything from library | ||
| loading errors, to subtle runtime bugs, to catastrophic system failures. | ||
|
|
||
| - Last but not least, static linking and vendoring result in duplication, | ||
| and may increase the use of both disk space and memory. | ||
|
|
||
| How? | ||
| ~~~~ | ||
|
|
||
|
|
||
| A good compromise between the needs of both parties is to provide a switch | ||
| between using vendored and system dependencies. Ideally, if the package has | ||
| multiple vendored dependencies, it should provide both individual switches | ||
| for each dependency, and a general switch to control the default for them, | ||
| e.g. via a ``USE_SYSTEM_DEPS`` environment variable. | ||
|
|
||
| If the user requests using system dependencies, and a particular dependency | ||
| is either missing or incompatible, the build should fail with an explanatory | ||
| message rather than fall back to a vendored version. This gives the packager | ||
| the opportunity to notice their mistake and a chance to consciously decide | ||
| how to solve it. | ||
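The switch logic described above could be sketched as follows. ``USE_SYSTEM_DEPS`` is the name suggested in the text; the per-dependency ``USE_SYSTEM_<NAME>`` variables, the accepted truthy values and both function names are hypothetical choices made for this example. How a build backend detects whether a suitable system library is available is out of scope here and is passed in as a plain boolean.

```python
import os

# Values treated as "enabled"; an assumption, pick whatever your project documents.
TRUE_VALUES = {"1", "true", "yes", "on"}


def use_system(dep: str, environ=os.environ) -> bool:
    """Decide whether to build against the system copy of ``dep``.

    A per-dependency variable such as USE_SYSTEM_ZLIB takes precedence
    over the general USE_SYSTEM_DEPS default.
    """
    specific = environ.get(f"USE_SYSTEM_{dep.upper()}")
    value = specific if specific is not None else environ.get("USE_SYSTEM_DEPS", "")
    return value.strip().lower() in TRUE_VALUES


def select_dependency(dep: str, system_available: bool, environ=os.environ) -> str:
    """Return "system" or "vendored", never falling back silently."""
    if not use_system(dep, environ):
        return "vendored"
    if not system_available:
        raise RuntimeError(
            f"system {dep} was requested but no compatible version was found; "
            f"install it or unset USE_SYSTEM_{dep.upper()}"
        )
    return "system"
```

Raising an error in the last branch, instead of quietly using the vendored copy, is what lets a packager notice a missing or incompatible system library at build time.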
|
|
||
| Note that it is reasonable for upstream projects to leave *testing* of building with | ||
| system dependencies to their downstream repackagers. The goal of these guidelines | ||
| is to facilitate more effective collaboration between upstream projects and downstream | ||
| repackagers, not to suggest upstream projects take on tasks that downstream repackagers | ||
| are better equipped to handle. | ||
|
|
||
| .. _support-downstream-testing: | ||
|
|
||
| Support downstream testing | ||
| -------------------------- | ||
|
|
||
|
|
||
| Why? | ||
| ~~~~ | ||
|
|
||
|
|
||
| A variety of downstream projects run some degree of testing on the packaged | ||
| Python projects. Depending on the particular case, this can range from minimal | ||
| smoke testing to comprehensive runs of the complete test suite. There can | ||
| be various reasons for doing this, for example: | ||
|
|
||
| - Verifying that the downstream packaging did not introduce any bugs. | ||
|
|
||
| - Testing on additional platforms that are not covered by upstream testing. | ||
|
|
||
| - Finding subtle bugs that can only be reproduced with particular hardware, | ||
| system package versions, and so on. | ||
|
|
||
| - Testing the released package against newer (or older) dependency versions than | ||
| the ones present during upstream release testing. | ||
|
|
||
| - Testing the package in an environment closely resembling the production | ||
| setup. This can detect issues caused by nontrivial interactions between | ||
| different installed packages, including packages that are not dependencies | ||
| of your package, but nevertheless can cause issues. | ||
|
|
||
| - Testing the released package against newer Python versions (including | ||
| newer point releases), or less tested Python implementations such as PyPy. | ||
|
|
||
| Admittedly, sometimes downstream testing may yield false positives or bug | ||
| reports about scenarios the upstream project is not interested in supporting. | ||
| However, perhaps even more often it does provide early notice of problems, | ||
| or find nontrivial bugs that would otherwise cause issues for your users | ||
| in production. In practice, the majority of **downstream packagers are doing | ||
| their best to double-check their results, and help you triage and fix the bugs | ||
| that they report**. | ||
|
|
||
| How? | ||
| ~~~~ | ||
|
|
||
|
|
||
| There are a number of things that upstream projects can do to help downstream | ||
| repackagers test their packages efficiently and effectively, including some of the suggestions | ||
| already mentioned above. These are typically improvements that make the test suite more | ||
| reliable and easier to use for everyone, not just downstream packagers. | ||
| Some specific suggestions are: | ||
|
|
||
| - Include the test files and fixtures in the source distribution, or make it | ||
| possible to easily download them separately. | ||
|
|
||
| - Do not write to the package directories during testing. Downstream test | ||
| setups sometimes run tests on top of the installed package, and modifications | ||
| performed during testing and temporary test files may end up being part | ||
| of the installed package! | ||
|
|
||
| - Make the test suite work offline. Mock network interactions, using | ||
| packages such as responses_ or vcrpy_. If that is not possible, make it | ||
| possible to easily disable the tests using Internet access, e.g. via a pytest_ | ||
| marker. Use pytest-socket_ to verify that your tests work offline. This | ||
| often makes your own test workflows faster and more reliable as well. | ||
|
|
||
| - Make your tests work without a specialized setup, or perform the necessary | ||
| setup as part of test fixtures. Do not ever assume that you can connect | ||
| to system services such as databases — in an extreme case, you could crash | ||
| a production service! | ||
|
|
||
| - If your package has optional dependencies, make their tests optional as | ||
| well. Either skip them if the needed packages are not installed, or add | ||
| markers to make deselecting easy. | ||
|
|
||
| - More generally, add markers to tests with special requirements. These can | ||
| include e.g. significant space usage, significant memory usage, long runtime, | ||
| or incompatibility with parallel testing. | ||
|
|
||
| - Do not assume that the test suite will be run with ``-Werror``. Downstreams | ||
| often need to disable that, as it causes false positives, e.g. due to newer | ||
| dependency versions. Assert on warnings using ``pytest.warns()`` rather | ||
| than ``pytest.raises()``, which only passes when warnings are turned into errors! | ||
|
|
||
| - Aim to make your test suite reliable and reproducible. Avoid flaky tests. | ||
| Avoid depending on specific platform details, exact results | ||
| of floating-point computation, timing of operations, and so on. Fuzzing | ||
| has its advantages, but you want to have static test cases for completeness | ||
| as well. | ||
|
|
||
| - Split tests by their purpose, and make it easy to skip categories that are | ||
| irrelevant or problematic. Since the primary purpose of downstream testing | ||
| is to ensure that the package itself works, downstreams are not generally interested | ||
| in tasks such as checking code coverage, code formatting, typechecking or running | ||
| benchmarks. These tests can fail as dependencies are upgraded or the system | ||
| is under load, without actually affecting the package itself. | ||
|
|
||
| - If your test suite takes significant time to run, support testing | ||
| in parallel. Downstreams often maintain a large number of packages, | ||
| and testing them all takes a lot of time. Using pytest-xdist_ can help them | ||
| avoid bottlenecks. | ||
|
|
||
| - Ideally, support running your test suite via ``pytest``. pytest_ has many | ||
| command-line arguments that are truly helpful to downstreams, such as | ||
| the ability to conveniently deselect tests, rerun flaky tests | ||
| (via pytest-rerunfailures_), add a timeout to prevent tests from hanging | ||
| (via pytest-timeout_) or run tests in parallel (via pytest-xdist_). | ||
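A few of the suggestions above can be illustrated with a short pytest sketch. The ``old_name()`` function, the ``network`` marker name and the use of ``yaml`` as an optional dependency are all hypothetical stand-ins; only the pytest features themselves (``pytest.warns``, ``pytest.mark``, ``pytest.importorskip`` and the ``tmp_path`` fixture) are real.

```python
import warnings

import pytest


def old_name():
    """Hypothetical deprecated function used only for this illustration."""
    warnings.warn("old_name() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def test_old_name_warns():
    # pytest.warns() records the warning itself, so this test does not
    # depend on the suite being run with -Werror.
    with pytest.warns(DeprecationWarning, match="deprecated"):
        assert old_name() == 42


@pytest.mark.network  # hypothetical marker; deselect with: pytest -m "not network"
def test_download():
    # A test that genuinely needs Internet access would go here.
    pass


def test_yaml_support():
    # Skipped, not failed, when the optional dependency is not installed.
    yaml = pytest.importorskip("yaml")
    assert yaml.safe_load("a: 1") == {"a": 1}


def test_writes_only_to_tmp_path(tmp_path):
    # tmp_path is a per-test temporary directory, so nothing is written
    # into the (possibly installed) package directory.
    out = tmp_path / "result.txt"
    out.write_text("ok")
    assert out.read_text() == "ok"
```

Custom markers such as ``network`` should be registered in the project's pytest configuration (the ``markers`` option) so that pytest does not warn about unknown marks.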
|
|
||
|
|
||
| .. _responses: https://pypi.org/project/responses/ | ||
| .. _vcrpy: https://pypi.org/project/vcrpy/ | ||
| .. _pytest-socket: https://pypi.org/project/pytest-socket/ | ||
| .. _pytest-xdist: https://pypi.org/project/pytest-xdist/ | ||
| .. _pytest: https://pytest.org/ | ||
| .. _pytest-rerunfailures: https://pypi.org/project/pytest-rerunfailures/ | ||
| .. _pytest-timeout: https://pypi.org/project/pytest-timeout/ | ||