Merged
Changes from 11 commits
96058cc
Start a discussion on helping downstream packaging
mgorny Jan 27, 2025
b4cb7d2
Add a section on Internet access
mgorny Jan 27, 2025
df7081c
Also suggest the option of splitting test data into separate archive
mgorny Jan 27, 2025
cff76df
Add a section on system dependencies
mgorny Jan 28, 2025
be242a8
Add a section on downstream testing
mgorny Jan 28, 2025
9f9af53
Expand on downstream testing, and add emphasis for readability
mgorny Jan 29, 2025
00f39e5
Reorganize into why/how sections, and add emphasis
mgorny Jan 29, 2025
08c70e8
Elaborate a bit more on why it's good to help downstreams
mgorny Jan 29, 2025
24f552e
Correct "pytest" capitalization
mgorny Jan 29, 2025
24b0346
Apply suggestions from code review
mgorny Jan 30, 2025
16288af
Apply more suggestions from code review
mgorny Jan 30, 2025
b4c7485
Attempt addressing the remaining review comments
mgorny Jan 30, 2025
df0c91a
Add a section on stable channels
mgorny Jan 30, 2025
fc72a38
Retitle as "Supporting downstream packaging"
mgorny Jan 31, 2025
24462f4
Add a "not all-or-nothing" sentence
mgorny Jan 31, 2025
29cc38a
Add a note that downstreams can send patches to fix these issues
mgorny Jan 31, 2025
9d5fbe6
Capitalize Git, per @pawamoy
mgorny Feb 1, 2025
4d95da2
Fix inconsistent case in bullet points and remove duplicate
mgorny Feb 1, 2025
6f55709
Apply typo fixes, thanks to @pawamoy
mgorny Feb 1, 2025
548ab34
Clarify that source distribution needs only package's files
mgorny Feb 1, 2025
e925da1
Fix inconsistent whitespace between sentences
mgorny Feb 1, 2025
0eb407c
Make the point of reusing source distribution lighter
mgorny Feb 1, 2025
94743f9
Clarify the Internet part
mgorny Feb 1, 2025
704d1a5
Apply suggestions from code review
mgorny Feb 1, 2025
e596609
Remove duplicate paragraph
mgorny Feb 1, 2025
169281d
Clarify source distributions
mgorny Feb 1, 2025
bb8ac35
Add non-reproducibility argument for changing resources
mgorny Feb 1, 2025
addf891
Mention removing duplication of patches and inconsistency
mgorny Feb 2, 2025
8a3a56c
Reword installing tests to make it clearer
mgorny Feb 2, 2025
58eaf85
Give an example of "catastrophic failure"
mgorny Feb 2, 2025
76aaf79
Indicate that some distributions require building from sources
mgorny Feb 22, 2025
4f97860
Merge branch 'main' into discussion-downstream
ncoghlan Feb 26, 2025
378 changes: 378 additions & 0 deletions source/discussions/downstream-packaging.rst
@@ -0,0 +1,378 @@
.. _downstream-packaging:

=================================
Supporting downstream packaging
=================================

:Page Status: Draft
:Last Reviewed: 2025-?

While PyPI and the Python packaging tools such as :ref:`pip` are the primary
means of distributing Python packages, they are also often made available as part
of other packaging ecosystems. These repackaging efforts are collectively called
*downstream* packaging (your own efforts are called *upstream* packaging),
and include such projects as Linux distributions, Conda, Homebrew and MacPorts.
They generally aim to provide improved support for use cases that cannot be handled
via Python packaging tools alone, such as native integration with a specific operating
system, or assured compatibility with specific versions of non-Python software.

This discussion attempts to explain how downstream packaging is usually done,
and what additional challenges downstream packagers typically face. It aims
to provide some optional guidelines that project maintainers may choose to
follow, and that can make downstream packaging *significantly* easier
(without imposing any major maintenance burden on the upstream project).

Establishing a good relationship between software maintainers and downstream
packagers can bring mutual benefits. Downstreams are often willing to share
their experience, time and hardware to improve your package. They are
sometimes in a better position to see how your package is used in practice,
and to provide information about its relationships with other packages that
would otherwise require significant effort to obtain.
Packagers can often find bugs before your users hit them in production,
provide bug reports of good quality, and supply patches whenever they can.
For example, they are regularly active in ensuring the packages they redistribute
are updated for any compatibility issues that arise when a new Python version
is released.

Please note that downstream builds include not only binary redistribution,
but also source builds done on user systems (in source-first distributions
such as Gentoo Linux, for example).


.. _provide-complete-source-distributions:

Provide complete source distributions
-------------------------------------

Why?
~~~~

The vast majority of downstream packagers prefer to build packages from source,
rather than use the upstream-provided binary packages. This is also true
of pure Python packages that provide universal wheels. The reasons for using
source distributions may include:

- being able to audit the source code of all packages

- being able to run the test suite and build documentation

- being able to easily apply patches, including backporting commits
from the project's repository and sending patches back to the project

- being able to build on a specific platform that is not covered
by upstream builds

- being able to build against specific versions of system libraries

- having a consistent build process across all Python packages

While it is usually possible to build packages from a Git repository, there are
a few important reasons to provide a static archive file instead:

- Fetching a single file is often more efficient, more reliable, and better
  supported than e.g. performing a Git clone. This can help users with poor
  Internet connectivity.

- Downstreams often **use checksums to verify the authenticity** of source files
  on subsequent builds, which requires that they remain bitwise identical over
  time. For example, automatically generated Git archives do not guarantee
  that.

- Archive files can be mirrored, reducing both upstream and downstream
  bandwidth use. The actual builds can afterwards be performed in firewalled
  or offline environments that can only access source files provided
  by the local mirror or redistributed earlier.

- Explicitly publishing archive files can ensure that any dependencies on version control
  system metadata are resolved when creating the source archive. For example, automatically
  generated Git archives omit all of the commit tag information, potentially resulting in
  incorrect version details in the resulting builds.

How?
~~~~

Ideally, **a source distribution archive should include all the files**
necessary to build the package itself, run its test suite, build and install
its documentation, and any other files that may be useful to end users, such as
shell completions, editor support files, and so on.

Some projects are concerned about increasing the size of the source distribution,
or do not wish Python packaging tools to fall back to source distributions
automatically. In these cases, a good compromise may be to publish a separate
source archive for downstream use, for example by attaching it to a GitHub
release. Alternatively, large files, such as test data, can be split into
separate archives.

A good idea is to **use your source distribution in the release workflow**.
That is, build it first, then unpack it and perform all the remaining steps
using the unpacked distribution rather than the Git repository: run tests,
build documentation, build wheels. This ensures that the distribution is
well-tested, and reduces the risk that some users will hit build failures
or install an incomplete package.


.. _no-internet-access-in-builds:

Do not use the Internet during the build process
------------------------------------------------

Why?
~~~~

Downstream builds are frequently done in sandboxed environments that cannot
access the Internet. Even if this is not the case, and assuming that you took
sufficient care to properly authenticate downloads, using the Internet
is discouraged for a number of reasons:

- The Internet connection may be unstable (e.g. due to poor reception)
or suffer from temporary problems that could cause the process to fail
or hang.

- The remote resources may become temporarily or even permanently
unavailable, making the build no longer possible. This is especially
problematic when someone needs to build an old package version.

- Accessing remote servers poses a **privacy** issue and a potential
**security** issue, as it exposes information about the system building
the package.

- The user may be using a service with a limited data plan, in which case
  uncontrolled Internet access may result in additional charges or other
  inconveniences.

How?
~~~~

Your source distribution should either include all the files needed
for the package to build, or allow provisioning them externally. Ideally,
it should not even attempt to access the Internet at all, unless explicitly
requested to. If that is not possible to achieve, the next best thing
is to provide an opt-out switch to disable all Internet access.

When such a switch is used, the build process should fail if some
of the required files are missing, rather than try to fetch them automatically.
This could be done e.g. by checking whether a ``NO_NETWORK`` environment
variable is set to a non-empty value. Please also remember that if you are
fetching remote resources, you must **verify their authenticity**, e.g. against
a checksum, to protect against the file being substituted by a malicious party.

Since downstreams frequently also run tests and build documentation, the above
should ideally extend to these processes as well.
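A minimal sketch of such an opt-out could look like the following. The ``NO_NETWORK`` variable name matches the example above, but the helper name and overall shape are illustrative assumptions, not an established API:

```python
# Sketch: fetch a build-time resource only when permitted, and always
# verify it against a pinned checksum.
import hashlib
import os
import urllib.request
from pathlib import Path


def fetch_resource(url: str, dest: Path, sha256: str) -> Path:
    if dest.exists():
        # The packager provisioned the file externally; use it as-is.
        data = dest.read_bytes()
    elif os.environ.get("NO_NETWORK"):
        # Fail loudly rather than downloading behind the packager's back.
        raise RuntimeError(
            f"{dest} is missing and NO_NETWORK is set; "
            f"provision it manually (upstream source: {url})")
    else:
        with urllib.request.urlopen(url) as resp:
            data = resp.read()
    # Verify authenticity regardless of where the file came from.
    digest = hashlib.sha256(data).hexdigest()
    if digest != sha256:
        raise RuntimeError(f"checksum mismatch for {dest}: got {digest}")
    dest.write_bytes(data)
    return dest
```

The key properties are that the offline path never touches the network, a missing file is a hard error rather than a silent download, and the checksum is checked even for locally provisioned files.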


.. _support-system-dependencies-in-builds:

Support building against system dependencies
--------------------------------------------


Why?
~~~~


Some Python projects have non-Python dependencies, such as libraries written
in C or C++. Trying to use the system versions of these dependencies
in upstream packaging may cause a number of problems for end users:

- The published wheels require a binary-compatible version of the used
library to be present on the user's system. If the library is missing
or an incompatible version is installed, the Python package may fail with errors
that are not clear to inexperienced users, or even misbehave at runtime.

- Building from a source distribution requires a source-compatible version
of the dependency to be present, along with its development headers
and other auxiliary files that some systems package separately
from the library itself.

- Even for an experienced user, installing a compatible dependency version
may be very hard. For example, the used Linux distribution may not provide
the required version, or some other package may require an incompatible
version.

- The linkage between the Python package and its system dependency is not
recorded by the packaging system. The next system update may upgrade
the library to a newer version that breaks binary compatibility with
the Python package, and requires user intervention to fix.

For these reasons, you may reasonably decide to either statically link
your dependencies, or to provide local copies in the installed package.
You may also vendor the dependency in your source distribution. Sometimes
these dependencies are also repackaged on PyPI, and can be declared as
project dependencies like any other Python package.

However, none of these issues apply to downstream packaging, and downstreams
have good reasons to prefer dynamically linking to system dependencies.
In particular:

- In many cases, reliably sharing dynamic dependencies between components is a large part
  of the *purpose* of a downstream packaging ecosystem. Helping to support that makes it
  easier for users of those systems to access upstream projects in their preferred format.

- Static linking and vendoring obscure the use of external dependencies,
  making source auditing harder.

- Dynamic linking makes it possible to quickly and systematically replace the used
libraries across an entire downstream packaging ecosystem, which can be particularly
important when they turn out to contain a security vulnerability or critical bug.

- Using system dependencies makes the package benefit from downstream
customization that can improve the user experience on a particular platform,
without the downstream maintainers having to consistently patch
the dependencies vendored in different packages. This can include
compatibility improvements and security hardening.

- Static linking and vendoring can result in multiple different versions of the
same library being loaded in the same process (for example, attempting to
import two Python packages that link to different versions of the same library).
This sometimes works without incident, but it can also lead to anything from library
loading errors, to subtle runtime bugs, to catastrophic system failures.

- Last but not least, static linking and vendoring result in duplication,
  and may increase the use of both disk space and memory.

How?
~~~~


A good compromise between the needs of both parties is to provide a switch
between using vendored and system dependencies. Ideally, if the package has
multiple vendored dependencies, it should provide both individual switches
for each dependency, and a general switch to control the default for them,
e.g. via a ``USE_SYSTEM_DEPS`` environment variable.

If the user requests using system dependencies, and a particular dependency
is either missing or incompatible, the build should fail with an explanatory
message rather than fall back to a vendored version. This gives the packager
the opportunity to notice their mistake and a chance to consciously decide
how to solve it.
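The logic of such a switch could be sketched as below. The ``USE_SYSTEM_DEPS`` name matches the example above; the per-dependency variable naming, the helper names, and the use of ``pkg-config`` are illustrative assumptions for a package with C library dependencies:

```python
# Sketch of a vendored-vs-system dependency switch in a build script.
import os
import shutil
import subprocess


def use_system_dep(name: str) -> bool:
    """Decide whether to build against the system copy of *name*.

    USE_SYSTEM_DEPS sets the default; USE_SYSTEM_<NAME> overrides it
    per dependency (these names are illustrative, not a standard).
    """
    default = os.environ.get("USE_SYSTEM_DEPS", "")
    per_dep = os.environ.get(f"USE_SYSTEM_{name.upper()}", default)
    return per_dep.lower() in ("1", "true", "yes")


def find_system_dep(name: str, min_version: str) -> list[str]:
    """Return compiler/linker flags for a system dependency, or fail."""
    pkg_config = shutil.which("pkg-config")
    if pkg_config is None:
        raise RuntimeError("pkg-config is required to use system deps")
    result = subprocess.run(
        [pkg_config, "--cflags", "--libs", f"{name} >= {min_version}"],
        capture_output=True, text=True)
    if result.returncode != 0:
        # Do not silently fall back to the vendored copy: fail with
        # a clear message so the packager can notice and decide.
        raise RuntimeError(
            f"system {name} >= {min_version} not found: {result.stderr}")
    return result.stdout.split()
```

The important behavior is the explicit failure: when the packager asked for a system dependency that cannot be satisfied, the build stops instead of quietly vendoring.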

Note that it is reasonable for upstream projects to leave *testing* of building with
system dependencies to their downstream repackagers. The goal of these guidelines
is to facilitate more effective collaboration between upstream projects and downstream
repackagers, not to suggest upstream projects take on tasks that downstream repackagers
are better equipped to handle.

.. _support-downstream-testing:

Support downstream testing
--------------------------


Why?
~~~~


A variety of downstream projects run some degree of testing on the packaged
Python projects. Depending on the particular case, this can range from minimal
smoke testing to comprehensive runs of the complete test suite. There can
be various reasons for doing this, for example:

- Verifying that the downstream packaging did not introduce any bugs.

- Testing on additional platforms that are not covered by upstream testing.

- Finding subtle bugs that can only be reproduced with particular hardware,
system package versions, and so on.

- Testing the released package against newer (or older) dependency versions than
the ones present during upstream release testing.

- Testing the package in an environment closely resembling the production
setup. This can detect issues caused by nontrivial interactions between
different installed packages, including packages that are not dependencies
of your package, but nevertheless can cause issues.

- Testing the released package against newer Python versions (including
newer point releases), or less tested Python implementations such as PyPy.

Admittedly, sometimes downstream testing may yield false positives or bug
reports about scenarios the upstream project is not interested in supporting.
However, at least as often it provides early notice of problems,
or finds nontrivial bugs that would otherwise cause issues for your users
in production. In practice, the majority of **downstream packagers do
their best to double-check their results, and to help you triage and fix
the bugs that they report**.

How?
~~~~


There are a number of things that upstream projects can do to help downstream
repackagers test their packages efficiently and effectively, including some of the suggestions
already mentioned above. These are typically improvements that make the test suite more
reliable and easier to use for everyone, not just downstream packagers.
Some specific suggestions are:

- Include the test files and fixtures in the source distribution, or make it
possible to easily download them separately.

- Do not write to the package directories during testing. Downstream test
setups sometimes run tests on top of the installed package, and modifications
performed during testing and temporary test files may end up being part
of the installed package!

- Make the test suite work offline. Mock network interactions, using
  packages such as responses_ or vcrpy_. If that is not possible, make it
  easy to disable the tests that use Internet access, e.g. via a pytest_
  marker. Use pytest-socket_ to verify that your tests work offline. This
  often makes your own test workflows faster and more reliable as well.

- Make your tests work without a specialized setup, or perform the necessary
setup as part of test fixtures. Do not ever assume that you can connect
to system services such as databases — in an extreme case, you could crash
a production service!

- If your package has optional dependencies, make their tests optional as
well. Either skip them if the needed packages are not installed, or add
markers to make deselecting easy.

- More generally, add markers to tests with special requirements. These can
  include e.g. significant disk space usage, significant memory usage, long
  runtime, or incompatibility with parallel testing.

- Do not assume that the test suite will be run with ``-Werror``. Downstreams
often need to disable that, as it causes false positives, e.g. due to newer
dependency versions. Assert for warnings using ``pytest.warns()`` rather
than ``pytest.raises()``!

- Aim to make your test suite reliable and reproducible. Avoid flaky tests.
Avoid depending on specific platform details, don't rely on exact results
of floating-point computation, or timing of operations, and so on. Fuzzing
has its advantages, but you want to have static test cases for completeness
as well.

- Split tests by their purpose, and make it easy to skip categories that are
irrelevant or problematic. Since the primary purpose of downstream testing
is to ensure that the package itself works, downstreams are not generally interested
in tasks such as checking code coverage, code formatting, typechecking or running
benchmarks. These tests can fail as dependencies are upgraded or the system
is under load, without actually affecting the package itself.

- If your test suite takes significant time to run, support testing
in parallel. Downstreams often maintain a large number of packages,
and testing them all takes a lot of time. Using pytest-xdist_ can help them
avoid bottlenecks.

- Ideally, support running your test suite via ``pytest``. pytest_ has many
command-line arguments that are truly helpful to downstreams, such as
the ability to conveniently deselect tests, rerun flaky tests
(via pytest-rerunfailures_), add a timeout to prevent tests from hanging
(via pytest-timeout_) or run tests in parallel (via pytest-xdist_).
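Several of the suggestions above can be combined in a few lines of test code. In the sketch below, the ``network`` and ``slow`` marker names are illustrative project conventions (they would need to be registered in the pytest_ configuration), not pytest built-ins:

```python
# Sketch of downstream-friendly test markers and optional-dependency skips.
import pytest


@pytest.mark.network
def test_fetches_remote_data():
    """Talks to a real server; downstreams deselect it with -m "not network"."""


@pytest.mark.slow
def test_large_input():
    """Long runtime; downstreams deselect it with -m "not slow"."""


def test_optional_feature():
    # Skip cleanly when an optional dependency is absent, instead of
    # failing with an ImportError ("yaml" stands in for any extra here).
    yaml = pytest.importorskip("yaml", reason="optional dependency")
    assert yaml is not None
```

A downstream packager building in an offline sandbox can then run, for example, ``pytest -m "not network and not slow"`` without patching the test suite.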


.. _responses: https://pypi.org/project/responses/
.. _vcrpy: https://pypi.org/project/vcrpy/
.. _pytest-socket: https://pypi.org/project/pytest-socket/
.. _pytest-xdist: https://pypi.org/project/pytest-xdist/
.. _pytest: https://pytest.org/
.. _pytest-rerunfailures: https://pypi.org/project/pytest-rerunfailures/
.. _pytest-timeout: https://pypi.org/project/pytest-timeout/
1 change: 1 addition & 0 deletions source/discussions/index.rst
@@ -17,3 +17,4 @@ specific topic. If you're just trying to get stuff done, see
src-layout-vs-flat-layout
setup-py-deprecated
single-source-version
downstream-packaging