From 96058cc8d3b42c00c26fb04cc5bb2c530dc67d4e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Mon, 27 Jan 2025 18:08:49 +0100 Subject: [PATCH 01/31] Start a discussion on helping downstream packaging Start the discussion aimed at making downstream packaging easier. The first section focuses on providing source distributions. Also related to bug #1494, though it's focused on why having these extra files is helpful to downstreams, rather than setting a hard standard on what should be included. --- source/discussions/downstream-packaging.rst | 84 +++++++++++++++++++++ source/discussions/index.rst | 1 + 2 files changed, 85 insertions(+) create mode 100644 source/discussions/downstream-packaging.rst diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst new file mode 100644 index 000000000..1573bca06 --- /dev/null +++ b/source/discussions/downstream-packaging.rst @@ -0,0 +1,84 @@ +.. _downstream-packaging: + +======================================== +How to make downstream packaging easier? +======================================== + +:Page Status: Draft +:Last Reviewed: 2025-? + +While PyPI and the Python packaging tools such as :ref:`pip` are the primary +means of distributing your packages, they are often also made available as part +of other packaging ecosystems. These repackaging efforts are collectively called +*downstream* packaging (your own efforts are called *upstream* packaging), +and include such projects as Linux distributions, Conda, Homebrew and MacPorts. +They often aim to provide good support for use cases that cannot be handled +via Python packaging tools alone, such as good integration with non-Python +software. + +This discussion attempts to explain how downstream packaging is usually done, +and what challenges downstream packagers are facing. It ultimately aims to give +you some hints on how you can make downstream packaging easier.
+ +Please note that downstream builds include not only binary redistribution, +but also source builds done on user systems, in source-first distributions +such as Gentoo Linux. + + +.. _Provide complete source distributions: + +Provide complete source distributions +------------------------------------- +The vast majority of downstream packagers prefer to build packages from source, +rather than use the upstream-provided binary packages. This is also true +of pure Python packages that provide universal wheels. The reasons for using +source distributions may include: + +- being able to audit the source code of all packages + +- being able to run the test suite and build documentation + +- being able to easily apply patches, including backporting commits from your + repository and sending patches back to you + +- being able to build against a specific platform that is not covered + by upstream builds + +- being able to build against specific versions of system libraries + +- having a consistent build process across all Python packages + +Ideally, a source distribution archive should include all the files necessary +to build the package itself, run its test suite, build and install its +documentation, and any other files that may be useful to end users, such +as shell completions, editor support files, and so on. + +Some projects are concerned about increasing the size of source distribution, +or do not wish Python packaging tools to fall back to source distributions +automatically. In these cases, a good compromise may be to publish a separate +source archive for downstream use, for example by attaching it to a GitHub +release. + +While it is usually possible to build packages from a git repository, there are +a few important reasons to provide a static archive file instead: + +- Fetching a single file is often more efficient, more reliable and better + supported than e.g. using a git clone. This can help users with a shoddy + Internet connection. 
+ +- Downstreams often use checksums to verify the authenticity of source files + on subsequent builds, which require that they remain bitwise identical over + time. For example, automatically generated git archives do not guarantee + that. + +- Archive files can be mirrored, reducing both upstream and downstream + bandwidth use. The actual builds can afterwards be performed in firewalled + or offline environments, that can only access source files provided + by the local mirror or redistributed earlier. + +A good idea is to use a release workflow that starts by building a source +distribution, and then performs all the remaining release steps (such as +running tests and building wheels) from the unpacked source distribution. This +ensures that the source distribution is actually tested, and reduces the risk +that users installing from it will hit build failures or install an incomplete +package. diff --git a/source/discussions/index.rst b/source/discussions/index.rst index 1f5ff1f2b..b1b84f97a 100644 --- a/source/discussions/index.rst +++ b/source/discussions/index.rst @@ -17,3 +17,4 @@ specific topic. If you're just trying to get stuff done, see src-layout-vs-flat-layout setup-py-deprecated single-source-version + downstream-packaging From b4cb7d2d6bed24af2f6831d07fb7d8fb5edf3f12 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Mon, 27 Jan 2025 20:50:17 +0100 Subject: [PATCH 02/31] Add a section on Internet access --- source/discussions/downstream-packaging.rst | 39 +++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 1573bca06..f40fafac8 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -82,3 +82,42 @@ running tests and building wheels) from the unpacked source distribution. 
This ensures that the source distribution is actually tested, and reduces the risk that users installing from it will hit build failures or install an incomplete package. + + +.. _Do not use the Internet during the build process: + +Do not use the Internet during the build process +------------------------------------------------ +Downstream builds are frequently done in sandboxed environments that cannot +access the Internet. Therefore, it is important that your source distribution +includes all the files needed for the package to build or allows provisioning +them externally, and can build successfully without Internet access. + +Ideally, it should not even attempt to access the Internet at all, unless +explicitly requested to. If that is not possible to achieve, the next best +thing is to provide an opt-out switch to disable all Internet access, and fail +if some of the required files are missing instead of trying to fetch them. This +could be done e.g. by checking whether a ``NO_NETWORK`` environment variable is +to a non-empty value. Please also remember that if you are fetching remote +resources, you should verify their authenticity, e.g. against a checksum, to +protect against the file being substituted by a malicious party. + +Even if downloads are properly authenticated, using the Internet is discouraged +for a number of reasons: + +- The Internet connection may be unstable (e.g. poor reception) or suffer from + temporary problems that could cause the downloads to fail or hang. + +- The remote resources may become temporarily or even permanently unavailable, + making the build no longer possible. This is especially problematic when + someone needs to build an old package version. + +- Accessing remote servers poses a privacy issue and a potential security issue, + as it exposes information about the system building the package. 
+ +- The user may be using a service with a limited data plan, in which + uncontrolled Internet access may result in additional charges or other + inconveniences. + +Since downstreams frequently also run tests and build documentation, the above +should ideally extend to these processes as well. From df7081cea024faf99a4c1e95f457e42cf8f8ecb6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Mon, 27 Jan 2025 20:51:23 +0100 Subject: [PATCH 03/31] Also suggest the option of splitting test data into separate archive --- source/discussions/downstream-packaging.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index f40fafac8..f8f4d9d42 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -57,7 +57,8 @@ Some projects are concerned about increasing the size of source distribution, or do not wish Python packaging tools to fall back to source distributions automatically. In these cases, a good compromise may be to publish a separate source archive for downstream use, for example by attaching it to a GitHub -release. +release. Alternatively, large files, such as test data, can be split into +separate archives. 
While it is usually possible to build packages from a git repository, there are a few important reasons to provide a static archive file instead: From cff76df08c13f83b5a8bb4d25192e957c516d108 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Tue, 28 Jan 2025 15:14:02 +0100 Subject: [PATCH 04/31] Add a section on system dependencies --- source/discussions/downstream-packaging.rst | 70 +++++++++++++++++++++ 1 file changed, 70 insertions(+) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index f8f4d9d42..50e00ac91 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -122,3 +122,73 @@ for a number of reasons: Since downstreams frequently also run tests and build documentation, the above should ideally extend to these processes as well. + + +.. _Support building against system dependencies: + +Support building against system dependencies +-------------------------------------------- +Some Python projects have non-Python dependencies, such as libraries written +in C or C++. Trying to use the system versions of these dependencies +in upstream packaging may cause a number of problems for end users: + +- The published wheels require a binary-compatible version of the used library + to be present on the user's system. If the library is missing or installed + in incompatible version, the Python package may fail with errors that + are not clear to inexperienced users, or even misbehave at runtime. + +- Building from source distribution requires a source-compatible version + of the dependency to be present, along with its development headers and other + auxiliary files that some systems package separately from the library itself. + +- Even for an experienced user, installing a compatible dependency version + may be very hard. 
For example, the used Linux distribution may not provide + the required version, or some other package may require an incompatible + version. + +- The linkage between the Python package and its system dependency is not + recorded by the packaging system. The next system update may upgrade + the library to a newer version that breaks binary compatibility with + the Python package, and requires user intervention to fix. + +For these reasons, you may reasonable to decide to either link statically +to your dependencies, or to provide a local copies in the installed package. +You may also vendor the dependency in your source distribution. Sometimes +these dependencies are also repackaged on PyPI, and can be installed +like a regular Python packages. + +However, none of these issues apply to downstream packaging, and downstreams +have good reasons to prefer dynamically linking to system dependencies. +In particular: + +- Static linking and vendoring obscures the use of external dependencies, + making source auditing harder. + +- Dynamic linking makes it possible to easily and quickly replace the used + libraries, which can be particularly important when they turn out to + be vulnerable or buggy. + +- Using system dependencies makes the package benefit from downstream + customization that can improve the user experience on a particular platform, + without the downstream maintainers having to consistently patch + the dependencies vendored in different packages. This can include + compatibility improvements and security hardening. + +- Static linking and vendoring could result in multiple different versions + of the same library being loaded in the same process (e.g. when you use two + Python packages that link to different versions of the same library). + This can cause no problems, but it could also lead to anything from subtle + bugs to catastrophic failures. 
+ +- Last but not least, static linking and vendoring results in duplication, + and may increase the use of both the disk space and memory. + +A good compromise between the needs of both parties is to provide a switch +between using vendored and system dependencies. Ideally, if the package has +multiple vendored dependencies, it should provide both individual switches +for each dependency, and a general switch, for example using +a ``USE_SYSTEM_DEPS`` environment variable to control the default. If switched +on, and a particular dependency is either missing or incompatible, the build +should fail with an explanatory message, giving the packager an explicit +indication of the problem and a chance to consciously decide on the preferred +course of action. From be242a80635933692d238bb7e74c3ea962b480df Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Tue, 28 Jan 2025 16:52:12 +0100 Subject: [PATCH 05/31] Add a section on downstream testing --- source/discussions/downstream-packaging.rst | 79 +++++++++++++++++++++ 1 file changed, 79 insertions(+) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 50e00ac91..7d09bdb73 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -192,3 +192,82 @@ on, and a particular dependency is either missing or incompatible, the build should fail with an explanatory message, giving the packager an explicit indication of the problem and a chance to consciously decide on the preferred course of action. + + +.. _Support downstream testing: + +Support downstream testing +-------------------------- +A variety of downstream projects run some degree of testing on the packaged +Python projects. Depending on the particular case, this can range from minimal +smoke testing to comprehensive runs of the complete test suite. 
There can +be various reasons for doing this, for example: + +- Verifying that the downstream packaging did not introduce any bugs. + +- Testing on a platform that is not covered by upstream testing. + +- Finding subtle bugs that can only be reproduced on particular hardware, + system package versions, and so on. + +- Testing the released package against newer dependency versions than the ones + present during upstream release testing. + +- Testing the package in an environment closely resembling the production + setup. This can detect issues caused by nontrivial interactions between + different installed packages, including packages that are not dependencies + of your package, but nevertheless can cause issues. + +- Testing the released package against newer Python versions (including newer + point releases), or less tested Python implementations such as PyPy. + +Admittedly, sometimes downstream testing may yield false positives or +inconvenience you about scenarios that you are not interested in supporting. +However, perhaps even more often it does provide early notice of problems, +or find nontrivial bugs that would otherwise cause issues for your users +in production. And believe me, the majority of downstream packagers are doing +their best to double-check their results, and help you triage and fix the bugs +that they report. + +There are a number of things that you can do to help us test your package +better. Some of them were already mentioned in this discussion. Some examples +are: + +- Include the test files and fixtures in the source distribution, or make it + possible to easily download them separately. + +- Do not write to the package during testing. Downstream test setups sometimes + run tests on top of the installed package, and test-time modifications can + end up being part of the production package! + +- Make the test suite work offline. Mock network interactions, using packages + such as responses_ or vcrpy_.
If that is not possible, make it possible + to easily disable the tests using Internet access, e.g. via a pytest marker. + Use pytest-socket_ to verify that your tests work offline. + +- Make your tests work without a specialized setup, or perform the necessary + setup as part of test fixtures. Do not ever assume that you can connect + to system services such as databases — in an extreme case, you could crash + a production service! + +- Do not assume that the test suite will be run with ``-Werror``. Downstreams + often need to disable that, as it causes false positives, e.g. due to newer + dependency versions. Assert for warnings using ``pytest.warns()`` rather + than ``pytest.raises()``! + +- Aim to make your test suite reliable. Avoid flaky tests. Avoid depending + on specific platform details, don't rely on exact results of floating-point + computation, or timing of operations, and so on. Fuzzing has its advantages, + but you want to have static test cases for completeness as well. + +- Split tests by their purpose, and make it easy to skip categories that are + irrelevant or problematic. Since the primary purpose of downstream testing is + to ensure that the package itself works, we generally are not interested + in e.g. checking code coverage, code formatting, typing or running + benchmarks. These tests can fail as dependencies are upgraded or the system + is under load, without actually affecting the package itself. + + +.. _responses: https://pypi.org/project/responses/ +.. _vcrpy: https://pypi.org/project/vcrpy/ +.. 
_pytest-socket: https://pypi.org/project/pytest-socket/ From 9f9af53be8e3fe63af9449481b588f6642e71627 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Wed, 29 Jan 2025 20:49:45 +0100 Subject: [PATCH 06/31] Expand on downstream testing, and add emphasis for readability --- source/discussions/downstream-packaging.rst | 60 +++++++++++++++------ 1 file changed, 43 insertions(+), 17 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 7d09bdb73..fe909b9cd 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -233,41 +233,67 @@ There is a number of things that you can do to help us test your package better. Some of them were already mentioned in this discussion. Some examples are: -- Include the test files and fixtures in the source distribution, or make it +- **Include the test files and fixtures in the source distribution**, or make it possible to easily download them separately. -- Do not write to the package during testing. Downstream test setups sometimes - run tests on top of the installed package, and test-time modifications can - end up being part of the production package! +- **Do not write to the package directories during testing.** Downstream test + setups sometimes run tests on top of the installed package, and modifications + performed during testing and temporary test files may end up being part + of the installed package! -- Make the test suite work offline. Mock network interactions, using packages - such as responses_ or vcrpy_. If that is not possible, make it possible - to easily disable the tests using Internet access, e.g. via a pytest marker. - Use pytest-socket_ to verify that your tests work offline. +- **Make the test suite work offline.** Mock network interactions, using + packages such as responses_ or vcrpy_. 
If that is not possible, make it + possible to easily disable the tests using Internet access, e.g. via a pytest + marker. Use pytest-socket_ to verify that your tests work offline. This + often makes your own test workflows faster and more reliable as well. -- Make your tests work without a specialized setup, or perform the necessary +- **Make your tests work without a specialized setup**, or perform the necessary setup as part of test fixtures. Do not ever assume that you can connect to system services such as databases — in an extreme case, you could crash a production service! -- Do not assume that the test suite will be run with ``-Werror``. Downstreams +- **If your package has optional dependencies, make their tests optional as + well.** Either skip them if the needed packages are not installed, or add + markers to make deselecting easy. + +- More generally, **add markers to tests with special requirements**. These can + include e.g. significant space usage, significant memory usage, long runtime, + incompatibility with parallel testing. + +- **Do not assume that the test suite will be run with -Werror.** Downstreams often need to disable that, as it causes false positives, e.g. due to newer dependency versions. Assert for warnings using ``pytest.warns()`` rather than ``pytest.raises()``! -- Aim to make your test suite reliable. Avoid flaky tests. Avoid depending - on specific platform details, don't rely on exact results of floating-point - computation, or timing of operations, and so on. Fuzzing has its advantages, - but you want to have static test cases for completeness as well. +- **Aim to make your test suite reliable and reproducible.** Avoid flaky tests. + Avoid depending on specific platform details, don't rely on exact results + of floating-point computation, or timing of operations, and so on. Fuzzing + has its advantages, but you want to have static test cases for completeness + as well. 
-- Split tests by their purpose, and make it easy to skip categories that are - irrelevant or problematic. Since the primary purpose of downstream testing is - to ensure that the package itself works, we generally are not interested +- **Split tests by their purpose, and make it easy to skip categories that are + irrelevant or problematic.** Since the primary purpose of downstream testing + is to ensure that the package itself works, we generally are not interested in e.g. checking code coverage, code formatting, typing or running benchmarks. These tests can fail as dependencies are upgraded or the system is under load, without actually affecting the package itself. +- If your test suite takes significant time to run, **support testing + in parallel.** Downstreams often maintain a large number of packages, + and testing them all takes a lot of time. Using pytest-xdist_ can help them + avoid bottlenecks. + +- Ideally, **support running your test suite via PyTest**. PyTest_ has many + command-line arguments that are truly helpful to downstreams, such as + the ability to conveniently deselect tests, rerun flaky tests + (via pytest-rerunfailures_), add a timeout to prevent tests from hanging + (via pytest-timeout_) or run tests in parallel (via pytest-xdist_). + .. _responses: https://pypi.org/project/responses/ .. _vcrpy: https://pypi.org/project/vcrpy/ .. _pytest-socket: https://pypi.org/project/pytest-socket/ +.. _pytest-xdist: https://pypi.org/project/pytest-xdist/ +.. _pytest: https://pytest.org/ +.. _pytest-rerunfailures: https://pypi.org/project/pytest-rerunfailures/ +.. 
_pytest-timeout: https://pypi.org/project/pytest-timeout/ From 00f39e5a434e7dd17ba9743072c9c33222b6c378 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Wed, 29 Jan 2025 21:19:40 +0100 Subject: [PATCH 07/31] Reorganize into why/how sections, and add emphasis Hopefully, this will make it easier for people who aren't interested in reading all the rationale to find the important details -- and for people who read it once to quickly find the point they need later. --- source/discussions/downstream-packaging.rst | 212 +++++++++++--------- 1 file changed, 117 insertions(+), 95 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index fe909b9cd..71d02e7d0 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -29,97 +29,108 @@ such as Gentoo Linux. Provide complete source distributions ------------------------------------- +Why? +~~~~ The vast majority of downstream packagers prefer to build packages from source, rather than use the upstream-provided binary packages. This is also true of pure Python packages that provide universal wheels. 
The reasons for using source distributions may include: -- being able to audit the source code of all packages +- being able to **audit the source code** of all packages -- being able to run the test suite and build documentation +- being able to **run the test suite and build documentation** -- being able to easily apply patches, including backporting commits from your - repository and sending patches back to you +- being able to **easily apply patches**, including backporting commits + from your repository and sending patches back to you -- being able to build against a specific platform that is not covered +- being able to **build on a specific platform** that is not covered by upstream builds -- being able to build against specific versions of system libraries +- being able to **build against specific versions of system libraries** - having a consistent build process across all Python packages -Ideally, a source distribution archive should include all the files necessary -to build the package itself, run its test suite, build and install its -documentation, and any other files that may be useful to end users, such -as shell completions, editor support files, and so on. - -Some projects are concerned about increasing the size of source distribution, -or do not wish Python packaging tools to fall back to source distributions -automatically. In these cases, a good compromise may be to publish a separate -source archive for downstream use, for example by attaching it to a GitHub -release. Alternatively, large files, such as test data, can be split into -separate archives. - While it is usually possible to build packages from a git repository, there are a few important reasons to provide a static archive file instead: -- Fetching a single file is often more efficient, more reliable and better - supported than e.g. using a git clone. This can help users with a shoddy +- Fetching a single file is often **more efficient, more reliable and better + supported** than e.g. 
using a git clone. This can help users with a shoddy Internet connection. -- Downstreams often use checksums to verify the authenticity of source files +- Downstreams often **use checksums to verify the authenticity** of source files on subsequent builds, which require that they remain bitwise identical over time. For example, automatically generated git archives do not guarantee that. -- Archive files can be mirrored, reducing both upstream and downstream +- Archive files can be **mirrored**, reducing both upstream and downstream bandwidth use. The actual builds can afterwards be performed in firewalled or offline environments, that can only access source files provided by the local mirror or redistributed earlier. -A good idea is to use a release workflow that starts by building a source -distribution, and then performs all the remaining release steps (such as -running tests and building wheels) from the unpacked source distribution. This -ensures that the source distribution is actually tested, and reduces the risk -that users installing from it will hit build failures or install an incomplete -package. +How? +~~~~ +Ideally, **a source distribution archive should include all the files** +necessary to build the package itself, run its test suite, build and install +its documentation, and any other files that may be useful to end users, such as +shell completions, editor support files, and so on. + +Some projects are concerned about increasing the size of source distribution, +or do not wish Python packaging tools to fall back to source distributions +automatically. In these cases, a good compromise may be to publish a separate +source archive for downstream use, for example by attaching it to a GitHub +release. Alternatively, large files, such as test data, can be split into +separate archives. + +A good idea is to **use your source distribution in the release workflow**. 
+That is, build it first, then unpack it and perform all the remaining steps +using the unpacked distribution rather than the git repository — run tests, +build documentation, build wheels. This ensures that it is well-tested, +and reduces the risk that some users would hit build failures or install +an incomplete package. .. _Do not use the Internet during the build process: Do not use the Internet during the build process ------------------------------------------------ -Downstream builds are frequently done in sandboxed environments that cannot -access the Internet. Therefore, it is important that your source distribution -includes all the files needed for the package to build or allows provisioning -them externally, and can build successfully without Internet access. - -Ideally, it should not even attempt to access the Internet at all, unless -explicitly requested to. If that is not possible to achieve, the next best -thing is to provide an opt-out switch to disable all Internet access, and fail -if some of the required files are missing instead of trying to fetch them. This -could be done e.g. by checking whether a ``NO_NETWORK`` environment variable is -to a non-empty value. Please also remember that if you are fetching remote -resources, you should verify their authenticity, e.g. against a checksum, to -protect against the file being substituted by a malicious party. - -Even if downloads are properly authenticated, using the Internet is discouraged -for a number of reasons: +Why? +~~~~ +Downstream builds are frequently done in sandboxed environments that **cannot +access the Internet**. Even if this is not the case, and assuming that you took +sufficient care to **properly authenticate downloads**, using the Internet +is discouraged for a number of reasons: -- The Internet connection may be unstable (e.g. poor reception) or suffer from - temporary problems that could cause the downloads to fail or hang. +- The Internet **connection may be unstable** (e.g.
due to poor reception) + or suffer from temporary problems that could cause the process to fail + or hang. -- The remote resources may become temporarily or even permanently unavailable, - making the build no longer possible. This is especially problematic when - someone needs to build an old package version. +- The remote resources may **become temporarily or even permanently + unavailable**, making the build no longer possible. This is especially + problematic when someone needs to build an old package version. -- Accessing remote servers poses a privacy issue and a potential security issue, - as it exposes information about the system building the package. +- Accessing remote servers poses a **privacy** issue and a potential + **security** issue, as it exposes information about the system building + the package. - The user may be using a service with a limited data plan, in which - uncontrolled Internet access may result in additional charges or other + uncontrolled Internet access may result in **additional charges** or other inconveniences. +How? +~~~~ +Your source distribution should either **include all the files needed +for the package to build**, or allow provisioning them externally. Ideally, +it should not even attempt to access the Internet at all, unless explicitly +requested to. If that is not possible to achieve, the next best thing +is to **provide an opt-out switch to disable all Internet access**. + +When such a switch is used, the build process should fail if some +of the required files are missing, rather than try to fetch them automatically. +This could be done e.g. by checking whether a ``NO_NETWORK`` environment +variable is set to a non-empty value. Please also remember that if you are +fetching remote resources, you must **verify their authenticity**, e.g. against +a checksum, to protect against the file being substituted by a malicious party. 
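The opt-out switch and checksum verification described above could look roughly like the following minimal sketch. It assumes a build helper written in Python; ``provide_resource()``, ``network_disabled()`` and ``sha256_of()`` are hypothetical names chosen for illustration, ``NO_NETWORK`` follows the suggestion in the text, and SHA-256 is assumed as the checksum algorithm. A file that is already present (for example, provisioned by a downstream packager) is verified and used as-is, and a download is attempted only when the file is missing and the switch is not set:

```python
import hashlib
import os
import shutil
import urllib.request
from pathlib import Path


def network_disabled() -> bool:
    """Return True if NO_NETWORK is set to a non-empty value (hypothetical switch)."""
    return bool(os.environ.get("NO_NETWORK"))


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provide_resource(path: Path, url: str, expected_sha256: str) -> Path:
    """Return a verified local copy of a build-time resource (hypothetical helper)."""
    if not path.exists():
        if network_disabled():
            # Fail loudly instead of fetching: the packager must provide the file.
            raise RuntimeError(
                f"{path} is missing and NO_NETWORK is set; "
                f"please download it from {url} and place it there"
            )
        # Only reached when the file is missing and network access is allowed.
        with urllib.request.urlopen(url) as response, path.open("wb") as out:
            shutil.copyfileobj(response, out)
    # Always verify, whether the file was downloaded or provided externally.
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise RuntimeError(
            f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
    return path
```

Verifying pre-provisioned files as well as downloaded ones means that a packager who supplies the file manually gets the same authenticity check as an automatic download would.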
+ Since downstreams frequently also run tests and build documentation, the above should ideally extend to these processes as well. @@ -128,107 +139,118 @@ should ideally extend to these processes as well. Support building against system dependencies -------------------------------------------- +Why? +~~~~ Some Python projects have non-Python dependencies, such as libraries written in C or C++. Trying to use the system versions of these dependencies in upstream packaging may cause a number of problems for end users: -- The published wheels require a binary-compatible version of the used library - to be present on the user's system. If the library is missing or installed - in incompatible version, the Python package may fail with errors that - are not clear to inexperienced users, or even misbehave at runtime. +- The published wheels **require a binary-compatible version of the used + library** to be present on the user's system. If the library is missing + or installed in incompatible version, the Python package may fail with errors + that are not clear to inexperienced users, or even misbehave at runtime. -- Building from source distribution requires a source-compatible version - of the dependency to be present, along with its development headers and other - auxiliary files that some systems package separately from the library itself. +- Building from source distribution **requires a source-compatible version + of the dependency** to be present, along with its development headers + and other auxiliary files that some systems package separately + from the library itself. - Even for an experienced user, installing a compatible dependency version may be very hard. For example, the used Linux distribution may not provide - the required version, or some other package may require an incompatible - version. + the required version, or some **other package may require an incompatible + version**. 
- The linkage between the Python package and its system dependency is not - recorded by the packaging system. The next system update may upgrade - the library to a newer version that breaks binary compatibility with + recorded by the packaging system. The next system update may **upgrade + the library to a newer version that breaks binary compatibility** with the Python package, and requires user intervention to fix. -For these reasons, you may reasonable to decide to either link statically +For these reasons, you may reasonable to decide to either **link statically** to your dependencies, or to provide a local copies in the installed package. -You may also vendor the dependency in your source distribution. Sometimes +You may also **vendor the dependency** in your source distribution. Sometimes these dependencies are also repackaged on PyPI, and can be installed like a regular Python packages. However, none of these issues apply to downstream packaging, and downstreams -have good reasons to prefer dynamically linking to system dependencies. +have good reasons to prefer **dynamically linking to system dependencies**. In particular: - Static linking and vendoring obscures the use of external dependencies, - making source auditing harder. + **making source auditing harder**. -- Dynamic linking makes it possible to easily and quickly replace the used - libraries, which can be particularly important when they turn out to +- Dynamic linking makes it possible to easily and **quickly replace the used + libraries**, which can be particularly important when they turn out to be vulnerable or buggy. 
-- Using system dependencies makes the package benefit from downstream - customization that can improve the user experience on a particular platform, +- Using system dependencies makes the package benefit from **downstream + customization** that can improve the user experience on a particular platform, without the downstream maintainers having to consistently patch the dependencies vendored in different packages. This can include - compatibility improvements and security hardening. + **compatibility improvements and security hardening**. -- Static linking and vendoring could result in multiple different versions - of the same library being loaded in the same process (e.g. when you use two +- Static linking and vendoring could result in **multiple different versions + of the same library being loaded in the same process** (e.g. when you use two Python packages that link to different versions of the same library). This can cause no problems, but it could also lead to anything from subtle bugs to catastrophic failures. - Last but not least, static linking and vendoring results in duplication, - and may increase the use of both the disk space and memory. + and may increase the **use of both the disk space and memory**. -A good compromise between the needs of both parties is to provide a switch -between using vendored and system dependencies. Ideally, if the package has +How? +~~~~ +A good compromise between the needs of both parties is to **provide a switch +between using vendored and system dependencies**. Ideally, if the package has multiple vendored dependencies, it should provide both individual switches -for each dependency, and a general switch, for example using -a ``USE_SYSTEM_DEPS`` environment variable to control the default. 
If switched -on, and a particular dependency is either missing or incompatible, the build -should fail with an explanatory message, giving the packager an explicit -indication of the problem and a chance to consciously decide on the preferred -course of action. +for each dependency, and a general switch to control the default for them, +e.g. via a ``USE_SYSTEM_DEPS`` environment variable. + +If the user requests using system dependencies, and **a particular dependency +is either missing or incompatible, the build should fail** with an explanatory +message rather than fall back to a vendored version. This gives the packager +the opportunity to notice their mistake and a chance to consciously decide +how to solve it. .. _Support downstream testing: Support downstream testing -------------------------- +Why? +~~~~ A variety of downstream projects run some degree of testing on the packaged Python projects. Depending on the particular case, this can range from minimal smoke testing to comprehensive runs of the complete test suite. There can be various reasons for doing this, for example: -- Verifying that the downstream packaging did not introduce any bugs. +- Verifying that the downstream **packaging did not introduce any bugs**. -- Testing on a platform that is not covered by upstream testing. +- Testing on **additional platforms** that are not covered by upstream testing. -- Finding subtle bugs that can only be reproduced on a particular hardware, - system package versions, and so on. +- Finding subtle bugs that can only be reproduced on a **particular hardware, + system package versions**, and so on. -- Testing the released package against newer dependency version than the ones - present during upstream release testing. +- Testing the released package against **newer dependency versions** than + the ones present during upstream release testing. -- Testing the package in an environment closely resembling the production - setup. 
This can detect issues caused by nontrivial interactions between +- Testing the package in an environment closely resembling **the production + setup**. This can detect issues caused by nontrivial interactions between different installed packages, including packages that are not dependencies of your package, but nevertheless can cause issues. -- Testing the released package against newer Python versions (including newer - point releases), or less tested Python implementations such as PyPy. +- Testing the released package against **newer Python versions** (including + newer point releases), or less tested Python implementations such as PyPy. Admittedly, sometimes downstream testing may yield false positives or inconvenience you about scenarios that you are not interested in supporting. However, perhaps even more often it does provide early notice of problems, or find nontrivial bugs that would otherwise cause issues for your users -in production. And believe me, the majority of downstream packagers are doing +in production. And believe me, the majority of **downstream packagers are doing their best to double-check their results, and help you triage and fix the bugs -that they report. +that they report**. +How? +~~~~ There is a number of things that you can do to help us test your package better. Some of them were already mentioned in this discussion. 
Some examples are: From 08c70e87a763d40832d177be70968e7280af821d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Wed, 29 Jan 2025 21:32:26 +0100 Subject: [PATCH 08/31] Elaborate a bit more on why it's good to help downstreams --- source/discussions/downstream-packaging.rst | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 71d02e7d0..d3f33668f 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -13,12 +13,22 @@ of other packaging ecosystems. These repackaging efforts are collectively called *downstream* packaging (your own efforts are called *upstream* packaging), and include such projects as Linux distributions, Conda, Homebrew and MacPorts. They often aim to provide good support for use cases that cannot be handled -via Python packaging tools alone, such as good integration with non-Python -software. +via Python packaging tools alone, such as **good integration with non-Python +software**. This discussion attempts to explain how downstream packaging is usually done, and what challenges are downstream packagers facing. It ultimately aims to give -you some hints on how you can make downstream packaging easier. +you some hints on how you can **make downstream packaging easier**. + +Establishing a good relationship between software maintainers and downstream +packagers can bring mutual benefits. Downstreams are often willing to **share +their experience, time and hardware** to improve your package. They are +sometimes in a better position to see **the bigger picture**, and to provide +you with **information about other packages** that would otherwise require you +to put significant effort to obtain. Packagers often can **find bugs** before +your users hit them on production, **provide bug reports of good quality** +and **supply patches** whenever they can. 
For example, they are often +in the vanguard when **a new Python version** comes out. Please note that downstream builds include not only binary redistribution, but also source builds done on user systems, in source-first distributions From 24f552e00ca448d4f6389d3422d8222bbd90224f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Wed, 29 Jan 2025 21:36:12 +0100 Subject: [PATCH 09/31] Correct "pytest" capitalization --- source/discussions/downstream-packaging.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index d3f33668f..6ea05a348 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -315,7 +315,7 @@ are: and testing them all takes a lot of time. Using pytest-xdist_ can help them avoid bottlenecks. -- Ideally, **support running your test suite via PyTest**. PyTest_ has many +- Ideally, **support running your test suite via pytest**. pytest_ has many command-line arguments that are truly helpful to downstreams, such as the ability to conveniently deselect tests, rerun flaky tests (via pytest-rerunfailures_), add a timeout to prevent tests from hanging From 24b03467f59ba59e154b75fef77a86d250bd03d0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Thu, 30 Jan 2025 13:10:07 +0000 Subject: [PATCH 10/31] Apply suggestions from code review Co-authored-by: Alyssa Coghlan --- source/discussions/downstream-packaging.rst | 242 +++++++++++--------- 1 file changed, 137 insertions(+), 105 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 6ea05a348..25740ba29 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -1,84 +1,96 @@ .. _downstream-packaging: -======================================== -How to make downstream packaging easier? 
-======================================== +================================= +Simplifying downstream packaging +================================= :Page Status: Draft :Last Reviewed: 2025-? While PyPI and the Python packaging tools such as :ref:`pip` are the primary -means of distributing your packages, they are often also made available as part +means of distributing Python packages, they are also often made available as part of other packaging ecosystems. These repackaging efforts are collectively called *downstream* packaging (your own efforts are called *upstream* packaging), and include such projects as Linux distributions, Conda, Homebrew and MacPorts. -They often aim to provide good support for use cases that cannot be handled -via Python packaging tools alone, such as **good integration with non-Python -software**. +They generally aim to provide improved support for use cases that cannot be handled +via Python packaging tools alone, such as native integration with a specific operating +system, or assured compatibility with specific versions of non-Python software. This discussion attempts to explain how downstream packaging is usually done, -and what challenges are downstream packagers facing. It ultimately aims to give -you some hints on how you can **make downstream packaging easier**. +and what additional challenges downstream packagers typically face. It aims +to provide some optional guidelines that project maintainers may choose to +follow which help make downstream packaging *significantly* easier +(without imposing any major maintenance hassles on the upstream project). Establishing a good relationship between software maintainers and downstream -packagers can bring mutual benefits. Downstreams are often willing to **share -their experience, time and hardware** to improve your package. 
They are -sometimes in a better position to see **the bigger picture**, and to provide -you with **information about other packages** that would otherwise require you -to put significant effort to obtain. Packagers often can **find bugs** before -your users hit them on production, **provide bug reports of good quality** -and **supply patches** whenever they can. For example, they are often -in the vanguard when **a new Python version** comes out. +packagers can bring mutual benefits. Downstreams are often willing to share +their experience, time and hardware to improve your package. They are +sometimes in a better position to see how your package is used in practice, +and to provide information about its relationships with other packages that +would otherwise require significant effort to obtain. +Packagers can often find bugs before your users hit them in production, +provide bug reports of good quality, and supply patches whenever they can. +For example, they are regularly active in ensuring the packages they redistribute +are updated for any compatibility issues that arise when a new Python version +is released. Please note that downstream builds include not only binary redistribution, -but also source builds done on user systems, in source-first distributions -such as Gentoo Linux. +but also source builds done on user systems (in source-first distributions +such as Gentoo Linux, for example). -.. _Provide complete source distributions: +.. _provide-complete-source-distributions: Provide complete source distributions ------------------------------------- + Why? ~~~~ + The vast majority of downstream packagers prefer to build packages from source, rather than use the upstream-provided binary packages. This is also true of pure Python packages that provide universal wheels. 
The reasons for using source distributions may include: -- being able to **audit the source code** of all packages +- being able to audit the source code of all packages -- being able to **run the test suite and build documentation** +- being able to run the test suite and build documentation -- being able to **easily apply patches**, including backporting commits - from your repository and sending patches back to you +- being able to easily apply patches, including backporting commits + from the project's repository and sending patches back to the project -- being able to **build on a specific platform** that is not covered +- being able to build on a specific platform that is not covered by upstream builds -- being able to **build against specific versions of system libraries** +- being able to build against specific versions of system libraries - having a consistent build process across all Python packages While it is usually possible to build packages from a git repository, there are a few important reasons to provide a static archive file instead: -- Fetching a single file is often **more efficient, more reliable and better - supported** than e.g. using a git clone. This can help users with a shoddy - Internet connection. +- Fetching a single file is often more efficient, more reliable and better + supported than e.g. using a git clone. This can help users with poor + Internet connectivity. - Downstreams often **use checksums to verify the authenticity** of source files on subsequent builds, which require that they remain bitwise identical over time. For example, automatically generated git archives do not guarantee that. -- Archive files can be **mirrored**, reducing both upstream and downstream +- Archive files can be mirrored, reducing both upstream and downstream bandwidth use. The actual builds can afterwards be performed in firewalled or offline environments, that can only access source files provided by the local mirror or redistributed earlier. 
+- Explicitly publishing archive files can ensure that any dependencies on version control + system metadata are resolved when creating the source archive. For example, automatically + generated git archives omit all of the commit tag information, potentially resulting in + incorrect version details in the resulting builds. + How? ~~~~ + Ideally, **a source distribution archive should include all the files** necessary to build the package itself, run its test suite, build and install its documentation, and any other files that may be useful to end users, such as @@ -99,23 +111,25 @@ and reduces the risk that some users would hit build failures or install an incomplete package. -.. _Do not use the Internet during the build process: +.. _no-internet-access-in-builds: Do not use the Internet during the build process ------------------------------------------------ + Why? ~~~~ -Downstream builds are frequently done in sandboxed environments that **cannot -access the Internet**. Even if this is not the case, and assuming that you took -sufficient care to **properly authenticate downloads**, using the Internet + +Downstream builds are frequently done in sandboxed environments that cannot +access the Internet. Even if this is not the case, and assuming that you took +sufficient care to properly authenticate downloads, using the Internet is discouraged for a number of reasons: -- The Internet **connection may be unstable** (e.g. due to poor reception) +- The Internet connection may be unstable (e.g. due to poor reception) or suffer from temporary problems that could cause the process to fail or hang. -- The remote resources may **become temporarily or even permanently - unavailable**, making the build no longer possible. This is especially +- The remote resources may become temporarily or even permanently + unavailable, making the build no longer possible. This is especially problematic when someone needs to build an old package version. 
- Accessing remote servers poses a **privacy** issue and a potential @@ -123,16 +137,17 @@ is discouraged for a number of reasons: the package. - The user may be using a service with a limited data plan, in which - uncontrolled Internet access may result in **additional charges** or other + uncontrolled Internet access may result in additional charges or other inconveniences. How? ~~~~ -Your source distribution should either **include all the files needed -for the package to build**, or allow provisioning them externally. Ideally, + +Your source distribution should either include all the files needed +for the package to build, or allow provisioning them externally. Ideally, it should not even attempt to access the Internet at all, unless explicitly requested to. If that is not possible to achieve, the next best thing -is to **provide an opt-out switch to disable all Internet access**. +is to provide an opt-out switch to disable all Internet access. When such a switch is used, the build process should fail if some of the required files are missing, rather than try to fetch them automatically. @@ -145,114 +160,128 @@ Since downstreams frequently also run tests and build documentation, the above should ideally extend to these processes as well. -.. _Support building against system dependencies: +.. _support-system-dependencies-in-builds: Support building against system dependencies -------------------------------------------- + Why? ~~~~ + Some Python projects have non-Python dependencies, such as libraries written in C or C++. Trying to use the system versions of these dependencies in upstream packaging may cause a number of problems for end users: -- The published wheels **require a binary-compatible version of the used - library** to be present on the user's system. 
If the library is missing - or installed in incompatible version, the Python package may fail with errors +- The published wheels require a binary-compatible version of the used + library to be present on the user's system. If the library is missing + or an incompatible version is installed, the Python package may fail with errors that are not clear to inexperienced users, or even misbehave at runtime. -- Building from source distribution **requires a source-compatible version - of the dependency** to be present, along with its development headers +- Building from a source distribution requires a source-compatible version + of the dependency to be present, along with its development headers and other auxiliary files that some systems package separately from the library itself. - Even for an experienced user, installing a compatible dependency version may be very hard. For example, the used Linux distribution may not provide - the required version, or some **other package may require an incompatible - version**. + the required version, or some other package may require an incompatible + version. - The linkage between the Python package and its system dependency is not - recorded by the packaging system. The next system update may **upgrade - the library to a newer version that breaks binary compatibility** with + recorded by the packaging system. The next system update may upgrade + the library to a newer version that breaks binary compatibility with the Python package, and requires user intervention to fix. -For these reasons, you may reasonable to decide to either **link statically** -to your dependencies, or to provide a local copies in the installed package. -You may also **vendor the dependency** in your source distribution. Sometimes -these dependencies are also repackaged on PyPI, and can be installed -like a regular Python packages. 
+For these reasons, you may reasonably decide to either statically link +your dependencies, or to provide local copies in the installed package. +You may also vendor the dependency in your source distribution. Sometimes +these dependencies are also repackaged on PyPI, and can be declared as +project dependencies like any other Python package. However, none of these issues apply to downstream packaging, and downstreams -have good reasons to prefer **dynamically linking to system dependencies**. +have good reasons to prefer dynamically linking to system dependencies. In particular: +- in many cases, reliably sharing dynamic dependencies between components is a large part + of the *purpose* of a downstream packaging ecosystem. Helping to support that makes it + easier for users of those systems to access upstream projects in their preferred format. + - Static linking and vendoring obscures the use of external dependencies, - **making source auditing harder**. + making source auditing harder. -- Dynamic linking makes it possible to easily and **quickly replace the used - libraries**, which can be particularly important when they turn out to - be vulnerable or buggy. +- Dynamic linking makes it possible to quickly and systematically replace the used + libraries across an entire downstream packaging ecosystem, which can be particularly + important when they turn out to contain a security vulnerability or critical bug. -- Using system dependencies makes the package benefit from **downstream - customization** that can improve the user experience on a particular platform, +- Using system dependencies makes the package benefit from downstream + customization that can improve the user experience on a particular platform, without the downstream maintainers having to consistently patch the dependencies vendored in different packages. This can include - **compatibility improvements and security hardening**. + compatibility improvements and security hardening. 
-- Static linking and vendoring could result in **multiple different versions - of the same library being loaded in the same process** (e.g. when you use two - Python packages that link to different versions of the same library). - This can cause no problems, but it could also lead to anything from subtle - bugs to catastrophic failures. +- Static linking and vendoring can result in multiple different versions of the + same library being loaded in the same process (for example, attempting to + import two Python packages that link to different versions of the same library). + This sometimes works without incident, but it can also lead to anything from library + loading errors, to subtle runtime bugs, to catastrophic system failures. - Last but not least, static linking and vendoring results in duplication, - and may increase the **use of both the disk space and memory**. + and may increase the use of both disk space and memory. How? ~~~~ -A good compromise between the needs of both parties is to **provide a switch -between using vendored and system dependencies**. Ideally, if the package has + +A good compromise between the needs of both parties is to provide a switch +between using vendored and system dependencies. Ideally, if the package has multiple vendored dependencies, it should provide both individual switches for each dependency, and a general switch to control the default for them, e.g. via a ``USE_SYSTEM_DEPS`` environment variable. -If the user requests using system dependencies, and **a particular dependency -is either missing or incompatible, the build should fail** with an explanatory +If the user requests using system dependencies, and a particular dependency +is either missing or incompatible, the build should fail with an explanatory message rather than fall back to a vendored version. This gives the packager the opportunity to notice their mistake and a chance to consciously decide how to solve it. 
+Note that it is reasonable for upstream projects to leave *testing* of building with +system dependencies to their downstream repackagers. The goal of these guidelines +is to facilitate more effective collaboration between upstream projects and downstream +repackagers, not to suggest upstream projects take on tasks that downstream repackagers +are better equipped to handle. -.. _Support downstream testing: +.. _support-downstream-testing: Support downstream testing -------------------------- + Why? ~~~~ + A variety of downstream projects run some degree of testing on the packaged Python projects. Depending on the particular case, this can range from minimal smoke testing to comprehensive runs of the complete test suite. There can be various reasons for doing this, for example: -- Verifying that the downstream **packaging did not introduce any bugs**. +- Verifying that the downstream packaging did not introduce any bugs. -- Testing on **additional platforms** that are not covered by upstream testing. +- Testing on additional platforms that are not covered by upstream testing. -- Finding subtle bugs that can only be reproduced on a **particular hardware, - system package versions**, and so on. +- Finding subtle bugs that can only be reproduced with particular hardware, + system package versions, and so on. -- Testing the released package against **newer dependency versions** than +- Testing the released package against newer (or older) dependency versions than the ones present during upstream release testing. -- Testing the package in an environment closely resembling **the production - setup**. This can detect issues caused by nontrivial interactions between +- Testing the package in an environment closely resembling the production + setup. This can detect issues caused by nontrivial interactions between different installed packages, including packages that are not dependencies of your package, but nevertheless can cause issues. 
-- Testing the released package against **newer Python versions** (including +- Testing the released package against newer Python versions (including newer point releases), or less tested Python implementations such as PyPy. -Admittedly, sometimes downstream testing may yield false positives or -inconvenience you about scenarios that you are not interested in supporting. +Admittedly, sometimes downstream testing may yield false positives or bug +reports about scenarios the upstream project is not interested in supporting. However, perhaps even more often it does provide early notice of problems, or find nontrivial bugs that would otherwise cause issues for your users in production. And believe me, the majority of **downstream packagers are doing @@ -261,61 +290,64 @@ that they report**. How? ~~~~ -There is a number of things that you can do to help us test your package -better. Some of them were already mentioned in this discussion. Some examples -are: -- **Include the test files and fixtures in the source distribution**, or make it +There are a number of things that upstream projects can do to help downstream +repackagers test their packages efficiently and effectively, including some of the suggestions +already mentioned above. These are typically improvements that make the test suite more +reliable and easier to use for everyone, not just downstream packagers. +Some specific suggestions are: + +- Include the test files and fixtures in the source distribution, or make it possible to easily download them separately. -- **Do not write to the package directories during testing.** Downstream test +- Do not write to the package directories during testing. Downstream test setups sometimes run tests on top of the installed package, and modifications performed during testing and temporary test files may end up being part of the installed package! -- **Make the test suite work offline.** Mock network interactions, using +- Make the test suite work offline. 
Mock network interactions, using packages such as responses_ or vcrpy_. If that is not possible, make it - possible to easily disable the tests using Internet access, e.g. via a pytest + possible to easily disable the tests using Internet access, e.g. via a pytest_ marker. Use pytest-socket_ to verify that your tests work offline. This often makes your own test workflows faster and more reliable as well. -- **Make your tests work without a specialized setup**, or perform the necessary +- Make your tests work without a specialized setup, or perform the necessary setup as part of test fixtures. Do not ever assume that you can connect to system services such as databases — in an extreme case, you could crash a production service! -- **If your package has optional dependencies, make their tests optional as - well.** Either skip them if the needed packages are not installed, or add +- If your package has optional dependencies, make their tests optional as + well. Either skip them if the needed packages are not installed, or add markers to make deselecting easy. -- More generally, **add markers to tests with special requirements**. These can +- More generally, add markers to tests with special requirements. These can include e.g. significant space usage, significant memory usage, long runtime, incompatibility with parallel testing. -- **Do not assume that the test suite will be run with -Werror.** Downstreams +- Do not assume that the test suite will be run with ``-Werror``. Downstreams often need to disable that, as it causes false positives, e.g. due to newer dependency versions. Assert for warnings using ``pytest.warns()`` rather than ``pytest.raises()``! -- **Aim to make your test suite reliable and reproducible.** Avoid flaky tests. +- Aim to make your test suite reliable and reproducible. Avoid flaky tests. Avoid depending on specific platform details, don't rely on exact results of floating-point computation, or timing of operations, and so on. 
Fuzzing has its advantages, but you want to have static test cases for completeness as well. -- **Split tests by their purpose, and make it easy to skip categories that are - irrelevant or problematic.** Since the primary purpose of downstream testing - is to ensure that the package itself works, we generally are not interested - in e.g. checking code coverage, code formatting, typing or running +- Split tests by their purpose, and make it easy to skip categories that are + irrelevant or problematic. Since the primary purpose of downstream testing + is to ensure that the package itself works, downstreams are not generally interested + in tasks such as checking code coverage, code formatting, typechecking or running benchmarks. These tests can fail as dependencies are upgraded or the system is under load, without actually affecting the package itself. -- If your test suite takes significant time to run, **support testing - in parallel.** Downstreams often maintain a large number of packages, +- If your test suite takes significant time to run, support testing + in parallel. Downstreams often maintain a large number of packages, and testing them all takes a lot of time. Using pytest-xdist_ can help them avoid bottlenecks. -- Ideally, **support running your test suite via pytest**. pytest_ has many +- Ideally, support running your test suite via ``pytest``. 
pytest_ has many command-line arguments that are truly helpful to downstreams, such as the ability to conveniently deselect tests, rerun flaky tests (via pytest-rerunfailures_), add a timeout to prevent tests from hanging From 16288af66e9877d1bb8a3812d6c74adfc2db19f9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Thu, 30 Jan 2025 13:12:57 +0000 Subject: [PATCH 11/31] Apply more suggestions from code review Co-authored-by: Alyssa Coghlan --- source/discussions/downstream-packaging.rst | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 25740ba29..984816c14 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -165,9 +165,11 @@ should ideally extend to these processes as well. Support building against system dependencies -------------------------------------------- + Why? ~~~~ + Some Python projects have non-Python dependencies, such as libraries written in C or C++. Trying to use the system versions of these dependencies in upstream packaging may cause a number of problems for end users: @@ -202,6 +204,10 @@ However, none of these issues apply to downstream packaging, and downstreams have good reasons to prefer dynamically linking to system dependencies. In particular: +- in many cases, reliably sharing dynamic dependencies between components is a large part + of the *purpose* of a downstream packaging ecosystem. Helping to support that makes it + easier for users of those systems to access upstream projects in their preferred format. + - in many cases, reliably sharing dynamic dependencies between components is a large part of the *purpose* of a downstream packaging ecosystem. Helping to support that makes it easier for users of those systems to access upstream projects in their preferred format. @@ -231,6 +237,7 @@ In particular: How? 
~~~~ + A good compromise between the needs of both parties is to provide a switch between using vendored and system dependencies. Ideally, if the package has multiple vendored dependencies, it should provide both individual switches @@ -243,6 +250,11 @@ message rather than fall back to a vendored version. This gives the packager the opportunity to notice their mistake and a chance to consciously decide how to solve it. +Note that it is reasonable for upstream projects to leave *testing* of building with +system dependencies to their downstream repackagers. The goal of these guidelines +is to facilitate more effective collaboration between upstream projects and downstream +repackagers, not to suggest upstream projects take on tasks that downstream repackagers +are better equipped to handle. Note that it is reasonable for upstream projects to leave *testing* of building with system dependencies to their downstream repackagers. The goal of these guidelines is to facilitate more effective collaboration between upstream projects and downstream @@ -254,9 +266,11 @@ are better equipped to handle. Support downstream testing -------------------------- + Why? ~~~~ + A variety of downstream projects run some degree of testing on the packaged Python projects. Depending on the particular case, this can range from minimal smoke testing to comprehensive runs of the complete test suite. There can @@ -291,6 +305,7 @@ that they report**. How? ~~~~ + There are a number of things that upstream projects can do to help downstream repackagers test their packages efficiently and effectively, including some of the suggestions already mentioned above. 
These are typically improvements that make the test suite more From b4c7485665311525bf50951f6ecf50ff541caa18 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Thu, 30 Jan 2025 16:28:40 +0100 Subject: [PATCH 12/31] Attempt addressing the remaining review comments --- source/discussions/downstream-packaging.rst | 32 ++++++++++----------- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 984816c14..fc9514ade 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -73,10 +73,10 @@ a few important reasons to provide a static archive file instead: supported than e.g. using a git clone. This can help users with poor Internet connectivity. -- Downstreams often **use checksums to verify the authenticity** of source files +- Downstreams often use hashes to verify the authenticity of source files on subsequent builds, which require that they remain bitwise identical over time. For example, automatically generated git archives do not guarantee - that. + this, as the compressed data may change if gzip is upgraded on the server. - Archive files can be mirrored, reducing both upstream and downstream bandwidth use. The actual builds can afterwards be performed in firewalled @@ -132,8 +132,8 @@ is discouraged for a number of reasons: unavailable, making the build no longer possible. This is especially problematic when someone needs to build an old package version. -- Accessing remote servers poses a **privacy** issue and a potential - **security** issue, as it exposes information about the system building +- Accessing remote servers poses a privacy issue and a potential + security issue, as it exposes information about the system building the package. 
- The user may be using a service with a limited data plan, in which @@ -153,8 +153,8 @@ When such a switch is used, the build process should fail if some of the required files are missing, rather than try to fetch them automatically. This could be done e.g. by checking whether a ``NO_NETWORK`` environment variable is set to a non-empty value. Please also remember that if you are -fetching remote resources, you must **verify their authenticity**, e.g. against -a checksum, to protect against the file being substituted by a malicious party. +fetching remote resources, you must *verify their authenticity* (usually against +a hash), to protect against the file being substituted by a malicious party. Since downstreams frequently also run tests and build documentation, the above should ideally extend to these processes as well. @@ -165,11 +165,9 @@ should ideally extend to these processes as well. Support building against system dependencies -------------------------------------------- - Why? ~~~~ - Some Python projects have non-Python dependencies, such as libraries written in C or C++. Trying to use the system versions of these dependencies in upstream packaging may cause a number of problems for end users: @@ -237,7 +235,6 @@ In particular: How? ~~~~ - A good compromise between the needs of both parties is to provide a switch between using vendored and system dependencies. Ideally, if the package has multiple vendored dependencies, it should provide both individual switches @@ -266,11 +263,9 @@ are better equipped to handle. Support downstream testing -------------------------- - Why? ~~~~ - A variety of downstream projects run some degree of testing on the packaged Python projects. Depending on the particular case, this can range from minimal smoke testing to comprehensive runs of the complete test suite. There can @@ -287,7 +282,7 @@ be various reasons for doing this, for example: the ones present during upstream release testing. 
- Testing the package in an environment closely resembling the production - setup. This can detect issues caused by nontrivial interactions between + setup. This can detect issues caused by non-trivial interactions between different installed packages, including packages that are not dependencies of your package, but nevertheless can cause issues. @@ -297,15 +292,14 @@ be various reasons for doing this, for example: Admittedly, sometimes downstream testing may yield false positives or bug reports about scenarios the upstream project is not interested in supporting. However, perhaps even more often it does provide early notice of problems, -or find nontrivial bugs that would otherwise cause issues for your users -in production. And believe me, the majority of **downstream packagers are doing -their best to double-check their results, and help you triage and fix the bugs -that they report**. +or find non-trivial bugs that would otherwise cause issues for the upstream +project's users. While mistakes do happen, the majority of downstream packagers +are doing their best to double-check their results, and help upstream +maintainers triage and fix the bugs that they reported. How? ~~~~ - There are a number of things that upstream projects can do to help downstream repackagers test their packages efficiently and effectively, including some of the suggestions already mentioned above. These are typically improvements that make the test suite more @@ -367,6 +361,10 @@ Some specific suggestions are: the ability to conveniently deselect tests, rerun flaky tests (via pytest-rerunfailures_), add a timeout to prevent tests from hanging (via pytest-timeout_) or run tests in parallel (via pytest-xdist_). + Note that test suites don't need to be *written* with ``pytest`` to be + *executed* with ``pytest``: ``pytest`` is able to find and execute almost + all test cases that are compatible with the standard library's ``unittest`` + test discovery. .. 
_responses: https://pypi.org/project/responses/ From df0c91a6f379ef4139b632c3f5aebdcb46e8ea58 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Thu, 30 Jan 2025 20:20:01 +0100 Subject: [PATCH 13/31] Add a section on stable channels --- source/discussions/downstream-packaging.rst | 66 +++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index fc9514ade..edb21dcc9 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -367,6 +367,71 @@ Some specific suggestions are: test discovery. +.. _aim-for-stable-releases: + +Aim for stable releases +----------------------- + +Why? +~~~~ + +Many downstreams provide stable release channels in addition to the main +package streams. The goal of these channels is to provide more conservative +upgrades to users with higher stability needs. These users often prefer +to trade having the newest features available for lower risk of issues. + +While the exact policies differ, an important criterion for including a new +package version in a stable release channel is for it to be available in testing +for some time already, and have no known major regressions. For example, +in Gentoo Linux a package is usually marked stable after being available +in testing for a month, and being tested against the versions of its +dependencies that are marked stable at the time. + +However, there are circumstances which demand more prompt action. For example, +if a security vulnerability or a major bug is found in the version that is +currently available in the stable channel, the downstream is facing a need +to resolve it. In this case, they need to consider various options, such as: + +- putting a new version in the stable channel early, + +- adding patches to the version currently published, + +- or even downgrading the stable channel to an earlier release. 
+ +Each of these options involves certain risks and a certain amount of work, +and packagers need to weigh them to determine the course of action. + +How? +~~~~ + +There are some things that upstreams can do to tailor their workflow to stable +release channels. These actions often are beneficial to the package's users +as well. Some specific suggestions are: + +- Adjust the release frequency to the rate of code changes. Packages that + are released rarely often bring significant changes with every release, + and a higher risk of accidental regressions. + +- Avoid mixing bug fixes and new features, if possible. In particular, if there + are known bug fixes merged already, consider making a new release before + merging feature branches. + +- Consider making prereleases after major changes, to provide more testing + opportunities for users and downstreams willing to opt in. + +- If your project is subject to very intense development, consider splitting + one or more branches that include a more conservative subset of commits, + and are released separately. For example, Django_ currently maintains three + release branches in addition to main. + +- Even if you don't wish to maintain additional branches permanently, consider + making additional patch releases with minimal changes to the previous + version, especially when a security vulnerability is discovered. + +- Split your changes into focused commits that address one problem at a time, + to make it easier to cherry-pick changes to earlier releases when necessary. + + .. _responses: https://pypi.org/project/responses/ .. _vcrpy: https://pypi.org/project/vcrpy/ .. _pytest-socket: https://pypi.org/project/pytest-socket/ @@ -374,3 +439,4 @@ Some specific suggestions are: .. _pytest: https://pytest.org/ .. _pytest-rerunfailures: https://pypi.org/project/pytest-rerunfailures/ .. _pytest-timeout: https://pypi.org/project/pytest-timeout/ +.. 
_Django: https://www.djangoproject.com/ From fc72a38cc39b1590e752d1181118c5a073bcb813 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Fri, 31 Jan 2025 15:41:16 +0100 Subject: [PATCH 14/31] Retitle as "Supporting downstream packaging" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Thanks to @pfmoore for the suggestion. Signed-off-by: Michał Górny --- source/discussions/downstream-packaging.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index edb21dcc9..1ff735e99 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -1,8 +1,8 @@ .. _downstream-packaging: -================================= -Simplifying downstream packaging -================================= +=============================== +Supporting downstream packaging +=============================== :Page Status: Draft :Last Reviewed: 2025-? From 24462f4b05c838219b1fe9327caff32bbdd05a61 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Fri, 31 Jan 2025 15:42:51 +0100 Subject: [PATCH 15/31] Add a "not all-or-nothing" sentence MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Suggested by @pfmoore on https://discuss.python.org/t/request-for-feedback-packaging-p-o-discussion-on-helping-downstream-packaging/78985/2 Signed-off-by: Michał Górny --- source/discussions/downstream-packaging.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 1ff735e99..d3df25ad1 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -21,6 +21,8 @@ and what additional challenges downstream packagers typically face. 
It aims to provide some optional guidelines that project maintainers may choose to follow which help make downstream packaging *significantly* easier (without imposing any major maintenance hassles on the upstream project). +Note that this is not an all-or-nothing proposal — anything that upstream +maintainers can do is useful, even if it's only a small part. Establishing a good relationship between software maintainers and downstream packagers can bring mutual benefits. Downstreams are often willing to share From 29cc38acf6ac0649b88ec00d68165b3ec8dc25b2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Fri, 31 Jan 2025 15:53:41 +0100 Subject: [PATCH 16/31] Add a note that downstreams can send patches to fix these issues --- source/discussions/downstream-packaging.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index d3df25ad1..c4934b802 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -22,7 +22,8 @@ to provide some optional guidelines that project maintainers may choose to follow which help make downstream packaging *significantly* easier (without imposing any major maintenance hassles on the upstream project). Note that this is not an all-or-nothing proposal — anything that upstream -maintainers can do is useful, even if it's only a small part. +maintainers can do is useful, even if it's only a small part. Downstream +maintainers are also willing to prepare patches to resolve these issues. Establishing a good relationship between software maintainers and downstream packagers can bring mutual benefits. 
Downstreams are often willing to share From 9d5fbe6d212f0e50b6963594aec0128a1b0d999f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 15:44:41 +0100 Subject: [PATCH 17/31] Capitalize Git, per @pawamoy --- source/discussions/downstream-packaging.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index c4934b802..bc5edbf4d 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -69,16 +69,16 @@ source distributions may include: - having a consistent build process across all Python packages -While it is usually possible to build packages from a git repository, there are +While it is usually possible to build packages from a Git repository, there are a few important reasons to provide a static archive file instead: - Fetching a single file is often more efficient, more reliable and better - supported than e.g. using a git clone. This can help users with poor + supported than e.g. using a Git clone. This can help users with poor Internet connectivity. - Downstreams often use hashes to verify the authenticity of source files on subsequent builds, which require that they remain bitwise identical over - time. For example, automatically generated git archives do not guarantee + time. For example, automatically generated Git archives do not guarantee this, as the compressed data may change if gzip is upgraded on the server. - Archive files can be mirrored, reducing both upstream and downstream @@ -88,7 +88,7 @@ a few important reasons to provide a static archive file instead: - Explicitly publishing archive files can ensure that any dependencies on version control system metadata are resolved when creating the source archive. 
For example, automatically - generated git archives omit all of the commit tag information, potentially resulting in + generated Git archives omit all of the commit tag information, potentially resulting in incorrect version details in the resulting builds. How? @@ -108,7 +108,7 @@ separate archives. A good idea is to **use your source distribution in the release workflow**. That is, build it first, then unpack it and perform all the remaining steps -using the unpacked distribution rather than the git repostiry — run tests, +using the unpacked distribution rather than the Git repostiry — run tests, build documentation, build wheels. This ensures that it is well-tested, and reduces the risk that some users would hit build failures or install an incomplete package. From 4d95da2cf538aec9f8eb81cc239a0a7279e9aee0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 15:46:23 +0100 Subject: [PATCH 18/31] Fix inconsistent case in bullet points and remove duplicate Thanks to @pawamoy --- source/discussions/downstream-packaging.rst | 22 +++++++++------------ 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index bc5edbf4d..9110b1f82 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -55,19 +55,19 @@ rather than use the upstream-provided binary packages. This is also true of pure Python packages that provide universal wheels. The reasons for using source distributions may include: -- being able to audit the source code of all packages +- Being able to audit the source code of all packages. -- being able to run the test suite and build documentation +- Being able to run the test suite and build documentation. 
-- being able to easily apply patches, including backporting commits - from the project's repository and sending patches back to the project +- Being able to easily apply patches, including backporting commits + from the project's repository and sending patches back to the project. -- being able to build on a specific platform that is not covered - by upstream builds +- Being able to build on a specific platform that is not covered + by upstream builds. -- being able to build against specific versions of system libraries +- Being able to build against specific versions of system libraries. -- having a consistent build process across all Python packages +- Having a consistent build process across all Python packages. While it is usually possible to build packages from a Git repository, there are a few important reasons to provide a static archive file instead: @@ -205,11 +205,7 @@ However, none of these issues apply to downstream packaging, and downstreams have good reasons to prefer dynamically linking to system dependencies. In particular: -- in many cases, reliably sharing dynamic dependencies between components is a large part - of the *purpose* of a downstream packaging ecosystem. Helping to support that makes it - easier for users of those systems to access upstream projects in their preferred format. - -- in many cases, reliably sharing dynamic dependencies between components is a large part +- In many cases, reliably sharing dynamic dependencies between components is a large part of the *purpose* of a downstream packaging ecosystem. Helping to support that makes it easier for users of those systems to access upstream projects in their preferred format. 
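To make the switch between vendored and system dependencies more concrete, here is a minimal sketch of the decision logic a custom build script might use. Everything in it is hypothetical: the library name `libfoo`, the `EXAMPLE_USE_SYSTEM_LIBFOO` environment variable and the `select_libfoo()` helper are for illustration only. `ctypes.util.find_library` merely stands in for whatever detection the real build system performs (for example pkg-config). What the sketch shows is that an explicit request for the system library fails loudly instead of silently falling back to the vendored copy.

```python
import os
from ctypes.util import find_library


class SystemDependencyError(RuntimeError):
    """Raised when the system copy of a dependency was requested but not found."""


def select_libfoo(environ=None, finder=find_library):
    """Return ("vendored", None) or ("system", <name reported by finder>).

    Hypothetical convention: a non-empty EXAMPLE_USE_SYSTEM_LIBFOO
    environment variable requests the system libfoo.
    """
    if environ is None:
        environ = os.environ
    if not environ.get("EXAMPLE_USE_SYSTEM_LIBFOO"):
        # Default for upstream builds: use the copy shipped with the package.
        return ("vendored", None)
    location = finder("foo")
    if location is None:
        # Refuse to fall back to the vendored copy, so that the packager
        # notices the problem and can decide how to resolve it.
        raise SystemDependencyError(
            "EXAMPLE_USE_SYSTEM_LIBFOO is set, but the system libfoo "
            "could not be found; not falling back to the vendored copy"
        )
    return ("system", location)
```

Passing the environment mapping and the finder function as parameters keeps this logic testable on machines where the library is not installed.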
From 6f557097a422beb7d81a3bcb245a3e5569a16c8e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 15:52:59 +0100 Subject: [PATCH 19/31] Apply typo fixes, thanks to @pawamoy --- source/discussions/downstream-packaging.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 9110b1f82..3aef888f4 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -99,7 +99,7 @@ necessary to build the package itself, run its test suite, build and install its documentation, and any other files that may be useful to end users, such as shell completions, editor support files, and so on. -Some projects are concerned about increasing the size of source distribution, +Some projects are concerned about increasing the size of source distributions, or do not wish Python packaging tools to fall back to source distributions automatically. In these cases, a good compromise may be to publish a separate source archive for downstream use, for example by attaching it to a GitHub @@ -108,7 +108,7 @@ separate archives. A good idea is to **use your source distribution in the release workflow**. That is, build it first, then unpack it and perform all the remaining steps -using the unpacked distribution rather than the Git repostiry — run tests, +using the unpacked distribution rather than the Git repository — run tests, build documentation, build wheels. This ensures that it is well-tested, and reduces the risk that some users would hit build failures or install an incomplete package. 
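One way to catch missing files before a release is to inspect the built source distribution directly. The sketch below is an illustration under stated assumptions, not a standard tool. `missing_from_sdist()` is a hypothetical helper that lists the members of a gzipped sdist tarball and reports which expected paths are absent. It assumes the usual layout, in which all files sit under a single `<name>-<version>/` top-level directory.

```python
import tarfile
from pathlib import PurePosixPath


def missing_from_sdist(sdist_path, required):
    """Return the entries of ``required`` that are absent from a .tar.gz sdist.

    Paths in ``required`` are relative to the sdist's top-level
    "<name>-<version>/" directory, e.g. "tests/conftest.py".
    """
    present = set()
    with tarfile.open(sdist_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            parts = PurePosixPath(member.name).parts
            # Strip the "<name>-<version>" prefix directory from each member.
            if len(parts) > 1:
                present.add(str(PurePosixPath(*parts[1:])))
    return sorted(path for path in required if path not in present)
```

For example, a release job could run it right after `python -m build --sdist`, passing paths such as `tests/conftest.py` or `docs/conf.py`, and abort if the returned list is non-empty.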
From 548ab343b608b6885190f23d75d6f29bbf0ff1e3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 16:09:41 +0100 Subject: [PATCH 20/31] Clarify that source distribution needs only package's files Let's add some clarification to the "complete source distribution" part to make it clear that we're talking only of package's files, and not requiring people to vendor all the dependencies. --- source/discussions/downstream-packaging.rst | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 3aef888f4..ae4dc5235 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -94,10 +94,19 @@ a few important reasons to provide a static archive file instead: How? ~~~~ -Ideally, **a source distribution archive should include all the files** -necessary to build the package itself, run its test suite, build and install -its documentation, and any other files that may be useful to end users, such as -shell completions, editor support files, and so on. +Ideally, **a source distribution archive should include all the files +from the package's Git repository** that are necessary to build the package +itself, run its test suite, build and install its documentation, and any other +files that may be useful to end users, such as shell completions, editor +support files, and so on. + +This point applies only to the files belonging to the package itself. +The downstream packaging process, much like Python package managers, will +provision the necessary Python dependencies, system tools and external +libraries that are needed by your package and its build scripts. However, +the files listing these dependencies (for example, ``requirements*.txt`` files) +should also be included, to help downstreams determine the needed dependencies, +and check for changes in them. 
Some projects are concerned about increasing the size of source distributions, or do not wish Python packaging tools to fall back to source distributions From e925da1d066140ee628068f9438ec43def11556d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 16:10:39 +0100 Subject: [PATCH 21/31] Fix inconsistent whitespace between sentences --- source/discussions/downstream-packaging.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index ae4dc5235..ec6b782d3 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -110,7 +110,7 @@ and check for changes in them. Some projects are concerned about increasing the size of source distributions, or do not wish Python packaging tools to fall back to source distributions -automatically. In these cases, a good compromise may be to publish a separate +automatically. In these cases, a good compromise may be to publish a separate source archive for downstream use, for example by attaching it to a GitHub release. Alternatively, large files, such as test data, can be split into separate archives. @@ -206,7 +206,7 @@ in upstream packaging may cause a number of problems for end users: For these reasons, you may reasonably decide to either statically link your dependencies, or to provide local copies in the installed package. -You may also vendor the dependency in your source distribution. Sometimes +You may also vendor the dependency in your source distribution. Sometimes these dependencies are also repackaged on PyPI, and can be declared as project dependencies like any other Python package. @@ -325,7 +325,7 @@ Some specific suggestions are: - Make the test suite work offline. Mock network interactions, using packages such as responses_ or vcrpy_. 
If that is not possible, make it possible to easily disable the tests using Internet access, e.g. via a pytest_ - marker. Use pytest-socket_ to verify that your tests work offline. This + marker. Use pytest-socket_ to verify that your tests work offline. This often makes your own test workflows faster and more reliable as well. - Make your tests work without a specialized setup, or perform the necessary From 0eb407c94d61a253cfe5a89fb0868ed0e4eb9d02 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 16:28:51 +0100 Subject: [PATCH 22/31] Make the point of reusing source distribution lighter Split the reuse of source distribution into two parts: the easier part of building a wheel from sdist (which is what build tool does), and the harder part of using it in all workflows. For the latter, suggest it's fine to let downstreams worry about that. --- source/discussions/downstream-packaging.rst | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index ec6b782d3..5ec53a354 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -115,12 +115,16 @@ source archive for downstream use, for example by attaching it to a GitHub release. Alternatively, large files, such as test data, can be split into separate archives. -A good idea is to **use your source distribution in the release workflow**. -That is, build it first, then unpack it and perform all the remaining steps -using the unpacked distribution rather than the Git repository — run tests, -build documentation, build wheels. This ensures that it is well-tested, -and reduces the risk that some users would hit build failures or install -an incomplete package. +A good idea is to use your source distribution in the release workflow. 
+For example, the :ref:`build` tool does exactly that — it first builds a source +distribution, and then uses it to build a wheel. This ensures that the source +distribution actually works, and that it won't accidentally install fewer files +than the official wheels. + +Ideally, use the source distribution also run tests, build documentation, +and so on, or add specific tests to make sure that all necessary files were +actually included. Understandably, this requires more effort, so it's fine +not do that — downstream packagers will report any missing files promptly. .. _no-internet-access-in-builds: From 94743f973b2b68af2c3d16614e8aa4ab4edc52c4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 16:45:34 +0100 Subject: [PATCH 23/31] Clarify the Internet part Clarify the "do not use the Internet" part to make it clearer that we're talking about backend actions, and custom build scripts in particular. We're not talking of Python dependencies that are fetched and installed by frontends. Also, reflow the text to be more logical. --- source/discussions/downstream-packaging.rst | 40 +++++++++++++-------- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 5ec53a354..67ed491bf 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -136,9 +136,12 @@ Why? ~~~~ Downstream builds are frequently done in sandboxed environments that cannot -access the Internet. Even if this is not the case, and assuming that you took -sufficient care to properly authenticate downloads, using the Internet -is discouraged for a number of reasons: +access the Internet. The package sources are unpacked into this environment, +and all the necessary dependencies are installed. 
+ +Even if this is not the case, and assuming that you took sufficient care to +properly authenticate downloads, using the Internet is discouraged for a number +of reasons: - The Internet connection may be unstable (e.g. due to poor reception) or suffer from temporary problems that could cause the process to fail @@ -159,22 +162,31 @@ is discouraged for a number of reasons: How? ~~~~ -Your source distribution should either include all the files needed -for the package to build, or allow provisioning them externally. Ideally, -it should not even attempt to access the Internet at all, unless explicitly -requested to. If that is not possible to achieve, the next best thing -is to provide an opt-out switch to disable all Internet access. +If the package is implementing any custom build *backend* actions that use +the Internet, for example by automatically downloading vendored dependencies +or fetching Git submodules, its source distribution should either include all +of these files or allow provisioning them externally, and the Internet must not +be used if the files are already present. + +Note that this point does not apply to Python dependencies that are specified +in the package metadata, and are fetched during the build and installation +process by *frontends* (such as :ref:`build` or :ref:`pip`). Downstreams use +frontends that use local provisioning for Python dependencies. -When such a switch is used, the build process should fail if some -of the required files are missing, rather than try to fetch them automatically. -This could be done e.g. by checking whether a ``NO_NETWORK`` environment -variable is set to a non-empty value. Please also remember that if you are -fetching remote resources, you must *verify their authenticity* (usually against -a hash), to protect against the file being substituted by a malicious party. +Ideally, custom build scripts should not even attempt to access the Internet +at all, unless explicitly requested to. 
If any resources are missing and need +to be fetched, they should ask the user for permission first. If that is not +feasible, the next best thing is to provide an opt-out switch to disable +all Internet access. This could be done e.g. by checking whether +a ``NO_NETWORK`` environment variable is set to a non-empty value. Since downstreams frequently also run tests and build documentation, the above should ideally extend to these processes as well. +Please also remember that if you are fetching remote resources, you absolutely +must *verify their authenticity* (usually against a hash), to protect against +the file being substituted by a malicious party. + .. _support-system-dependencies-in-builds: From 704d1a575d0950d6bedd9722dcf73b439c325c89 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 19:38:29 +0000 Subject: [PATCH 24/31] Apply suggestions from code review Co-authored-by: Jelle Zijlstra --- source/discussions/downstream-packaging.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 67ed491bf..118d04a4c 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -121,7 +121,7 @@ distribution, and then uses it to build a wheel. This ensures that the source distribution actually works, and that it won't accidentally install fewer files than the official wheels. -Ideally, use the source distribution also run tests, build documentation, +Ideally, also use the source distribution to run tests, build documentation, and so on, or add specific tests to make sure that all necessary files were actually included. Understandably, this requires more effort, so it's fine not do that — downstream packagers will report any missing files promptly. @@ -271,7 +271,7 @@ message rather than fall back to a vendored version. 
This gives the packager the opportunity to notice their mistake and a chance to consciously decide how to solve it. -Note that it is reasonable for upstream projects to leave *testing* of building with +It is reasonable for upstream projects to leave *testing* of building with system dependencies to their downstream repackagers. The goal of these guidelines is to facilitate more effective collaboration between upstream projects and downstream repackagers, not to suggest upstream projects take on tasks that downstream repackagers From e5966091ec6981157255f1a4651d05e7c9f5d7f7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 20:57:23 +0100 Subject: [PATCH 25/31] Remove duplicate paragraph --- source/discussions/downstream-packaging.rst | 5 ----- 1 file changed, 5 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 118d04a4c..e29cf08fe 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -276,11 +276,6 @@ system dependencies to their downstream repackagers. The goal of these guideline is to facilitate more effective collaboration between upstream projects and downstream repackagers, not to suggest upstream projects take on tasks that downstream repackagers are better equipped to handle. -Note that it is reasonable for upstream projects to leave *testing* of building with -system dependencies to their downstream repackagers. The goal of these guidelines -is to facilitate more effective collaboration between upstream projects and downstream -repackagers, not to suggest upstream projects take on tasks that downstream repackagers -are better equipped to handle. .. 
_support-downstream-testing: From 169281d98494cc662ae0b00243d454dc394ec2b6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 20:57:35 +0100 Subject: [PATCH 26/31] Clarify source distributions Clarify that "source distributions" generally refer to the files published on PyPI, and when we are referring to publishing them elsewhere. While at it, expand on the size/fallback argument, and mention that some projects install tests as part of the Python package. --- source/discussions/downstream-packaging.rst | 27 +++++++++++++++------ 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index e29cf08fe..6e69ca4af 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -94,7 +94,7 @@ a few important reasons to provide a static archive file instead: How? ~~~~ -Ideally, **a source distribution archive should include all the files +Ideally, **a source distribution archive published on PyPI should include all the files from the package's Git repository** that are necessary to build the package itself, run its test suite, build and install its documentation, and any other files that may be useful to end users, such as shell completions, editor @@ -108,12 +108,22 @@ the files listing these dependencies (for example, ``requirements*.txt`` files) should also be included, to help downstreams determine the needed dependencies, and check for changes in them. -Some projects are concerned about increasing the size of source distributions, -or do not wish Python packaging tools to fall back to source distributions -automatically. In these cases, a good compromise may be to publish a separate -source archive for downstream use, for example by attaching it to a GitHub -release. Alternatively, large files, such as test data, can be split into -separate archives. 
+Some projects have concerns related to Python package managers using source +distributions from PyPI. They do not wish to increase their size with files +that are not used by these tools, or they do not wish to publish source +distributions at all, as they enable a problematic or outright nonfunctional +fallback to building the particular project from source. In these cases, a good +compromise may be to publish a separate source archive for downstream use +elsewhere, for example by attaching it to a GitHub release. Alternatively, +large files, such as test data, can be split into separate archives. + +On the other hand, some projects (NumPy_, for instance) decide to install tests +in their Python packages. This has the added advantage of permitting users to +run tests after installing them, for example to check for regressions +after upgrading a dependency. Yet another approach is to split tests or test +data into a separate Python package. Such an approach was taken by +the cryptography_ project, with the large test vectors being split +into the cryptography-vectors_ package. A good idea is to use your source distribution in the release workflow. For example, the :ref:`build` tool does exactly that — it first builds a source @@ -459,3 +469,6 @@ as well. Some specific suggestions are: .. _pytest-rerunfailures: https://pypi.org/project/pytest-rerunfailures/ .. _pytest-timeout: https://pypi.org/project/pytest-timeout/ .. _Django: https://www.djangoproject.com/ +.. _NumPy: https://numpy.org/ +.. _cryptography: https://pypi.org/project/cryptography/ +..
_cryptography-vectors: https://pypi.org/project/cryptography-vectors/ From bb8ac35a866e807823770e4effab785cdb39f444 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 1 Feb 2025 20:58:38 +0100 Subject: [PATCH 27/31] Add non-reproducibility argument for changing resources --- source/discussions/downstream-packaging.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 6e69ca4af..5fbab130f 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -161,6 +161,8 @@ of reasons: unavailable, making the build no longer possible. This is especially problematic when someone needs to build an old package version. +- The remote resources may change, making the build not reproducible. + - Accessing remote servers poses a privacy issue and a potential security issue, as it exposes information about the system building the package. From addf891979a8600e75a111568b04cdf058087b5b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sun, 2 Feb 2025 14:51:50 +0100 Subject: [PATCH 28/31] Mention removing duplication of patches and inconsistency --- source/discussions/downstream-packaging.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 5fbab130f..3a08cf2a7 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -24,6 +24,9 @@ follow which help make downstream packaging *significantly* easier Note that this is not an all-or-nothing proposal — anything that upstream maintainers can do is useful, even if it's only a small part. Downstream maintainers are also willing to prepare patches to resolve these issues. 
+Having these patches merged can be very helpful, since it removes the need +for different downstreams to carry and keep rebasing the same patches, +and the risk of applying inconsistent solutions to the same problem. Establishing a good relationship between software maintainers and downstream packagers can bring mutual benefits. Downstreams are often willing to share From 8a3a56c4e4d9f030962ca3b5fb89704ebd302ffc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sun, 2 Feb 2025 14:52:10 +0100 Subject: [PATCH 29/31] Reword installing tests to make it clearer --- source/discussions/downstream-packaging.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 3a08cf2a7..723b3fe75 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -120,8 +120,8 @@ compromise may be to publish a separate source archive for downstream use elsewhere, for example by attaching it to a GitHub release. Alternatively, large files, such as test data, can be split into separate archives. -On the other hand, some projects (NumPy_, for instance) decide to install tests -in their Python packages. This has the added advantage of permitting users to +On the other hand, some projects (NumPy_, for instance) decide to include tests +in their installed packages. This has the added advantage of permitting users to run tests after installing them, for example to check for regressions after upgrading a dependency. Yet another approach is to split tests or test data into a separate Python package. 
Such an approach was taken by From 58eaf854f6b53dff449d3bf93bef8f70854428df Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sun, 2 Feb 2025 14:52:28 +0100 Subject: [PATCH 30/31] Give an example of "catastrophic failure" --- source/discussions/downstream-packaging.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index 723b3fe75..d25845e47 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -266,7 +266,8 @@ In particular: same library being loaded in the same process (for example, attempting to import two Python packages that link to different versions of the same library). This sometimes works without incident, but it can also lead to anything from library - loading errors, to subtle runtime bugs, to catastrophic system failures. + loading errors, to subtle runtime bugs, to catastrophic failures (like suddenly + crashing and losing data). - Last but not least, static linking and vendoring results in duplication, and may increase the use of both disk space and memory. From 76aaf7919aaad728d094a8990943217def9fcaf4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= Date: Sat, 22 Feb 2025 12:12:26 +0100 Subject: [PATCH 31/31] Indicate that some distributions require building from sources --- source/discussions/downstream-packaging.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/source/discussions/downstream-packaging.rst b/source/discussions/downstream-packaging.rst index d25845e47..3f4795fa8 100644 --- a/source/discussions/downstream-packaging.rst +++ b/source/discussions/downstream-packaging.rst @@ -54,9 +54,10 @@ Why? ~~~~ The vast majority of downstream packagers prefer to build packages from source, -rather than use the upstream-provided binary packages. This is also true -of pure Python packages that provide universal wheels. 
The reasons for using -source distributions may include: +rather than use the upstream-provided binary packages. In some cases, using +sources is actually required for the package to be included in the distribution. +This is also true of pure Python packages that provide universal wheels. +The reasons for using source distributions may include: - Being able to audit the source code of all packages.
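The build-script guidance from the Internet-access section could be illustrated in Python. That guidance says to use files that are already present, to honour a ``NO_NETWORK`` opt-out set to any non-empty value, and to verify anything downloaded against a hash. What follows is a minimal sketch, not a reference implementation, and it assumes a custom build script written in Python. The helper names ``network_disabled``, ``sha256_of`` and ``fetch_resource`` are hypothetical and not part of any packaging tool's API; only the ``NO_NETWORK`` convention and the hash check come from the text. The sketch implements the "opt-out switch" fallback rather than the preferred "ask the user for permission first" behaviour.

```python
import hashlib
import os
import shutil
import urllib.request
from pathlib import Path


def network_disabled() -> bool:
    # Any non-empty NO_NETWORK value is treated as an opt-out from Internet access.
    return bool(os.environ.get("NO_NETWORK"))


def sha256_of(path: Path) -> str:
    # Hash the file in chunks so large resources do not need to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_resource(url: str, dest: Path, expected_sha256: str) -> Path:
    """Hypothetical helper: return dest, downloading it only when allowed."""
    if dest.exists():
        # A file provisioned externally (e.g. by a downstream packager) is used
        # as-is, and the network is never touched in that case.
        return dest
    if network_disabled():
        # Fail loudly instead of silently falling back to a download.
        raise RuntimeError(
            f"{dest} is missing and NO_NETWORK is set; provide it from {url}"
        )
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, tmp.open("wb") as out:
        shutil.copyfileobj(response, out)
    # Verify authenticity before the file is put in place.
    if sha256_of(tmp) != expected_sha256:
        tmp.unlink()
        raise RuntimeError(f"hash mismatch for {url}; refusing to use it")
    tmp.replace(dest)
    return dest
```

A downstream packager could then place the file manually and run the build with ``NO_NETWORK=1``. A missing file then becomes a hard error instead of a silent download. Downloading to a ``.part`` file first keeps a file that fails the hash check from being mistaken for a provisioned one on the next run.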