Python packaging notes
jared321 edited this page Feb 5, 2025 · 10 revisions
The following are based on information spread across many different web pages at the time of writing. The contents here are, therefore, not guaranteed to be up-to-date.
- Building a Python package can mean building prebuilt binary wheels or source distributions with the `build` package. It can also refer to, for example, installing the package in editable/build mode via `pip install` (i.e., with the `-e` flag).
- Tools such as `build` build packages in temporary build system environments that isolate the build process from the user's other Python environments.
- Python packaging was initially tied to the use of `distutils` as a build backend. This included a tight coupling to the `setup.py` executable file to specify the structure and building of the package. This was acceptable because `distutils` was part of Python and therefore `setup.py` could always be run to build a package without the need to set up a special build environment.
- However, there are now many different build backends that a package could use, including `setuptools`. `setup.py` originally allowed users to specify in the file the package's external dependencies for setting up a special package build environment (e.g., `Cython` must be installed). This is not acceptable since the tools involved in building a package would need to execute the `setup.py` file just to determine what dependencies are needed to run `setup.py`.
- One means to manage this and other difficulties was for pip to always assume that `setuptools` is a dependency. I believe that this was removed recently, since for several projects I have recently had to explicitly include `setuptools` as a build requirement.
- PEP518 was created and adopted to allow for a tool-agnostic standard for declaring in a `pyproject.toml` file what build system is to be used and how to set up an isolated, temporary build system environment. Related to this, users should no longer specify build system requirements in `setup.py` files.
- According to setuptools docs,
  > When creating a Python package, you must provide a pyproject.toml file containing a build-system section
- The `pyproject.toml` file can also hold all of the remaining information needed to specify the packaging of basic packages. However, packages can use other mechanisms such as `setup.py` and `setup.cfg` to express other packaging information. That said, the setuptools docs state that
  > We also recommend users to expose as much as possible configuration in a more declarative way via the pyproject.toml or setup.cfg, and keep the setup.py minimal with only the dynamic parts (or even omit it completely if applicable).
- Both `pyproject.toml` and `setup.py` can contain the same packaging information. As far as I have seen, there is no automatic mechanism for ensuring that the same type of information is not provided in both. I have not found any information that explains which source of information is used if the same specifications are provided in both but with different values.
- Surmise includes numpy as an external dependence due to its use in general Python code.
- Surmise uses the numpy C API in its .pyx file. In particular, it creates numpy objects and returns these. Please see numpy C API notes for more info.
- Due to the combined use of Cython and the numpy C API, there are three different virtual environments involved in building/distributing/installing surmise:
  - a developer environment for cythonizing .pyx files with respect to a target numpy version (I do not know whether a target numpy version must be specified, even implicitly, as part of the Cythonization process. However, I presently have to install in the venv a numpy version that is compatible with the numpy version used to compile the C code in order to get tests passing.),
  - the build system environment that includes the numpy C interface installation to build against when setuptools compiles the .c file into a `.so` extension module,
  - a developer/user environment in which surmise is installed for use, testing, or development and that includes numpy to satisfy the dependence of the surmise python code, including the use of the numpy objects created and returned by the C code.
- The use of a C extension that uses the numpy C interface requires that all developers and users building the package have numpy installed in their build environment. Therefore, a nonstandard build system environment is required. Hence, the use of `pyproject.toml` for specifying at least the build system is not only advisable but unavoidable.
- The Cython docs suggest that packages that contain .pyx files use `setup.py` to manage the extra steps required to convert the .pyx file to .c and build the package. In this case, use of both `pyproject.toml` and `setup.py` would be required.
- Both setuptools and Cython suggest distributing the .c files generated by Cython and including these files in the version control system. By doing this, developers making changes to the .pyx file are required to Cythonize their files with a Cython installed in their development environments. They can then test these new .c files and subsequently commit them. In addition, the build system and users would not need to install Cython in order to use the package. For instance, the `setup.py` file would not specify how to Cythonize any files, but rather would only specify how to include each .c file in the package as a setuptools extension.
- If the .c files aren't included in the package, then we are considering that the .pyx files are the main source files and the .c files are created under-the-hood as intermediate files that we can otherwise hide from users. The above suggestions reframe this to understand that the developers decided that C code is needed and that the .c files are then the main source files. Whether these files were created directly by the developers or by Cython is unimportant to the users. Indeed, we could transition from Cython-created .c files to directly developed .c files in this scheme without users knowing.
- One benefit of having only developers engage with Cython and including the .c files in the version control repository is that the developers are in control of what versions of Cython and numpy are used and all developers and users will use the same version of .c files that were tested by developers of the associated .pyx files. This is good for both quality control as well as working toward reproducibility and maintainability.
- If the .c files are generated ahead of time by developers and included in the repo, then `setup.py` does not need to use `Cython` to build the package. Rather it only builds in the C code as a standard C extension. While C extensions can be specified via `pyproject.toml`, the use of the numpy C interface requires informing the Extension where to find the numpy headers. Since obtaining the location requires importing numpy and calling `get_include()`, we must use `setup.py`.
- The scheme that we have adopted for surmise is to use `pyproject.toml` and `setup.py` but have a clear role for each file so that these two mechanisms are minimal and decoupled.
  - include in `setup.py` all information needed to specify the structure of the package
  - include in `pyproject.toml` all information needed to build (e.g., the build system specification and automatic setting of version by `setuptools-scm`) and potentially distribute the package (e.g., building and testing wheels via `cibuildwheel`).
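The `setup.py` side of this split can be sketched as below. This is not surmise's actual `setup.py`: the module name `surmise._example`, the source path, the helper names, and the `sys.argv` guard are all assumptions made for illustration. The only calls taken from the notes above are the setuptools `Extension` and numpy's `get_include()`.

```python
# setup.py (sketch): describes package structure only. Build requirements
# such as setuptools and numpy are declared in pyproject.toml.
import sys


def numpy_extension_kwargs(include_dir):
    """Keyword arguments for an Extension built from a committed .c file."""
    return {
        "name": "surmise._example",         # hypothetical module name
        "sources": ["surmise/_example.c"],  # hypothetical Cython-generated file
        "include_dirs": [include_dir],      # lets the compiler find numpy headers
    }


def main():
    # Imported lazily: both packages are only guaranteed to be present in the
    # isolated build environment declared in pyproject.toml.
    import numpy
    from setuptools import Extension, setup

    setup(ext_modules=[Extension(**numpy_extension_kwargs(numpy.get_include()))])


# Assumption: build tools always pass at least one command (e.g. bdist_wheel),
# so running this file with no arguments has no side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Note that Cython does not appear anywhere in this file, since the .c file is treated as the source to be compiled.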
- It is, therefore, important to match the numpy version across the three environments described above without unnecessarily limiting a user's ability to flexibly choose a desired numpy version. In particular, the numpy objects created by the compiled code, which are created in accord with the numpy version available at compile time, should be compatible with any version of numpy that we allow users to install alongside surmise.
- In this spirit of explicit developer control and responsibility, for surmise we have decided to
  - add a `cythonize` task to the tox interface with a fixed version of Cython specified as an external dependence so that developers can easily generate .c files using the same pre-determined Cython version and a fixed version of numpy that Cython should target for its code translation,
  - specify a fixed version of numpy as a build system requirement in `pyproject.toml` with that version matching the target version specified in the tox `cythonize` command,
  - include the .c files in the version control repository.
- According to NEP29, numpy 1.24 could already be dropped and 1.25 could be dropped in the summer of 2025. So starting with support for >=1.25 should be good.
- While studying this issue, I ran into a numpy vs. setuptools installation
issue. In particular, I couldn't install the software with numpy < 1.26 because
pip would try to install numpy before setuptools when it appears that setuptools
is needed by the numpy installation. I suspect that setuptools was not listed
as a requirement for these older versions and, therefore, pip does not have the
full set of information needed to determine the installation order.
- This suggests that we should try supporting numpy >= 1.26 to start.
- Following Cython docs, inform pip and build that the C code must be compiled and integrated into the package using `setuptools` to configure the file as an Extension.
- To build surmise's C code into a module, we must include in `setup.py` a means for the compiler/linker to find numpy headers. This strongly suggests that the build process is using and linking against the numpy C API's headers and libraries.
- On a Mac, however, I ran `otool -L` on the built `.so` file and do not see a numpy library listed as an external dependence. This suggests that the module was linked statically against a numpy library. Another explanation, consistent with the numpy C API docs, is that the API is reached at runtime through a table of function pointers loaded by `import_array()`, in which case only the headers are needed at build time and no numpy library is linked at all.
- All indications are that the setuptools Extension is constructing and using a build system to compile and link the C code. I do not see any means to influence how this build system is constructed apart from a few keywords that allow developers to specify compiler/linker flags. However, I would prefer to avoid this unless I can know what compiler family is going to be used.
- I have not found any indication that the modules are guaranteed to be built statically, which would tell us that we don't have to, for example, distribute with a wheel the numpy library that we used to build the wheel (using delocate or auditwheel for example). To the contrary, I recall seeing `-dynamic` in the compiler output.
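The `otool -L` check mentioned above can be scripted so that it is easy to repeat on each built module. The following is a rough sketch under a few assumptions: the function names are hypothetical, `ldd` is assumed as the counterpart on non-Mac systems, and `SAMPLE_OUTPUT` is a hand-written string in `otool`'s layout rather than real output from a surmise build.

```python
import subprocess
import sys


def parse_dependency_lines(output):
    """Return the first token of each indented line (one line per library)."""
    libraries = []
    for line in output.splitlines():
        if not line[:1].isspace():
            continue  # skip the unindented header line naming the inspected file
        tokens = line.split()
        if tokens:
            libraries.append(tokens[0])
    return libraries


def shared_library_dependencies(path):
    """List the dynamic dependencies of a built extension module."""
    # Assumption: otool is available on macOS and ldd elsewhere.
    command = ["otool", "-L", path] if sys.platform == "darwin" else ["ldd", path]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_dependency_lines(result.stdout)


# Hand-written sample in otool's format, for illustration only.
SAMPLE_OUTPUT = (
    "build/_example.so:\n"
    "\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1.0.0)\n"
)

libraries = parse_dependency_lines(SAMPLE_OUTPUT)
numpy_libraries = [lib for lib in libraries if "numpy" in lib.lower()]
print(libraries)
print(numpy_libraries)
```

An empty `numpy_libraries` list for a real module would mean that no numpy shared library needs to be bundled into the wheel for that module.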