Python packaging notes

jared321 edited this page Feb 5, 2025 · 10 revisions

The following are based on information spread across many different web pages at the time of writing. The contents here are, therefore, not guaranteed to be up-to-date.

AT PRESENT, THIS IS A WORK IN PROGRESS AND CONTENT MIGHT BE WRONG!

  • Building a Python package can mean producing binary wheels or source distributions with the build package. It can also refer to installing the package in editable (development) mode via pip install -e, or to using pip to install a package from a source distribution that requires compilation.
  • Beginning with XXX, it was decided that build processes should be carried out in temporary build system environments that isolate the build process from the user's other Python environments.
  • Python packaging was initially tied to the use of distutils as a build backend. This included a tight coupling with the use of an executable file setup.py to specify the structure and building of the package. This was acceptable because distutils was part of Python and therefore setup.py could always be run to build a package without the need to set up a special build environment.
  • However, there are now many different build backends that a package could use, including setuptools. setup.py originally allowed users to specify in the file itself the package's external dependencies for setting up a special package build environment (e.g., Cython must be installed). This is circular and therefore not acceptable: the tools involved in building a package would need to execute setup.py just to determine which dependencies are needed to run setup.py.
  • One means to manage this and other difficulties was for pip to always assume that setuptools is a dependency. I believe that this assumption was removed recently, since for several projects I have recently had to explicitly include setuptools as a build requirement.
  • PEP 518 was created and adopted to provide a tool-agnostic standard for declaring in a pyproject.toml file which build system is to be used and how to set up an isolated, temporary build system environment. Related to this, users no longer specify build system requirements in setup.py files.
  • According to the setuptools docs,

When creating a Python package, you must provide a pyproject.toml file containing a build-system section

  • For basic packages, the pyproject.toml file can also hold all of the remaining packaging information. However, packages can use other mechanisms such as setup.py and setup.cfg to express this information. That said, the setuptools docs state that

We also recommend users to expose as much as possible configuration in a more declarative way via the pyproject.toml or setup.cfg, and keep the setup.py minimal with only the dynamic parts (or even omit it completely if applicable).

  • Both pyproject.toml and setup.py can contain the same packaging information. As far as I have seen there is no automatic mechanism for ensuring that the same type of information is not provided in both. I have not found any information that explains which source of information is used if the same specifications are provided in both but with different values.
  • The use of Cython requires that all developers and users building the package have Cython installed in their build environment. Therefore, a nonstandard build system environment is required. Hence, the use of pyproject.toml for specifying at least the build system is required.
  • The Cython docs suggest that packages that contain .pyx files use setup.py to manage the extra steps required to build the package. In this case, use of both pyproject.toml and setup.py would be required.
  • The scheme that we have adopted for surmise is to use both but have a clear role for each file so that these two mechanisms are minimal and decoupled.
    • include in setup.py all information needed to specify the structure of the package
    • include in pyproject.toml all information needed to build (e.g., the build system specification and automatic setting of version by setuptools-scm) and potentially distribute the package (e.g., building and testing wheels via cibuildwheel).
  • Both setuptools and Cython suggest distributing the .c files generated by Cython and including these files in the version control system. By doing this, developers making changes to the .pyx file are required to Cythonize their files with a Cython installed in their development environments. They can then test these new .c files and subsequently commit them. In addition, the build system and users would not need to install Cython in order to use the package. For instance, the setup.py file would not specify how to Cythonize any files, but rather would only specify how to include each .c file in the package as a setuptools extension.
  • If the .c files aren't included in the package, then we are considering that the .pyx files are the main source files and the .c files are created under-the-hood as intermediate files that we can otherwise hide from users. The above suggestions reframe this to understand that the developers decided that C code is needed and that the .c files are then the main source files. Whether these files were created directly by the developers or by Cython is unimportant to the users. Indeed, we could transition from Cython-created .c files to directly developed .c files in this scheme without users knowing.
  • One benefit of having only developers engage with Cython and including the .c files in the version control repository is that the developers are in control of what version of Cython is used and all developers and users will use the same version of .c files that were tested by developers of the associated .pyx files. This is good for both quality control as well as working toward reproducibility and maintainability.
  • Surmise includes numpy as an external dependence due to its use in general Python code.
  • Surmise uses the numpy C API in its .pyx file. In particular, it creates numpy objects and returns these. This implies that the .c file is linked against the numpy C interface made available/found at build time. The fact that we need to provide in setup.py the location of the numpy installation's header files emphasizes this.
  • Due to the combined use of Cython and the numpy C API, there are three different virtual environments involved in building/distributing/installing surmise
    • a developer environment for cythonizing .pyx files with respect to a target numpy version (I do not know that a target numpy must be implicitly specified as part of the Cythonization process. However, I presently have to install in the venv a numpy version that is compatible with the numpy version used to compile the C code in order to get tests passing.),
    • the build system environment that includes the numpy C interface installation to link against when setuptools compiles the .c files into .so extension modules,
    • a developer/user environment in which surmise is installed for use, testing, or development and that includes numpy to satisfy the dependence of the surmise python code including the use of the numpy objects created and returned by the C code.
  • It is, therefore, important to match numpy versions across these environments without unnecessarily limiting a user's ability to flexibly choose a desired numpy version. In particular, the numpy objects created by the compiled code, which are created in accord with the numpy version available at compile time, should be compatible with any version of numpy that we allow users to install alongside surmise.
  • In this spirit of explicit developer control and responsibility, for surmise we have decided to
    • add a cythonize task to the tox interface with a fixed version of Cython specified as an external dependence so that developers can easily generate .c files using the same pre-determined Cython version and a fixed version of numpy that Cython should target for its code translation,
    • specify a fixed version of numpy as a build system requirement in pyproject.toml with that version matching the target version specified in the tox cythonize command,
    • include the .c files in the version control repository.
  • According to NEP 29, numpy 1.24 could already be dropped and 1.25 could be dropped in the summer of 2025. So starting with >=1.25 support should be good.
  • While studying this issue, I ran into a numpy vs. setuptools installation issue. In particular, I couldn't install the software with numpy < 1.26 because pip would try to install numpy before setuptools when it appears that setuptools is needed by the numpy installation. I suspect that setuptools was not listed as a requirement for these older versions and, therefore, pip does not have the full set of information needed to determine the installation order.
    • This suggests that we should try supporting numpy >= 1.26 to start.

Notes on building C extensions

  • Following Cython docs, inform pip and build that the C code must be compiled and integrated into the package using setuptools to configure the file as an Extension.
  • To build surmise's C code into a module, we must include in setup.py a means for the compiler/linker to find numpy headers. This strongly suggests that the build process is using and linking against the numpy C API's headers and libraries.
  • On a Mac, however, I ran otool -L on the built .so file and did not see a numpy library listed as an external dependence. This suggests that the module was linked statically against a numpy library.
  • All indications are that the setuptools Extension is constructing and using a build system to compile and link the C code. I do not see any means to influence how this build system is constructed apart from a few keywords that allow developers to specify compiler/linker flags. However, I would prefer to avoid this unless I can know what compiler family is going to be used.
  • I have not found any indication that the modules are guaranteed to be built statically so that we know that we don't have to, for example, distribute with a wheel the numpy library that we used to build the wheel (using delocate or auditwheel for example). To the contrary, I recall seeing -dynamic in the compiler output.
  • NEP 29 specifies compilers and versions. Perhaps this can be a source of information.
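
As a minimal sketch of the first two notes above, the snippet below describes one pre-generated .c file as a setuptools Extension and points the compiler at the numpy headers through numpy.get_include(). The module and file names are hypothetical, and the helper make_extension is introduced only for illustration. It assumes setuptools and numpy are both installed in the environment that runs it, which is what the build system environment provides.

```python
import numpy
from setuptools import Extension


def make_extension(module_name, c_source):
    """Describe one pre-generated C file as a setuptools extension module."""
    return Extension(
        module_name,
        [c_source],
        # Directory holding the numpy C headers for the numpy installed
        # in the environment that runs this code.
        include_dirs=[numpy.get_include()],
    )


# Hypothetical module and file names, for illustration only.
EXTENSIONS = [make_extension("mypkg._kernels", "mypkg/_kernels.c")]

for ext in EXTENSIONS:
    print(ext.name, ext.sources, ext.include_dirs)

# In a real setup.py the list would then be handed to setuptools:
#     from setuptools import setup
#     setup(ext_modules=EXTENSIONS)
```

Nothing here invokes Cython. The .c file is treated as ordinary C source, which is consistent with keeping the generated files under version control.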
