Python packaging notes

jared321 edited this page Feb 4, 2025 · 10 revisions

The following are based on information spread across many different web pages at the time of writing. The contents here are, therefore, not guaranteed to be up-to-date.

AT PRESENT, THIS IS A WORK IN PROGRESS AND CONTENT MIGHT BE WRONG!

  • Building a Python package can mean producing prebuilt binary wheels or source distributions with the build package. It can also refer to the build that pip install carries out, including installs in editable/development mode (i.e., with the -e flag).
  • There are many different build backends that a package could use, including setuptools.
  • Beginning with XXX, it was decided that build processes should be carried out in temporary build system environments that isolate the build process from the user's other Python environments.
  • Python packaging was initially tied strongly to the use of setuptools as a build backend. This included a tight coupling with the use of setup.py to specify the structure and building of the package.
  • setup.py originally allowed users to specify a package's external dependencies for just the build phase. This was not acceptable because the tools involved in building a package would need to load a setuptools setup.py file just to determine what backend to use and how to set up a build environment for it.
  • PEP 518 was created and adopted to provide a tool-agnostic standard for declaring, in a pyproject.toml file, what build system is to be used and how to set up an isolated, temporary build system environment. As a result, users no longer specify build system requirements in setup.py files.
  • The pyproject.toml file can also hold all of the other information needed to specify the remaining packaging aspects of basic packages.
  • The use of Cython requires that all developers and users building the package have Cython installed in their build environment. Therefore, a nonstandard build system environment is required, and pyproject.toml must be used to specify at least the build system.
  • The Cython docs suggest that packages that contain .pyx files use setup.py to manage the extra steps required to build the package. In this case, use of both pyproject.toml and setup.py would be required.
  • The scheme that we have adopted for surmise is to use both but have a clear role for each file so that these two mechanisms are minimal and decoupled:
    • include in setup.py all information needed to specify the structure of the package
    • include in pyproject.toml all information needed to build (e.g., the build system specification and automatic setting of version by setuptools-scm) and potentially distribute the package (e.g., building and testing wheels via cibuildwheel).
  • Both setuptools and Cython suggest distributing the .c files generated by Cython and including these files in the version control system. By doing this, developers making changes to the .pyx file are required to Cythonize their files with a Cython installed in their development environments. They can then test these new .c files and subsequently commit them. In addition, the build system and users would not need to install Cython in order to use the package. For instance, the setup.py file would not specify how to Cythonize any files, but rather would only specify how to include each .c file in the package as a setuptools extension.
  • If the .c files aren't included in the package, then we are treating the .pyx files as the main source files and the .c files as intermediate files that are created under the hood and that we can otherwise hide from users. The above suggestions reframe this: the developers decided that C code is needed, so the .c files are the main source files. Whether these files were created directly by the developers or by Cython is unimportant to the users. Indeed, we could transition from Cython-created .c files to directly developed .c files in this scheme without users knowing.
  • One benefit of having only developers engage with Cython and including the .c files in the version control repository is that the developers are in control of what version of Cython is used and all developers and users will use the same version of .c files that were tested by developers of the associated .pyx files. This is good for both quality control as well as working toward reproducibility and maintainability.
  • Surmise includes numpy as an external dependence due to its use in general Python code.
  • Surmise uses the numpy C API in its .pyx file. In particular, it creates numpy objects and returns these. This implies that the .c file is linked against the numpy C interface made available/found at build time. The fact that we need to provide in setup.py the location of the numpy installation's header files emphasizes this.
  • Due to the combined use of Cython and the numpy C API, there are three different virtual environments involved in building/distributing/installing surmise:
    • a developer environment for cythonizing .pyx files with respect to a target numpy version (I do not know that a target numpy must be implicitly specified as part of the Cythonization process. However, I presently have to install in the venv a numpy version that is compatible with the numpy version used to compile the C code in order to get tests passing.),
    • the build system environment that includes the numpy C interface installation to link against when setuptools builds the .c files into .so extension modules, and
    • a developer/user environment in which surmise is installed for use, testing, or development and that includes numpy to satisfy the dependence of the surmise python code including the use of the numpy objects created and returned by the C code.
  • It is, therefore, important to match numpy versions across these environments without unnecessarily limiting a user's ability to flexibly choose a desired numpy version. In particular, the numpy objects created by the compiled code, which are created in accord with the numpy version available at compile time, should be compatible with any version of numpy that we allow users to install alongside surmise.
  • In this spirit of explicit developer control and responsibility, for surmise we have decided to
    • add a cythonize task to the tox interface with a fixed version of Cython specified as an external dependence so that developers can easily generate .c files using the same pre-determined Cython version and a fixed version of numpy that Cython should target for its code translation,
    • specify a fixed version of numpy as a build system requirement in pyproject.toml, with that version matching the target version specified in the tox cythonize command, and
    • include the .c files in the version control repository.

Notes on building C extensions

  • Following Cython docs, inform pip and build that the C code must be compiled and integrated into the package using setuptools to configure the file as an Extension.
  • To build surmise's C code into a module, we must include in setup.py a means for the compiler/linker to find numpy headers. This strongly suggests that the build process is using and linking against the numpy C API's headers and libraries.
  • On a Mac, however, I ran otool -L on the built .so file and did not see a numpy library listed as an external dependence. This suggests that the module was linked statically against a numpy library.
  • All indications are that the setuptools Extension is constructing and using a build system to compile and link the C code. I do not see any means to influence how this build system is constructed apart from a few keywords that allow developers to specify compiler/linker flags. However, I would prefer to avoid this unless I can know what compiler family is going to be used.
  • I have not found any indication that the modules are guaranteed to be built statically, which would tell us that we do not have to, for example, distribute with a wheel the numpy library that we used to build the wheel (using delocate or auditwheel, for example). To the contrary, I recall seeing -dynamic in the compiler output.
  • NEP29 specifies compilers and versions. Perhaps this can be a source of information.
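
One way to get a hint about which compiler and link flags a build is likely to use is to inspect the configuration recorded when the Python interpreter itself was built. To my understanding, setuptools starts from these values on Unix-like platforms unless environment variables such as CC override them. The sketch below only reads those values and makes no claim about what a particular build of surmise actually does; the function name is a hypothetical helper.

```python
import sysconfig

# Build-time configuration variables recorded when this interpreter was built.
# On Windows most of these are not defined and come back as None.
TOOLCHAIN_VARIABLES = ("CC", "CFLAGS", "LDSHARED", "EXT_SUFFIX")


def toolchain_summary():
    """Map each configuration variable name to its recorded value (or None)."""
    return {name: sysconfig.get_config_var(name) for name in TOOLCHAIN_VARIABLES}


summary = toolchain_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

The LDSHARED entry shows the link command and flags recorded for extension modules, which may help when interpreting flags seen in compiler output.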
