Python packaging notes
jared321 edited this page Feb 4, 2025
·
10 revisions
The following are based on information spread across many different web pages at the time of writing. The contents here are, therefore, not guaranteed to be up-to-date.
AT PRESENT, THIS IS A WORK IN PROGRESS AND CONTENT MIGHT BE WRONG!
- Building a Python package can mean building prebuilt binary wheels or source distributions with the build package. It can also refer to installing the package in editable/build mode via pip install (i.e., with the -e flag).
- There are many different build backends that a package could use, including setuptools.
- Beginning with XXX, it was decided that build processes should be carried out in temporary build system environments that isolate the build process from the user's other Python environments.
- Python packaging was initially tied strongly to the use of setuptools as a build backend. This included a tight coupling with the use of setup.py to specify the structure and building of the package.
- setup.py originally allowed users to specify a package's external dependencies for just the build phase. This is not acceptable since the tools involved in building a package would need to load a setuptools setup.py file just to determine what backend to use and how to set up a build environment for it.
- PEP 518 was created and adopted to allow for a tool-agnostic standard for declaring in a pyproject.toml file the information needed to declare what build system is to be used and how to set up an isolated, temporary build system environment. Related to this, users no longer specify build system requirements in setup.py files.
- The pyproject.toml file allows for including all other information that is needed to specify all other packaging aspects for basic packages.
- The use of Cython requires that all developers and users building the package have Cython installed in their build environment. Therefore, a nonstandard build system environment is required. Hence, the use of pyproject.toml for specifying at least the build system is required.
- The Cython docs suggest that packages that contain .pyx files use setup.py to manage the extra steps required to build the package. In this case, use of both pyproject.toml and setup.py would be required.
- The scheme that we have adopted for surmise is to use both but have a clear role for each file so that these two mechanisms are minimal and decoupled.
  - include in setup.py all information needed to specify the structure of the package
  - include in pyproject.toml all information needed to build (e.g., the build system specification and automatic setting of version by setuptools-scm) and potentially distribute the package (e.g., building and testing wheels via cibuildwheel).
- Both setuptools and Cython suggest distributing the .c files generated by Cython and including these files in the version control system. By doing this, developers making changes to the .pyx files are required to Cythonize their files with the Cython installed in their development environments. They can then test these new .c files and subsequently commit them. In addition, the build system and users would not need to install Cython in order to use the package. For instance, the setup.py file would not specify how to Cythonize any files, but rather would only specify how to include each .c file in the package as a setuptools extension.
- If the .c files aren't included in the package, then we are considering that the .pyx files are the main source files and the .c files are created under-the-hood as intermediate files that we can otherwise hide from users. The above suggestions reframe this to understand that the developers decided that C code is needed and that the .c files are then the main source files. Whether these files were created directly by the developers or by Cython is unimportant to the users. Indeed, we could transition from Cython-created .c files to directly developed .c files in this scheme without users knowing.
- One benefit of having only developers engage with Cython and including the .c files in the version control repository is that the developers are in control of what version of Cython is used and all developers and users will use the same version of .c files that were tested by developers of the associated .pyx files. This is good for both quality control as well as working toward reproducibility and maintainability.
- Surmise includes numpy as an external dependence due to its use in general Python code.
- Surmise uses the numpy C API in its .pyx file. In particular, it creates numpy objects and returns these. This implies that the .c file is linked against the numpy C interface made available/found at build time. The fact that we need to provide in setup.py the location of the numpy installation's header files emphasizes this.
- Due to the combined use of Cython and the numpy C API, there are three different virtual environments involved in building/distributing/installing surmise:
  - a developer environment for cythonizing .pyx files with respect to a target numpy version (I do not know that a target numpy must be implicitly specified as part of the Cythonization process. However, I presently have to install in the venv a numpy version that is compatible with the numpy version used to compile the C code in order to get tests passing.),
  - the build system environment that includes the numpy C interface installation to link against when setuptools compiles the .c files into .so extension modules, and
  - a developer/user environment in which surmise is installed for use, testing, or development and that includes numpy to satisfy the dependence of the surmise python code including the use of the numpy objects created and returned by the C code.
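The setup.py role described earlier (listing each committed .c file as a setuptools extension and supplying the location of the numpy header files) might look roughly like the sketch below. The module name, source path, and include directory are hypothetical placeholders. In a real setup.py the include directory would typically come from numpy.get_include(), evaluated inside the build system environment.

```python
from setuptools import Extension


def make_c_extension(module_name, c_source, numpy_include_dir):
    """Describe one extension module built from an already generated .c file.

    No Cython step appears here: the committed .c file is the source.
    """
    return Extension(
        name=module_name,
        sources=[c_source],
        include_dirs=[numpy_include_dir],
    )


# Hypothetical names and paths, for illustration only. The resulting list
# would be passed to setuptools.setup(ext_modules=extensions).
extensions = [
    make_c_extension("mypkg._core", "mypkg/_core.c", "/path/to/numpy/include"),
]
print(extensions[0].name, extensions[0].sources)
```

Because the header location is resolved when the extension is compiled, the numpy version present in the build system environment is the one the .so file is built against.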
- It is, therefore, important to match numpy versions across these environments without unnecessarily limiting a user's ability to flexibly choose a desired numpy version. In particular, the numpy objects created by the compiled code, which are created in accord with the numpy version available at compile time, should be compatible with any version of numpy that we allow users to install alongside surmise.
- In this spirit of explicit developer control and responsibility, for surmise we have decided to
  - add a cythonize task to the tox interface with a fixed version of Cython specified as an external dependence so that developers can easily generate .c files using the same pre-determined Cython version and a fixed version of numpy that Cython should target for its code translation,
  - specify a fixed version of numpy as a build system requirement in pyproject.toml with that version matching the target version specified in the tox cythonize command, and
  - include the .c files in the version control repository.