| 1 | +PEP: 774 |
| 2 | +Title: Removing the LLVM requirement for JIT builds |
| 3 | +Author: Savannah Ostrowski < [email protected]> |
| 4 | +Status: Draft |
| 5 | +Type: Standards Track |
| 6 | +Created: 27-Jan-2025 |
| 7 | +Python-Version: 3.14 |
| 8 | + |
| 9 | +Abstract |
| 10 | +======== |
| 11 | + |
| 12 | +Since Python 3.13, CPython can be configured and built with an |
| 13 | +experimental just-in-time (JIT) compiler via the ``--enable-experimental-jit`` |
| 14 | +flag on Linux and macOS and ``--experimental-jit`` on Windows. To build CPython with |
| 15 | +the JIT enabled, users are required to have LLVM installed on their machine |
| 16 | +(initially LLVM 16, more recently LLVM 19). LLVM is responsible |
| 17 | +for generating stencils that are essential to our copy-and-patch JIT (see :pep:`744`). |
| 18 | +These stencils are predefined, architecture-specific templates that are used |
| 19 | +to generate machine code at runtime. |
| 20 | + |
| 21 | +This PEP proposes removing the LLVM build-time dependency for JIT-enabled builds |
| 22 | +by hosting the generated stencils in the CPython repository. This approach |
| 23 | +allows us to leverage the checked-in stencils for supported platforms at build |
| 24 | +time, simplifying the contributor experience and addressing concerns raised at the |
| 25 | +Python Core Developer Sprint in September 2024. That said, there is a clear |
| 26 | +tradeoff to consider, as improved developer experience does come at the cost of |
| 27 | +increased repository size. |
| 28 | + |
| 29 | +It is important to note that this PEP is not a proposal to accept or reject the |
| 30 | +JIT itself but rather to determine whether the build-time dependency on LLVM is |
| 31 | +acceptable for JIT builds moving forward. If this PEP is rejected, we will |
| 32 | +proceed with the status quo, retaining the LLVM build-time requirement. While |
| 33 | +this dependency has served the JIT development process effectively thus far, it |
| 34 | +introduces setup complexity and additional challenges that this PEP seeks to |
| 35 | +alleviate. |
| 36 | + |
| 37 | +Motivation |
| 38 | +========== |
| 39 | + |
| 40 | +At the Python Core Developer Sprint that took place in September 2024, there was |
| 41 | +discussion about the next steps for the JIT; a related discussion also took |
| 42 | +place on `GitHub <https://github.com/python/cpython/issues/115869>`__. As part |
| 43 | +of that discussion, there was also a clear appetite for removing the LLVM |
| 44 | +requirement for JIT builds in preparation for shipping the JIT off by default in |
| 45 | +3.14. The consensus at the sprint was that it would be sufficient to provide |
| 46 | +pre-generated stencils for non-debug builds for Tier 1 platforms and that |
| 47 | +checking these files into the CPython repo would be adequate for the limited |
| 48 | +number of platforms (though more options have been explored; see `Rejected |
| 49 | +Ideas`_). |
| 50 | + |
| 51 | +Currently, building CPython with `the JIT requires LLVM |
| 52 | +<https://github.com/python/cpython/tree/main/Tools/jit#installing-llvm>`__ as a |
| 53 | +build-time dependency. Despite not being exposed to end users, this dependency |
| 54 | +is suboptimal. Requiring LLVM adds a setup burden for developers and those who |
| 55 | +wish to build CPython with the JIT enabled. Depending on the operating system, |
| 56 | +the version of LLVM shipped with the OS may differ from that required by our JIT |
| 57 | +builds, which introduces additional complexity to troubleshoot and resolve. With |
| 58 | +few core developers currently contributing to and maintaining the JIT, we also |
| 59 | +want to make sure that the friction to work on JIT-related code is minimized as |
| 60 | +much as possible. |
| 61 | + |
| 62 | +With the proposed approach, pre-compiled stencils for supported |
| 63 | +architectures can be generated in advance, stored in a central location, and |
| 64 | +automatically used during builds. This approach ensures reproducible builds, |
| 65 | +making the JIT a more stable and sustainable part of CPython's future. |
| 66 | + |
| 67 | +Rationale |
| 68 | +========= |
| 69 | + |
| 70 | +This PEP proposes checking JIT stencils directly into the CPython repo as the |
| 71 | +best path forward for eliminating our build-time dependency on LLVM. |
| 72 | + |
| 73 | +This approach: |
| 74 | + |
| 75 | +* Provides the best end-to-end experience for those looking to build CPython |
| 76 | +  with the JIT |
| 77 | +* Lessens the barrier to entry for those looking to contribute to the JIT |
| 78 | +* Ensures builds remain reproducible and consistent across platforms without |
| 79 | +  relying on external infrastructure or download mechanisms |
| 80 | +* Eliminates variability introduced by network conditions or potential |
| 81 | +  discrepancies between hosted files and the CPython repository state, and |
| 82 | +* Subjects stencils to the same review processes we have for all other JIT-related |
| 83 | +  code |
| 84 | + |
| 85 | +However, this approach does result in a slight increase in overall |
| 86 | +repository size. Comparing repo growth on commits over the past 90 days, the |
| 87 | +actual commits and the same commits with stencils added differ by |
| 88 | +0.03 MB per stencil file. This is a small increase in |
| 89 | +the context of the overall repository size, which has grown by 2.55 MB in the |
| 90 | +same time period. For six stencil files, this amounts to an upper bound of 0.18 MB. |
| 91 | +The current total size of the stencil files for all six platforms is 7.2 MB. |
| 92 | + |
| 93 | +These stencils could become larger in the future with changes to register |
| 94 | +allocation, which would introduce 5-6 variants per instruction in each stencil |
| 95 | +file (5-6x larger). However, if we ended up going this route, there are |
| 96 | +additional modifications we could make to stencil files that could help |
| 97 | +counteract this size increase (e.g., stripping comments, minimizing the |
| 98 | +stencils). |
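
As a quick cross-check of the figures quoted above, the arithmetic can be reproduced in a few lines of Python. The variable names are illustrative only, and the 5-6x projection simply scales the current 7.2 MB total, which assumes every stencil file grows uniformly.

```python
# Figures quoted in this section, all in MB.
PER_STENCIL_GROWTH_MB = 0.03  # extra growth per stencil file over 90 days
REPO_GROWTH_MB = 2.55         # repository growth over the same period
CURRENT_TOTAL_MB = 7.2        # current size of all six stencil files
STENCIL_FILES = 6

# Upper bound on the extra growth contributed by six stencil files.
added_growth_mb = PER_STENCIL_GROWTH_MB * STENCIL_FILES

# Extra growth relative to the repository's baseline growth.
relative_growth = added_growth_mb / REPO_GROWTH_MB

# Assumption: register-allocation variants scale every file uniformly by 5-6x.
projected_low_mb = CURRENT_TOTAL_MB * 5
projected_high_mb = CURRENT_TOTAL_MB * 6

print(f"Added growth over 90 days: {added_growth_mb:.2f} MB")
print(f"Relative to baseline repository growth: {relative_growth:.1%}")
print(f"Projected total with variants: {projected_low_mb:.1f}-{projected_high_mb:.1f} MB")
```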
| 99 | + |
| 100 | +Specification |
| 101 | +============= |
| 102 | + |
| 103 | +This specification outlines the proposed changes to remove the build-time |
| 104 | +dependency on LLVM and the contributor experience if this PEP is accepted. |
| 105 | + |
| 106 | +Repository changes |
| 107 | +------------------ |
| 108 | + |
| 109 | +The CPython repository would now host the pre-compiled JIT stencils in a new |
| 110 | +subdirectory in ``Tools/jit`` called ``stencils/``. At present, the JIT is tested |
| 111 | +and built for six platforms, so to start, we'd check in six stencil files. In |
| 112 | +the future, we may check in additional stencil files if support for additional |
| 113 | +platforms is desired or relevant. |
| 114 | + |
| 115 | +.. code-block:: text |
| 116 | + |
| 117 | +    cpython/ |
| 118 | +        Tools/ |
| 119 | +            jit/ |
| 120 | +                stencils/ |
| 121 | +                    aarch64-apple-darwin.h |
| 122 | +                    aarch64-unknown-linux-gnu.h |
| 123 | +                    i686-pc-windows-msvc.h |
| 124 | +                    x86_64-apple-darwin.h |
| 125 | +                    x86_64-pc-windows-msvc.h |
| 126 | +                    x86_64-pc-linux-gnu.h |
| 127 | + |
| 128 | +Workflow |
| 129 | +-------- |
| 130 | + |
| 131 | +The workflow changes can be split into two parts, namely building CPython with |
| 132 | +the JIT enabled and working on the JIT's implementation. |
| 133 | + |
| 134 | +Building CPython with the JIT |
| 135 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 136 | + |
| 137 | +Precompiled JIT stencil files will be stored in the ``Tools/jit/stencils`` |
| 138 | +directory, with each file name corresponding to its target triple as outlined |
| 139 | +above. At build time, we determine whether to use the checked-in stencils or to |
| 140 | +generate a new stencil for the user's platform. Specifically, for contributors |
| 141 | +with LLVM installed, the ``build.py`` script in ``Tools/jit/stencils`` will allow |
| 142 | +them to regenerate the stencil for their platform. Those without LLVM can rely |
| 143 | +on the precompiled stencil files directly from the repository. |
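
The build-time decision described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the names ``llvm_available`` and ``choose_stencil_source``, the returned strings, and the ``clang`` lookup on ``PATH`` are assumptions made for the example. Only the ``Tools/jit/stencils`` layout and the target-triple file names come from this PEP.

```python
import shutil
from pathlib import Path

# Layout taken from the PEP; the helper names below are hypothetical.
STENCILS_DIR = Path("Tools") / "jit" / "stencils"


def llvm_available() -> bool:
    """Crude stand-in for LLVM detection: is some ``clang`` on PATH?"""
    return shutil.which("clang") is not None


def choose_stencil_source(repo_root: Path, target_triple: str, have_llvm: bool) -> str:
    """Return "regenerate" when LLVM is present, else "checked-in"."""
    if have_llvm:
        # Contributors with LLVM can rebuild the stencil for their platform.
        return "regenerate"
    checked_in = repo_root / STENCILS_DIR / f"{target_triple}.h"
    if checked_in.is_file():
        # No LLVM: fall back to the stencil committed to the repository.
        return "checked-in"
    raise FileNotFoundError(f"no checked-in stencil for {target_triple}")
```

For example, ``choose_stencil_source(Path("."), "aarch64-apple-darwin", llvm_available())`` would select the committed header on a machine without LLVM, provided that file exists. A real build would also need to check the LLVM version, which this sketch ignores.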
| 144 | + |
| 145 | +Working on the JIT's implementation (or touching JIT files) |
| 146 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 147 | + |
| 148 | +In continuous integration (CI), stencil files will be automatically validated and updated when changes |
| 149 | +are made to JIT-related files. When a pull request is opened that touches these |
| 150 | +files, the ``jit.yml`` workflow, which builds and tests CPython with the JIT enabled, will run as |
| 151 | +usual. |
| 152 | + |
| 153 | +However, as part of this, we will introduce a new step that diffs the current |
| 154 | +stencils in the repo against those generated in CI. If there is a diff for a |
| 155 | +platform's stencil file, a patch file for the updated stencil is generated and |
| 156 | +the step will fail. Each patch is uploaded to GitHub Actions. After CI is |
| 157 | +finished running across all platforms, the patches are aggregated into a single |
| 158 | +patch file for convenience. You can download this aggregated patch, apply it |
| 159 | +locally, and commit the updated stencils back to your branch. Then, the |
| 160 | +subsequent CI run will pass. |
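
The diff-and-patch step described above can be sketched with the standard library's ``difflib``. This is an illustration under stated assumptions rather than the workflow's actual code: the function names ``stencil_patch``, ``check_stencil``, and ``aggregate_patches`` are hypothetical, and the real CI step may well use ``git diff`` or another tool instead.

```python
import difflib


def stencil_patch(committed: str, generated: str, path: str) -> str:
    """Return a unified diff turning the committed stencil into the generated one."""
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


def check_stencil(committed: str, generated: str, path: str) -> tuple[bool, str]:
    """Return (passed, patch); the step fails when the patch is non-empty."""
    patch = stencil_patch(committed, generated, path)
    return (patch == "", patch)


def aggregate_patches(patches: list[str]) -> str:
    """Concatenate the non-empty per-platform patches into one patch file."""
    return "".join(p for p in patches if p)
```

An empty patch means the checked-in stencil already matches what CI generated, so the step passes. A non-empty patch is what a contributor would download and apply before pushing again.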
| 161 | + |
| 162 | +Reference Implementation |
| 163 | +======================== |
| 164 | + |
| 165 | +Key parts of the `reference implementation <https://github.com/python/cpython/pull/129331>`__ include: |
| 166 | + |
| 167 | +- |CI|_: The CI workflow responsible for generating stencil patches. |
| 168 | + |
| 169 | +- |jit_stencils|_: The directory where stencils are stored. |
| 170 | + |
| 171 | +- |targets|_: The code to compile and parse the templates at build time. |
| 172 | + |
| 173 | +.. |CI| replace:: ``.github/workflows/jit.yml`` |
| 174 | +.. _CI: https://github.com/python/cpython/blob/main/.github/workflows/jit.yml |
| 175 | + |
| 176 | +.. |jit_stencils| replace:: ``Tools/jit/stencils`` |
| 177 | +.. _jit_stencils: https://github.com/python/cpython/blob/main/Tools/jit/stencils |
| 178 | + |
| 179 | +.. |targets| replace:: ``Tools/jit/_targets.py`` |
| 180 | +.. _targets: https://github.com/python/cpython/blob/main/Tools/jit/_targets.py |
| 181 | + |
| 182 | +Ignoring the stencils themselves and any necessary JIT README changes, the |
| 183 | +changes to the source code to support reproducible stencil generation and |
| 184 | +hosting are minimal (around 150 lines of changes). |
| 185 | + |
| 186 | +Rejected Ideas |
| 187 | +============== |
| 188 | + |
| 189 | +Several alternative approaches were considered as part of the research and |
| 190 | +exploration for this PEP. However, each of the ideas below involves |
| 191 | +infrastructural cost, maintenance burden, or a worse overall developer |
| 192 | +experience. |
| 193 | + |
| 194 | +Using Git submodules |
| 195 | +-------------------- |
| 196 | + |
| 197 | +Git submodules offer a poor developer experience for hosting stencils because they |
| 198 | +create a different kind of undesirable friction. For instance, any |
| 199 | +updates to the JIT would necessitate regenerating the stencils and committing |
| 200 | +them to a separate repository. This introduces a convoluted process: you must |
| 201 | +update the stencils in the submodule repository, commit those changes, and then |
| 202 | +update the submodule reference in the main CPython repository. This disconnect |
| 203 | +adds unnecessary complexity and overhead, making the process brittle and |
| 204 | +error-prone for contributors and maintainers. |
| 205 | + |
| 206 | +Using Git subtrees |
| 207 | +------------------ |
| 208 | + |
| 209 | +When using subtrees, the embedded repository becomes part of the main |
| 210 | +repository, similar to what's being proposed in this PEP. However, subtrees |
| 211 | +require additional tooling and steps for maintenance, which adds unnecessary |
| 212 | +complexity to workflows. |
| 213 | + |
| 214 | +Hosting in a separate repository |
| 215 | +-------------------------------- |
| 216 | + |
| 217 | +While splitting JIT stencils into a separate repository avoids the storage |
| 218 | +overhead associated with hosting the stencils, it adds complexity to the build |
| 219 | +process. Additional tooling would be required to fetch the stencils, |
| 220 | +potentially creating additional and unnecessary failure points in the workflow. |
| 221 | +This separation also makes it harder to ensure consistency between the stencils |
| 222 | +and the CPython source tree, as updates must be coordinated across the |
| 223 | +repositories. Finally, this approach introduces an attack vector, as external |
| 224 | +repositories are less integrated with our workflows, making provenance and |
| 225 | +integrity harder to guarantee. |
| 226 | + |
| 227 | +Hosting in cloud storage |
| 228 | +------------------------ |
| 229 | + |
| 230 | +Hosting stencils in cloud storage like S3 buckets or GitHub raw storage |
| 231 | +introduces external dependencies, complicating offline development |
| 232 | +workflows. Also, depending on the provider, this type of hosting comes with |
| 233 | +additional cost, which we'd like to avoid. |
| 234 | + |
| 235 | +Using Git LFS |
| 236 | +------------- |
| 237 | + |
| 238 | +Git Large File Storage (LFS) adds a tool dependency for contributors, |
| 239 | +complicating the development workflow, especially for those who may not already |
| 240 | +use Git LFS. Git LFS does not work well with offline workflows since files |
| 241 | +managed by LFS require an internet connection to fetch when checking out |
| 242 | +specific commits, which is disruptive for even basic Git workflows. Git LFS has |
| 243 | +some free quota but there are `additional |
| 244 | +costs <https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-git-large-file-storage/about-billing-for-git-large-file-storage>`__ |
| 245 | +for exceeding that quota, which are also undesirable. |
| 246 | + |
| 247 | +Maintain the status quo with LLVM as a build-time dependency |
| 248 | +------------------------------------------------------------ |
| 249 | + |
| 250 | +Retaining LLVM as a build-time dependency upholds the existing barriers to |
| 251 | +adoption and contribution. Ultimately, this option fails to address the core |
| 252 | +challenges of accessibility and simplicity, and fails to eliminate the |
| 253 | +dependency that was deemed undesirable at the Python Core Developer Sprint in |
| 254 | +September 2024 (the impetus for this PEP), making it a poor long-term solution. |
| 255 | + |
| 256 | +Copyright |
| 257 | +========= |
| 258 | + |
| 259 | +This document is placed in the public domain or under the |
| 260 | +CC0-1.0-Universal license, whichever is more permissive. |