Commit 4c0b2e8

PEP 774: Removing the LLVM requirement for JIT builds (#4234)
1 parent 36b3b39 commit 4c0b2e8

2 files changed: 261 additions, 0 deletions

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -650,6 +650,7 @@ peps/pep-0769.rst @facundobatista
 peps/pep-0770.rst @sethmlarson @brettcannon
 peps/pep-0771.rst @pradyunsg
 peps/pep-0773.rst @zooba
+peps/pep-0774.rst @savannahostrowski
 # ...
 peps/pep-0777.rst @warsaw
 # ...

peps/pep-0774.rst

Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
PEP: 774
Title: Removing the LLVM requirement for JIT builds
Author: Savannah Ostrowski <[email protected]>
Status: Draft
Type: Standards Track
Created: 27-Jan-2025
Python-Version: 3.14

Abstract
========

Since Python 3.13, CPython can be configured and built with an experimental
just-in-time (JIT) compiler via the ``--enable-experimental-jit`` flag on Linux
and macOS and ``--experimental-jit`` on Windows. To build CPython with the JIT
enabled, users are required to have LLVM installed on their machine (initially
LLVM 16, more recently LLVM 19). LLVM is responsible for generating the
stencils that are essential to our copy-and-patch JIT (see :pep:`744`). These
stencils are predefined, architecture-specific templates that are used to
generate machine code at runtime.

This PEP proposes removing the LLVM build-time dependency for JIT-enabled builds
by hosting the generated stencils in the CPython repository. This approach
allows us to leverage the checked-in stencils for supported platforms at build
time, simplifying the contributor experience and addressing concerns raised at
the Python Core Developer Sprint in September 2024. That said, there is a clear
tradeoff to consider: the improved developer experience comes at the cost of
increased repository size.

It is important to note that this PEP is not a proposal to accept or reject the
JIT itself but rather to determine whether the build-time dependency on LLVM is
acceptable for JIT builds moving forward. If this PEP is rejected, we will
proceed with the status quo, retaining the LLVM build-time requirement. While
this dependency has served the JIT development process effectively thus far, it
introduces setup complexity and additional challenges that this PEP seeks to
alleviate.

Motivation
==========

At the Python Core Developer Sprint that took place in September 2024, there was
discussion about the next steps for the JIT; a related discussion also took
place on `GitHub <https://github.com/python/cpython/issues/115869>`__. As part
of that discussion, there was also a clear appetite for removing the LLVM
requirement for JIT builds in preparation for shipping the JIT off by default in
3.14. The consensus at the sprint was that it would be sufficient to provide
pre-generated stencils for non-debug builds for Tier 1 platforms and that
checking these files into the CPython repo would be adequate for the limited
number of platforms (though more options have been explored; see `Rejected
Ideas`_).

Currently, building CPython with `the JIT requires LLVM
<https://github.com/python/cpython/tree/main/Tools/jit#installing-llvm>`__ as a
build-time dependency. Despite not being exposed to end users, this dependency
is suboptimal. Requiring LLVM adds a setup burden for developers and those who
wish to build CPython with the JIT enabled. Depending on the operating system,
the version of LLVM shipped with the OS may differ from that required by our JIT
builds, which introduces additional complexity to troubleshoot and resolve. With
few core developers currently contributing to and maintaining the JIT, we also
want to minimize the friction of working on JIT-related code as much as
possible.

With the proposed approach, pre-compiled stencils for supported architectures
can be generated in advance, stored in a central location, and automatically
used during builds. This approach ensures reproducible builds, making the JIT a
more stable and sustainable part of CPython's future.

Rationale
=========

This PEP proposes checking JIT stencils directly into the CPython repo as the
best path forward for eliminating our build-time dependency on LLVM.

This approach:

* Provides the best end-to-end experience for those looking to build CPython
  with the JIT
* Lowers the barrier to entry for those looking to contribute to the JIT
* Ensures builds remain reproducible and consistent across platforms without
  relying on external infrastructure or download mechanisms
* Eliminates variability introduced by network conditions or potential
  discrepancies between hosted files and the CPython repository state, and
* Subjects stencils to the same review processes we have for all other
  JIT-related code

However, this approach does result in a slight increase in overall repository
size. Comparing repository growth over the commits of the past 90 days with and
without stencils added, the difference amounts to 0.03 MB per stencil file.
This is a small increase in the context of the overall repository size, which
has grown by 2.55 MB in the same time period. For six stencil files, this
amounts to an upper bound of 0.18 MB. The current total size of the stencil
files for all six platforms is 7.2 MB.

These stencils could become larger in the future with changes to register
allocation, which would introduce 5-6 variants per instruction in each stencil
file (making each file 5-6x larger). However, if we ended up going this route,
there are additional modifications we could make to stencil files that could
help counteract this size increase (e.g., stripping comments, minimizing the
stencils).
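The size figures above can be checked with a few lines of arithmetic. The
sketch below only restates the numbers quoted in this section; the projection
for register-allocation variants assumes stencil size scales linearly with the
number of variants, which is our simplification and not a measured result. All
function and constant names are illustrative.

```python
# Figures quoted in the Rationale section (all sizes in MB).
GROWTH_PER_STENCIL_FILE_MB = 0.03  # repo growth per stencil file over 90 days
REPO_GROWTH_MB = 2.55              # overall repo growth over the same 90 days
CURRENT_TOTAL_STENCILS_MB = 7.2    # current size of all six stencil files
PLATFORMS = 6


def growth_upper_bound_mb(files: int = PLATFORMS) -> float:
    """Upper bound on 90-day repo growth attributable to stencil files."""
    return round(files * GROWTH_PER_STENCIL_FILE_MB, 2)


def projected_total_mb(variants_per_instruction: int) -> float:
    """Rough projection, ASSUMING size grows linearly with variant count."""
    return round(CURRENT_TOTAL_STENCILS_MB * variants_per_instruction, 1)


if __name__ == "__main__":
    bound = growth_upper_bound_mb()
    print(f"upper bound on added growth: {bound} MB")
    print(f"relative to repo growth in that period: {bound / REPO_GROWTH_MB:.1%}")
    for variants in (5, 6):
        print(f"{variants} variants per instruction: ~{projected_total_mb(variants)} MB")
```

Under that linear assumption, the current 7.2 MB of stencils would grow to
somewhere between 36 MB and 43.2 MB before any of the mitigations mentioned
above are applied.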

Specification
=============

This specification outlines the proposed changes to remove the build-time
dependency on LLVM and the contributor experience if this PEP is accepted.

Repository changes
------------------

The CPython repository would now host the pre-compiled JIT stencils in a new
subdirectory in ``Tools/jit`` called ``stencils/``. At present, the JIT is tested
and built for six platforms, so to start, we'd check in six stencil files. In
the future, we may check in additional stencil files if support for additional
platforms is desired or relevant.

.. code-block:: text

    cpython/
        Tools/
            jit/
                stencils/
                    aarch64-apple-darwin.h
                    aarch64-unknown-linux-gnu.h
                    i686-pc-windows-msvc.h
                    x86_64-apple-darwin.h
                    x86_64-pc-windows-msvc.h
                    x86_64-pc-linux-gnu.h

Workflow
--------

The workflow changes can be split into two parts, namely building CPython with
the JIT enabled and working on the JIT's implementation.

Building CPython with the JIT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Precompiled JIT stencil files will be stored in the ``Tools/jit/stencils``
directory, with each file name corresponding to its target triple as outlined
above. At build time, we determine whether to use the checked-in stencils or to
generate a new stencil for the user's platform. Specifically, for contributors
with LLVM installed, the ``build.py`` script in ``Tools/jit/stencils`` will allow
them to regenerate the stencil for their platform. Those without LLVM can rely
on the precompiled stencil files directly from the repository.
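The build-time decision described above can be sketched roughly as follows.
This is not the reference implementation: ``choose_stencil_source`` and
``llvm_available`` are hypothetical names, the PATH probe for ``clang`` is a
stand-in for whatever LLVM version check the real build performs, and
preferring regeneration whenever LLVM is present is an assumption made for
illustration.

```python
import shutil
from pathlib import Path

# Layout from "Repository changes": Tools/jit/stencils/<target-triple>.h
STENCIL_DIR = Path("Tools/jit/stencils")


def checked_in_stencil(triple: str, root: Path = Path(".")) -> Path:
    """Path of the checked-in stencil file for a target triple."""
    return root / STENCIL_DIR / f"{triple}.h"


def llvm_available(version: int = 19) -> bool:
    """Hypothetical probe: is a clang binary on PATH?

    Finding a plain ``clang`` does not prove it is the required version;
    a real check would also have to inspect the reported version.
    """
    return any(shutil.which(name) for name in (f"clang-{version}", "clang"))


def choose_stencil_source(triple, root=Path("."), have_llvm=None) -> str:
    """Return "regenerate" or "checked-in" for the given target triple."""
    if have_llvm is None:
        have_llvm = llvm_available()
    if have_llvm:
        # Assumed policy: contributors with LLVM regenerate their stencil.
        return "regenerate"
    if checked_in_stencil(triple, root).is_file():
        return "checked-in"
    raise FileNotFoundError(
        f"no checked-in stencil for {triple} and LLVM is not available"
    )
```

For example, ``choose_stencil_source("x86_64-apple-darwin", have_llvm=False)``
returns ``"checked-in"`` when run from a checkout that contains that stencil
file, and raises ``FileNotFoundError`` for a platform with no checked-in
stencil.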

Working on the JIT's implementation (or touching JIT files)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In continuous integration (CI), stencil files will be automatically validated
and updated when changes are made to JIT-related files. When a pull request is
opened that touches these files, the ``jit.yml`` workflow, which builds and
tests our JIT-enabled builds, will run as usual.

However, as part of this, we will introduce a new step that diffs the current
stencils in the repo against those generated in CI. If there is a diff for a
platform's stencil file, a patch file for the updated stencil is generated and
the step will fail. Each patch is uploaded to GitHub Actions. After CI is
finished running across all platforms, the patches are aggregated into a single
patch file for convenience. You can download this aggregated patch, apply it
locally, and commit the updated stencils back to your branch. Then, the
subsequent CI run will pass.
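The diff-and-aggregate step can be illustrated with the standard library's
``difflib``. This is a minimal sketch of the idea, not the code used in
``jit.yml``: the helper names ``stencil_patch`` and ``aggregate`` are
hypothetical, and the actual workflow may well shell out to ``git diff``
instead.

```python
import difflib


def stencil_patch(committed: str, generated: str, path: str) -> str:
    """Unified diff turning the committed stencil into the generated one.

    An empty string means the checked-in stencil is already up to date;
    a non-empty result is what would make the CI step fail.
    """
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


def aggregate(patches: dict[str, str]) -> str:
    """Concatenate the non-empty per-platform patches into one patch file."""
    return "".join(patch for _, patch in sorted(patches.items()) if patch)


if __name__ == "__main__":
    path = "Tools/jit/stencils/x86_64-apple-darwin.h"
    patch = stencil_patch("// old stencil\n", "// new stencil\n", path)
    print(aggregate({"x86_64-apple-darwin": patch, "aarch64-apple-darwin": ""}))
```

Sorting by platform name keeps the aggregated patch deterministic regardless of
the order in which the per-platform jobs finish.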

Reference Implementation
========================

Key parts of the `reference implementation <https://github.com/python/cpython/pull/129331>`__ include:

- |CI|_: The CI workflow responsible for generating stencil patches.

- |jit_stencils|_: The directory where stencils are stored.

- |targets|_: The code to compile and parse the templates at build time.

.. |CI| replace:: ``.github/workflows/jit.yml``
.. _CI: https://github.com/python/cpython/blob/main/.github/workflows/jit.yml

.. |jit_stencils| replace:: ``Tools/jit/stencils``
.. _jit_stencils: https://github.com/python/cpython/blob/main/Tools/jit/stencils

.. |targets| replace:: ``Tools/jit/_targets.py``
.. _targets: https://github.com/python/cpython/blob/main/Tools/jit/_targets.py

Ignoring the stencils themselves and any necessary JIT README changes, the
changes to the source code to support reproducible stencil generation and
hosting are minimal (around 150 lines of changes).

Rejected Ideas
==============

Several alternative approaches were considered as part of the research and
exploration for this PEP. However, each of the ideas below involves
infrastructural cost, maintenance burden, or a worse overall developer
experience.

Using Git submodules
--------------------

Git submodules are a poor developer experience for hosting stencils because they
create a different kind of undesirable friction. For instance, any
updates to the JIT would necessitate regenerating the stencils and committing
them to a separate repository. This introduces a convoluted process: you must
update the stencils in the submodule repository, commit those changes, and then
update the submodule reference in the main CPython repository. This disconnect
adds unnecessary complexity and overhead, making the process brittle and
error-prone for contributors and maintainers.

Using Git subtrees
------------------

When using subtrees, the embedded repository becomes part of the main
repository, similar to what's being proposed in this PEP. However, subtrees
require additional tooling and steps for maintenance, which adds unnecessary
complexity to workflows.

Hosting in a separate repository
--------------------------------

While splitting JIT stencils into a separate repository avoids the storage
overhead associated with hosting the stencils, it adds complexity to the build
process. Additional tooling would be required to fetch the stencils, potentially
creating additional and unnecessary failure points in the workflow. This
separation also makes it harder to ensure consistency between the stencils and
the CPython source tree, as updates must be coordinated across the
repositories. Finally, this approach introduces an attack vector, as external
repositories are less integrated with our workflows, making provenance and
integrity harder to guarantee.

Hosting in cloud storage
------------------------

Hosting stencils in cloud storage like S3 buckets or GitHub raw storage
introduces external dependencies, complicating offline development
workflows. Also, depending on the provider, this type of hosting comes with
additional cost, which we'd like to avoid.

Using Git LFS
-------------

Git Large File Storage (LFS) adds a tool dependency for contributors,
complicating the development workflow, especially for those who may not already
use Git LFS. Git LFS does not work well with offline workflows since files
managed by LFS require an internet connection to fetch when checking out
specific commits, which is disruptive for even basic Git workflows. Git LFS has
some free quota, but there are `additional
costs <https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-git-large-file-storage/about-billing-for-git-large-file-storage>`__
for exceeding that quota, which is also undesirable.

Maintain the status quo with LLVM as a build-time dependency
------------------------------------------------------------

Retaining LLVM as a build-time dependency upholds the existing barriers to
adoption and contribution. Ultimately, this option fails to address the core
challenges of accessibility and simplicity, and fails to eliminate the
dependency that was deemed undesirable at the Python Core Developer Sprint in
September 2024 (the impetus for this PEP), making it a poor long-term solution.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
