@@ -95,6 +95,10 @@ the uop interpreter at `tier2_dispatch`, the executor runs the function
9595that `jit_code` points to. This function returns the instruction pointer
9696of the next Tier 1 instruction that needs to execute.
9797
98+ The JIT uses platform-specific calling conventions and optimizations:
99+ - On x86-64: Uses the `preserve_none` calling convention for efficiency
100+ - On ARM64: Leverages guaranteed tail calls (`musttail`) for continuation-passing style
101+
98102The generation of the jitted functions uses the copy-and-patch technique
99103which is described in
100104[Haoran Xu's article](https://sillycross.github.io/2023/05/12/2023-05-12/).
@@ -123,8 +127,118 @@ their implementations do not require changes related to the stencils,
123127because everything is automatically generated from
124128[`Python/bytecodes.c`](../Python/bytecodes.c) at build time.
125129
130+ ## Architecture-Specific Implementation
131+
132+ The JIT compiler supports multiple architectures with platform-specific optimizations:
133+
134+ ### Supported Platforms
135+
136+ The JIT currently supports the following target triples:
137+ - **ARM64/AArch64**: `aarch64-apple-darwin`, `aarch64-pc-windows-msvc`, `aarch64-unknown-linux-gnu`
138+ - **x86-64**: `x86_64-apple-darwin`, `x86_64-pc-windows-msvc`, `x86_64-unknown-linux-gnu`
139+ - **x86**: `i686-pc-windows-msvc`
140+
141+ ### ARM AArch64 Implementation Details
142+
143+ The ARM64 JIT implementation resolves relocations by patching instruction immediates in place:
144+
145+ #### Instruction Encoding and Patching
146+
147+ The JIT manipulates several AArch64 instruction formats (defined in [`Python/jit.c`](../Python/jit.c)):
148+ - **ADRP** (Address of Page): Used for 21-bit page-relative addressing
149+ - **LDR/STR**: Load/store with 12-bit immediate offsets
150+ - **MOV**: Move with 16-bit immediate values
151+ - **Branch instructions**: Relative branches with a 26-bit immediate (a 28-bit byte offset, since targets are 4-byte aligned)
152+
153+ #### Relocation Types
154+
155+ The ARM64 JIT handles multiple relocation types:
156+
157+ 1. **12-bit relocations** (`patch_aarch64_12`): Low 12 bits of addresses, used with LDR/STR and ADD/SUB
158+ 2. **16-bit relocations** (`patch_aarch64_16a/b/c/d`): Four-part 64-bit address construction using MOV instructions
159+ 3. **21-bit page relocations** (`patch_aarch64_21r`): Page count between current and target pages
160+ 4. **26-bit branch relocations** (`patch_aarch64_26r`): Direct branch instructions with ±128 MB range
161+ 5. **Relaxable relocations** (`patch_aarch64_12x`, `patch_aarch64_21rx`): Relaxable variants of the 12-bit and 21-bit relocations; the load they describe can be replaced by an immediate value when the value permits
162+
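To make the bit manipulation behind these relocation types concrete, here is a minimal Python sketch. It is not the code in `Python/jit.c`: the function names (`get_bits`, `patch_branch_26`, `page_delta_21`, `patch_mov_16`) are hypothetical, and the sketch only models how a 32-bit instruction word is patched under the standard AArch64 field layouts (imm26 in bits 0..25 of B/BL, imm16 in bits 5..20 of MOVZ/MOVK).

```python
def get_bits(value: int, start: int, width: int) -> int:
    """Return `width` bits of `value`, starting at bit `start`."""
    return (value >> start) & ((1 << width) - 1)


def patch_branch_26(instruction: int, location: int, target: int) -> int:
    """Fill the imm26 field of a B/BL instruction word (hypothetical helper)."""
    offset = target - location
    if offset % 4 != 0:
        raise ValueError("branch targets must be 4-byte aligned")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("target is outside the +/-128 MB range")
    # The field stores the byte offset divided by 4, as 26-bit two's complement.
    return (instruction & ~0x03FFFFFF & 0xFFFFFFFF) | get_bits(offset >> 2, 0, 26)


def page_delta_21(location: int, target: int) -> int:
    """Count 4 KiB pages between the page of `location` and the page of `target`."""
    delta = (target >> 12) - (location >> 12)
    if not -(1 << 20) <= delta < (1 << 20):
        raise ValueError("page delta does not fit in 21 bits")
    return delta


def patch_mov_16(instruction: int, value: int, chunk: int) -> int:
    """Fill the imm16 field of a MOVZ/MOVK with 16-bit chunk number `chunk`."""
    imm16 = get_bits(value, 16 * chunk, 16)
    return (instruction & ~(0xFFFF << 5) & 0xFFFFFFFF) | (imm16 << 5)
```

For example, patching the bare `b` opcode `0x14000000` located at `0x1000` to reach `0x1010` stores the word offset 4 in the low bits. A target that fails the range check is the case the trampolines described next are meant to handle.
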
163+ #### Trampolines
164+
165+ For branches beyond the 128MB range, the JIT generates trampolines:
166+ ```
167+ ldr x8, 8 ; Load the 64-bit address stored 8 bytes after this instruction (PC-relative literal load)
168+ br x8 ; Branch to address
169+ .quad target_addr ; 64-bit target address
170+ ```
171+ Each trampoline is 16 bytes on ARM64 (vs. no trampolines needed on x86).
172+
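As a rough illustration of the 16-byte layout, the following Python sketch assembles one trampoline as raw bytes. The helper name `build_trampoline` is hypothetical. The two instruction words are the standard little-endian AArch64 encodings of `ldr x8, 8` and `br x8`; the real JIT writes them directly into executable memory rather than building a `bytes` object.

```python
import struct

LDR_X8_LITERAL_8 = 0x58000048  # ldr x8, 8  (load the quadword 8 bytes ahead)
BR_X8 = 0xD61F0100             # br x8


def build_trampoline(target: int) -> bytes:
    """Return a 16-byte trampoline that jumps to `target` (hypothetical helper)."""
    # Two 32-bit instruction words followed by the 64-bit absolute address,
    # all little-endian and without padding.
    return struct.pack("<IIQ", LDR_X8_LITERAL_8, BR_X8, target)
```

Because the literal load reads the address from the third and fourth words, the same two instructions can be reused for every trampoline; only the trailing quadword differs.
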
173+ #### GOT Load Relaxation
174+
175+ The JIT optimizes Global Offset Table (GOT) loads when possible:
176+ - A GOT load (an ADRP + LDR pair) can be relaxed into instructions that produce the value directly, for example an immediate move when the value is small enough
177+ - This optimization (`patch_aarch64_33rx`) reduces memory accesses
178+
179+ ### Build Process and Dependencies
180+
181+ #### LLVM Requirement
182+
183+ The JIT requires LLVM 19+ for compilation because:
184+ - **Clang** is the only C compiler supporting guaranteed tail calls (`musttail`)
185+ - **llvm-readobj** is used for extracting object file information
186+ - **llvm-objdump** provides disassembly for debugging
187+
188+ #### Stencil Generation
189+
190+ The build process ([`Tools/jit/build.py`](../Tools/jit/build.py)):
191+ 1. Compiles each micro-op implementation to object code using platform-specific flags
192+ 2. Extracts relocations and symbol information using LLVM tools
193+ 3. Generates stencils (code templates) in `jit_stencils.h`
194+ 4. Selects the platform at compile time based on target conditions
195+
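The idea of a stencil as a code template with holes can be sketched in a few lines of Python. This is a deliberately simplified model, not the format of `jit_stencils.h`: the `Hole` and `Stencil` classes and the `emit` function are hypothetical, and every hole is assumed to be an 8-byte absolute value, whereas real holes use the relocation-specific patch functions.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Hole:
    offset: int  # byte offset of the 8-byte slot inside the stencil body
    symbol: str  # name of the value to write into that slot


@dataclass
class Stencil:
    body: bytes
    holes: list[Hole] = field(default_factory=list)


def emit(stencil: Stencil, symbols: dict[str, int]) -> bytes:
    """Copy the stencil body, then patch each hole with its symbol's value."""
    code = bytearray(stencil.body)  # copy
    for hole in stencil.holes:      # patch
        struct.pack_into("<Q", code, hole.offset, symbols[hole.symbol])
    return bytes(code)
```

The stencil itself is never modified, so the same template can be emitted many times with different symbol values, which is what makes the copy-and-patch step cheap at run time.
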
196+ Platform-specific compilation flags:
197+ - **aarch64-linux**: `-fpic -mno-outline-atomics` (position-independent code; no calls to out-of-line atomics helper functions)
198+ - **aarch64-darwin**: Optimizer uses `OptimizerAArch64` class
199+ - **aarch64-windows**: `-fms-runtime-lib=dll -fplt` (DLL runtime, PLT usage)
200+
201+ ### Memory Management
202+
203+ The JIT uses platform-specific memory allocation:
204+
205+ #### Memory Allocation
206+ - **Unix/Linux**: Uses `mmap()` with `MAP_ANONYMOUS | MAP_PRIVATE`
207+ - **Windows**: Uses `VirtualAlloc()` with `MEM_COMMIT | MEM_RESERVE`
208+ - **Page size**: Determined via `sysconf(_SC_PAGESIZE)` or `GetSystemInfo()`
209+
210+ #### Memory Layout
211+ ```
212+ [Executable Code] [Trampolines] [Padding] [Data Section] [Page Padding]
213+ ```
214+ - Code section: Contains emitted machine code
215+ - Trampoline section: Platform-specific size (16 bytes per trampoline on ARM64)
216+ - Data alignment: 8 bytes on ARM64, 1 byte on x86
217+ - Total allocation: Rounded up to page size
218+
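The layout arithmetic can be sketched as follows. This is an illustration under stated assumptions rather than the computation in `Python/jit.c`: the function names are hypothetical, the defaults (16-byte trampolines, 8-byte data alignment) are the ARM64 values listed above, and the 4096-byte page size is only a placeholder for whatever the operating system reports.

```python
def round_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment`."""
    return -(-value // alignment) * alignment


def compute_layout(
    code_size: int,
    trampoline_count: int,
    data_size: int,
    *,
    trampoline_size: int = 16,
    data_align: int = 8,
    page_size: int = 4096,
) -> dict[str, int]:
    """Return section offsets and the page-rounded total size (hypothetical)."""
    trampolines_offset = code_size
    code_end = trampolines_offset + trampoline_count * trampoline_size
    data_offset = round_up(code_end, data_align)  # padding before the data
    total = round_up(data_offset + data_size, page_size)
    return {
        "trampolines_offset": trampolines_offset,
        "data_offset": data_offset,
        "total": total,
    }
```

With 100 bytes of code, two trampolines, and 50 bytes of data, the data section starts at offset 136 (132 rounded up to a multiple of 8) and the allocation occupies one 4096-byte page. Passing `trampoline_size=0, data_align=1` models the x86 case.
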
219+ #### Protection and Execution
220+ After code emission, memory protection is set:
221+ - Unix: `mprotect()` with `PROT_EXEC | PROT_READ`
222+ - Windows: `VirtualProtect()` with `PAGE_EXECUTE_READ`
223+
224+ ### Optimization Passes
225+
226+ The JIT includes architecture-specific optimizers ([`Tools/jit/_optimizers.py`](../Tools/jit/_optimizers.py)):
227+
228+ #### OptimizerAArch64
229+ - Recognizes ARM64 branch pattern: `b <target>`
230+ - No branch inversion (unlike x86)
231+ - Focuses on trampoline optimization
232+
233+ #### OptimizerX86
234+ - Handles extensive branch inversion (JE ↔ JNE, etc.)
235+ - Recognizes `jmp` and `ret` instructions
236+ - More complex control flow optimization
237+
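Branch inversion in general can be illustrated with a small peephole pass over assembly text. The sketch below is not the algorithm in `Tools/jit/_optimizers.py`: the `INVERSES` table and `invert_branches` function are hypothetical, and the pass handles only one textbook pattern, rewriting `jcc L1; jmp L2; L1:` as the inverted branch `jncc L2; L1:`.

```python
INVERSES = {
    "je": "jne", "jne": "je",
    "jl": "jge", "jge": "jl",
    "jg": "jle", "jle": "jg",
}


def invert_branches(lines: list[str]) -> list[str]:
    """Replace a conditional branch over an unconditional jump (hypothetical)."""
    out = []
    i = 0
    while i < len(lines):
        window = [line.split() for line in lines[i:i + 3]]
        if (
            len(window) == 3
            and len(window[0]) == 2 and window[0][0] in INVERSES
            and len(window[1]) == 2 and window[1][0] == "jmp"
            and window[2] == [window[0][1] + ":"]
        ):
            # The conditional branch only skips the jump, so one inverted
            # branch to the jump's target does the same work.
            out.append(f"{INVERSES[window[0][0]]} {window[1][1]}")
            out.append(lines[i + 2])
            i += 3
        else:
            out.append(lines[i])
            i += 1
    return out
```

The rewrite removes one instruction from the path that used to go through the `jmp`, while sequences that do not match the three-line pattern pass through unchanged.
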
126238See Also:
127239
126238* [Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode](https://arxiv.org/abs/2011.13127)
127239
128240* [PyCon 2024: Building a JIT compiler for CPython](https://www.youtube.com/watch?v=kMO3Ju0QCDo)
243+
244+ * [ARM64 Instruction Set Reference](https://developer.arm.com/documentation/ddi0602/latest/)