@brandtbucher commented Feb 13, 2025

This is a perfect middle-ground between the "large" code model (which we used to use) and the "small" code model (which we currently use):

  • Local data, like OPARG, is encoded directly in the instruction stream (currently it's loaded indirectly).
  • Extern data, like &_PyEval_BinaryOps, is encoded directly in the instruction stream (currently it's loaded indirectly).
  • Local jumps, like _JIT_ERROR_TARGET, use 32-bit jumps (currently they use "relaxable" 64-bit indirect jumps).
  • Extern jumps, like _Py_Dealloc, use "relaxable" 64-bit indirect jumps (same as today).
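
To make the two "direct" encodings in the list above concrete, here is a minimal sketch of what patching them might look like. The `sketch_*` helpers are hypothetical stand-ins written for illustration, not the actual helpers in CPython's JIT: one stores an absolute 64-bit value into an instruction's immediate field, the other stores a 32-bit displacement for a direct jump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in (not CPython's real helper): write an absolute
 * 64-bit value straight into the instruction stream, e.g. into the
 * immediate operand of a movabsq. */
static void
sketch_patch_abs64(unsigned char *field, uint64_t value)
{
    memcpy(field, &value, sizeof(value));
}

/* Hypothetical stand-in: write the 32-bit displacement of a direct jmp.
 * The displacement is measured from the next instruction, which for
 * "jmp rel32" starts right after the 4-byte field. */
static void
sketch_patch_rel32(unsigned char *field, const unsigned char *target)
{
    int64_t delta = (int64_t)(uintptr_t)target
                  - (int64_t)((uintptr_t)field + 4);
    /* A direct jump only reaches targets within +/-2 GiB. */
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    int32_t disp = (int32_t)delta;
    memcpy(field, &disp, sizeof(disp));
}

/* Read the patched fields back (used only to inspect the result). */
static uint64_t
sketch_read_abs64(const unsigned char *field)
{
    uint64_t value;
    memcpy(&value, field, sizeof(value));
    return value;
}

static int32_t
sketch_read_rel32(const unsigned char *field)
{
    int32_t disp;
    memcpy(&disp, field, sizeof(disp));
    return disp;
}
```

The range check in `sketch_patch_rel32` is the reason only local jumps can use this form: a 32-bit displacement cannot reach an arbitrary address, so extern jumps keep the 64-bit indirect form.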

This only works on one platform (the listings below are from the ELF x86-64 build), but it's an important one. It looks to be 0.5%-1% faster on benchmarks, and it also gives a very slight (~0.15%) memory savings, since there is less auxiliary data to JIT for storing addresses.

Here's the before-and-after of _LOAD_SMALL_INT (the first function is the current output, the second is the output with this change):

void
emit__LOAD_SMALL_INT(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, jit_state *state)
{
    // 
    // _LOAD_SMALL_INT.o:     file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 0f b7 05 00 00 00 00          movzwl  (%rip), %eax            # 0x7 <_JIT_ENTRY+0x7>
    // 0000000000000003:  R_X86_64_GOTPCREL    _JIT_OPARG-0x4
    // 7: c1 e0 05                      shll    $0x5, %eax
    // a: 48 8b 0d 00 00 00 00          movq    (%rip), %rcx            # 0x11 <_JIT_ENTRY+0x11>
    // 000000000000000d:  R_X86_64_REX_GOTPCRELX       _PyRuntime-0x4
    // 11: 48 01 c8                      addq    %rcx, %rax
    // 14: 48 05 f8 36 00 00             addq    $0x36f8, %rax           # imm = 0x36F8
    // 1a: 49 89 45 00                   movq    %rax, (%r13)
    // 1e: 49 83 c5 08                   addq    $0x8, %r13
    // 22: ff 25 00 00 00 00             jmpq    *(%rip)                 # 0x28 <_JIT_ENTRY+0x28>
    // 0000000000000024:  R_X86_64_GOTPCRELX   _JIT_CONTINUE-0x4
    const unsigned char code_body[34] = {
        0x0f, 0xb7, 0x05, 0x00, 0x00, 0x00, 0x00, 0xc1,
        0xe0, 0x05, 0x48, 0x8b, 0x0d, 0x00, 0x00, 0x00,
        0x00, 0x48, 0x01, 0xc8, 0x48, 0x05, 0xf8, 0x36,
        0x00, 0x00, 0x49, 0x89, 0x45, 0x00, 0x49, 0x83,
        0xc5, 0x08,
    };
    // 0: OPARG
    // 8: &_PyRuntime+0x0
    patch_64(data + 0x0, instruction->oparg);
    patch_64(data + 0x8, (uintptr_t)&_PyRuntime);
    memcpy(code, code_body, sizeof(code_body));
    patch_32r(code + 0x3, (uintptr_t)data + -0x4);
    patch_x86_64_32rx(code + 0xd, (uintptr_t)data + 0x4);
}

void
emit__LOAD_SMALL_INT(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, jit_state *state)
{
    // 
    // _LOAD_SMALL_INT.o:     file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 48 b8 00 00 00 00 00 00 00 00 movabsq $0x0, %rax
    // 0000000000000002:  R_X86_64_64  _JIT_OPARG
    // a: 0f b7 c0                      movzwl  %ax, %eax
    // d: c1 e0 05                      shll    $0x5, %eax
    // 10: 48 b9 00 00 00 00 00 00 00 00 movabsq $0x0, %rcx
    // 0000000000000012:  R_X86_64_64  _PyRuntime+0x36f8
    // 1a: 48 01 c1                      addq    %rax, %rcx
    // 1d: 49 89 4d 00                   movq    %rcx, (%r13)
    // 21: 49 83 c5 08                   addq    $0x8, %r13
    // 25: e9 00 00 00 00                jmp     0x2a <_JIT_ENTRY+0x2a>
    // 0000000000000026:  R_X86_64_PLT32       _JIT_CONTINUE-0x4
    const unsigned char code_body[37] = {
        0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x0f, 0xb7, 0xc0, 0xc1, 0xe0, 0x05,
        0x48, 0xb9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x48, 0x01, 0xc1, 0x49, 0x89, 0x4d,
        0x00, 0x49, 0x83, 0xc5, 0x08,
    };
    memcpy(code, code_body, sizeof(code_body));
    patch_64(code + 0x2, instruction->oparg);
    patch_64(code + 0x12, (uintptr_t)&_PyRuntime + 0x36f8);
}
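
As a rough check of the second (new) listing, the sketch below copies its `code_body`, fills in the two movabsq immediates at the same offsets (0x2 and 0x12), and then mimics in plain C what the patched instructions compute. Everything prefixed `sketch_` is hypothetical and written for illustration, and the base address passed in is an arbitrary test value rather than the real `&_PyRuntime + 0x36f8`.

```c
#include <stdint.h>
#include <string.h>

/* Bytes copied from the new _LOAD_SMALL_INT listing. */
static const unsigned char code_body[37] = {
    0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x0f, 0xb7, 0xc0, 0xc1, 0xe0, 0x05,
    0x48, 0xb9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x48, 0x01, 0xc1, 0x49, 0x89, 0x4d,
    0x00, 0x49, 0x83, 0xc5, 0x08,
};

/* Simplified stand-in for a 64-bit absolute patch. */
static void
sketch_patch_64(unsigned char *location, uint64_t value)
{
    memcpy(location, &value, sizeof(value));
}

/* Copy the template and patch both immediates, as the new emit function does.
 * `base` plays the role of (uintptr_t)&_PyRuntime + 0x36f8. */
static void
sketch_emit(unsigned char *code, uint16_t oparg, uint64_t base)
{
    memcpy(code, code_body, sizeof(code_body));
    sketch_patch_64(code + 0x2, oparg);
    sketch_patch_64(code + 0x12, base);
}

/* Mimic the patched instructions in C and return the value that the
 * machine code would store through %r13. */
static uint64_t
sketch_result(const unsigned char *code)
{
    uint64_t rax, rcx;
    memcpy(&rax, code + 0x2, sizeof(rax));   /* movabsq $imm, %rax */
    rax = (uint16_t)rax;                     /* movzwl  %ax, %eax  */
    rax = (uint32_t)(rax << 5);              /* shll    $0x5, %eax */
    memcpy(&rcx, code + 0x12, sizeof(rcx));  /* movabsq $imm, %rcx */
    rcx += rax;                              /* addq    %rax, %rcx */
    return rcx;
}
```

For example, `sketch_emit(code, 7, 0x1000)` followed by `sketch_result(code)` gives `0x1000 + 7 * 32`. For this one uop the code grows from 34 to 37 bytes, but the two 8-byte data slots that the old version patched are no longer needed.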

@brandtbucher added the performance, interpreter-core, and topic-JIT labels Feb 13, 2025
@brandtbucher self-assigned this Feb 13, 2025
@bedevere-app bot mentioned this pull request Feb 13, 2025

@savannahostrowski left a comment

Oh, this is neat! Less indirection. LGTM!

@brandtbucher brandtbucher merged commit 5d8db36 into python:main Mar 5, 2025
63 checks passed
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025