@garrettgu10 garrettgu10 commented May 19, 2024

Happy to hop on a call to explain the changes I made, but here's the gist of it:

  1. tail_call_funcs.h contains the meat of the implementation, where all the instruction functions are generated using macros. The DISPATCH_GOTO macro was replaced with one that tail calls the function for the next instruction.
  2. For instrumented instructions, there was originally a goto that jumped into the base instruction at the point just after the bytecode instruction pointer is incremented. This was replaced with a GO_TO_INSTRUCTION macro that subtracts from the bytecode pointer, so that the base instruction can increment it again, and then tail calls the original instruction.
  3. For error handling, a new CEVAL_GOTO macro was added that first stores the current state in a special struct, then returns an enum value (tail call functions return a uintptr_t that is either the returned PyObject* or this enum). At the top level of _PyEval_EvalFrameDefault, the local variables are restored from the special struct and the enum value is matched to the corresponding branch target within _PyEval_EvalFrameDefault (see the bottom of tail_call_funcs.h).
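
To make the three points above concrete, here is a minimal toy sketch of the same shape. It is not the code in tail_call_funcs.h: the opcodes, `SavedState`, `EXIT_ERROR`, `TAIL_RETURN`, `run`, and the handler names are all made up for illustration, and it assumes small enum values can never collide with a valid pointer. Each handler tail calls the next one, the instrumented handler rewinds the bytecode pointer before tail calling its base handler, and the error handler saves state and returns an enum that the top level matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guaranteed tail calls where the compiler offers musttail; otherwise a
   plain return-call, which is still correct but may use stack space. */
#ifdef __has_attribute
#  if __has_attribute(musttail)
#    define TAIL_RETURN __attribute__((musttail)) return
#  endif
#endif
#ifndef TAIL_RETURN
#  define TAIL_RETURN return
#endif

enum { OP_PUSH, OP_ADD, OP_TRACED_ADD, OP_HALT, OP_FAIL }; /* toy opcodes */
enum { EXIT_ERROR = 1 }; /* assumed never to be a valid pointer value */

typedef struct {           /* hypothetical stand-in for the "special struct" */
    const uint8_t *pc;
    long *sp;
} SavedState;

static unsigned traced_calls; /* bumped by the instrumented opcode */

#define PARAMS const uint8_t *pc, long *sp, SavedState *saved
typedef uintptr_t (*Handler)(PARAMS);

static uintptr_t op_push(PARAMS), op_add(PARAMS), op_traced_add(PARAMS),
                 op_halt(PARAMS), op_fail(PARAMS);
static const Handler handlers[] = {
    op_push, op_add, op_traced_add, op_halt, op_fail
};

/* Toy counterpart of the replaced DISPATCH_GOTO: tail call the next handler. */
#define DISPATCH() TAIL_RETURN handlers[*pc](pc, sp, saved)

static uintptr_t op_push(PARAMS) {
    pc += 1;               /* step past the opcode */
    *sp++ = (long)*pc++;   /* push the one-byte operand */
    DISPATCH();
}

static uintptr_t op_add(PARAMS) {
    pc += 1;
    sp[-2] += sp[-1];
    sp -= 1;
    DISPATCH();
}

static uintptr_t op_traced_add(PARAMS) {
    pc += 1;
    traced_calls += 1;     /* the "instrumentation" */
    pc -= 1;               /* rewind so op_add can step past the opcode again */
    TAIL_RETURN op_add(pc, sp, saved);
}

static uintptr_t op_halt(PARAMS) {
    (void)pc; (void)saved;
    return (uintptr_t)(sp - 1);  /* success: pointer to the result slot */
}

static uintptr_t op_fail(PARAMS) {
    saved->pc = pc;        /* store the current state first ... */
    saved->sp = sp;
    return EXIT_ERROR;     /* ... then return the enum instead of a pointer */
}

/* Top level: returns 0 with the result in *out, or -1 with the bytecode
   offset of the failing instruction in *out. */
int run(const uint8_t *code, long *out) {
    long stack[16];
    SavedState saved = { 0 };
    assert(code != NULL && out != NULL);
    uintptr_t r = handlers[code[0]](code, stack, &saved);
    if (r == EXIT_ERROR) {
        const uint8_t *pc = saved.pc;  /* restore locals from the struct */
        *out = (long)(pc - code);      /* then take the error branch */
        return -1;
    }
    *out = *(long *)r;
    return 0;
}
```

Calling `run` on the bytes `OP_PUSH, 2, OP_PUSH, 40, OP_TRACED_ADD, OP_HALT` goes through the instrumented path once and leaves the sum in `*out`; a program that reaches `OP_FAIL` comes back through the enum path instead.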

I have it building and working on native ARM and WASI, but initial performance tests have been a bit disappointing. On native targets, an easy thing to try would be to build with the latest LLVM 19 and use the preserve_none calling convention, which should improve register allocation and avoid having to restore register values before each tail call. I care more about Wasm performance, so I will look into that first, specifically by getting numbers from the V8 side, which is more difficult because V8 lacks WASI support.
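
For reference, a rough sketch of what opting in to preserve_none could look like, assuming a Clang new enough to provide the attribute and a target that supports it. The `PRESERVE_NONE` and `MUSTTAIL` macros and the `step_a`, `step_b`, and `sum_to` functions are hypothetical names for this example, not anything in this PR, and the macros expand to nothing on compilers without the attributes so the code still builds.

```c
#include <assert.h>
#include <stdint.h>

/* Use the attributes only when the compiler reports them. */
#ifdef __has_attribute
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Two handlers with identical signatures and calling conventions that
   tail call each other, as the instruction functions would. */
static PRESERVE_NONE uint64_t step_b(uint64_t n, uint64_t acc);

static PRESERVE_NONE uint64_t step_a(uint64_t n, uint64_t acc) {
    if (n == 0) return acc;
    MUSTTAIL return step_b(n - 1, acc + n);
}

static PRESERVE_NONE uint64_t step_b(uint64_t n, uint64_t acc) {
    if (n == 0) return acc;
    MUSTTAIL return step_a(n - 1, acc + n);
}

/* Ordinary entry point using the default calling convention. */
uint64_t sum_to(uint64_t n) {
    /* Without guaranteed tail calls the recursion depth is n, so keep it small. */
    assert(n <= 10000);
    return step_a(n, 0);
}
```

The handlers never return to each other, so there is little value in them preserving callee-saved registers across the tail call; that is the saving the calling convention is meant to give. Whether it helps here would still need measuring.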
