Trace starting from RESUME in the JIT #127958

@Fidget-Spinner

Feature or enhancement

Proposal:

To maximize traces executed, we should also trace from RESUME in the JIT.

Some things I learnt along the way of implementing this:

  1. We want a higher warmup threshold for function entry (RESUME) than for loops (JUMP_BACKWARD). Thanks to correspondence from CF Bolz-Tereick, I learnt that PyPy sets function warmup to be 60% higher than loop warmup. IIRC, LuaJIT sets it to 2x. I chose 2x. This number could use further investigation.

  2. We want to disable the check for recursion, and let the existing first_instr == instr check and the trace stack overflow/underflow/out-of-space checks handle it instead. This will allow us to automatically transform recursive functions into an iterative-like form. For example, this is recursive fibonacci on my branch (look at how nice it is!):
    [image: recursive fibonacci on the branch]

  3. We want to avoid compiling short trunk/root traces that don't complete in a loop. The idea is that entering/exiting JIT code is expensive. I chose a minimum trace length of 100 based on some benchmarking I did. This number could use further investigation. This doesn't apply to side/branch traces: they are entered from other jitted code, so there is no penalty involved in entering them (other than a jump).

  4. We need a significantly more aggressive exponential backoff for function entry than what our current scheme provides. The cost of a function-entry optimization attempt is high enough that it shows up as a 6% slowdown in bm_coroutines. Thus I'm only making it try-once for now.

  5. We need to trace into function executors too, to avoid shortening the length of loop traces.
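
To make points 1, 3 and 4 concrete, here is a rough Python model of the heuristics. It is only a sketch, not the actual C implementation: every name is hypothetical, and the loop warmup value of 16 is an assumption picked for illustration. Only the 2x ratio, the minimum length of 100 and the try-once policy come from the points above.

```python
from dataclasses import dataclass

# All names and the loop warmup value are hypothetical, for illustration only.
LOOP_WARMUP = 16                     # assumed warmup count for JUMP_BACKWARD
FUNCTION_WARMUP = 2 * LOOP_WARMUP    # point 1: function entry warms up 2x slower
MIN_ROOT_TRACE_LENGTH = 100          # point 3: minimum length for non-looping root traces


@dataclass
class EntryCounter:
    """Per-RESUME counter with a try-once policy (point 4)."""
    hits: int = 0
    attempted: bool = False

    def should_attempt(self) -> bool:
        """Return True exactly once, on the hit that reaches the threshold."""
        if self.attempted:
            return False             # try-once: never retry after the first attempt
        self.hits += 1
        if self.hits >= FUNCTION_WARMUP:
            self.attempted = True
            return True
        return False


def should_keep_trace(length: int, closes_loop: bool, is_side_trace: bool) -> bool:
    """Decide whether a projected trace is worth compiling (point 3)."""
    if is_side_trace:
        return True                  # entered from jitted code, only costs a jump
    if closes_loop:
        return True                  # loops amortize the JIT entry/exit cost
    return length >= MIN_ROOT_TRACE_LENGTH


counter = EntryCounter()
attempts = [counter.should_attempt() for _ in range(100)]
print(attempts.count(True))          # a single optimization attempt
print(should_keep_trace(10, closes_loop=False, is_side_trace=False))
```

In this model, a short root trace that does not close a loop is rejected, while a side trace of the same length is kept.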

Why am I bundling all these in a single PR instead of separating them out for benchmarking? Well, without any one of the above, RESUME tracing becomes ineffective in some pathological cases, and may cause an overall slowdown! We need all of them at the same time for speedups to show!

Preliminary benchmarks on my computer: https://gist.github.com/Fidget-Spinner/8a972a8bcac52d0cf25249564e12d762

All these are partly thanks to Mark's implementation of graphviz executors. Without them I would've never learnt these. I figured these points out by manually inspecting the traces of many benchmarks.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response
