Discussion on Trace Based Just-in-Time Specialization for Dynamic Languages #620

jku20 · 2025-11-11T19:59:37Z

jku20
Nov 11, 2025

Reading: Trace Based Just-in-Time Specialization for Dynamic Languages
Discussion Leads: Ann Zhang (@az275), Jeremy Ku-Benjet (@jku20), Sunwoo Kim (@sunwookim028)

jeffreyqdd · 2025-11-12T10:30:31Z

jeffreyqdd
Nov 12, 2025

This paper proposes a JIT strategy that records hot loop paths at runtime and compiles them into type-specialized native code. In traditional tracing JITs, nested loops are hard to optimize because a naive implementation can easily degenerate into tail duplication, which can overflow the code cache. The innovation in this paper is to use trace trees, which keep the structure relatively flat. A tracing JIT also provides relatively cheap inter-procedural optimization because traces naturally cross function boundaries.

Because TraceMonkey is "loop-optimized" and type-specialized, it works well on code that is type-stable and has predictable loops. I'm particularly worried about how TraceMonkey would perform on code that uses many types and relies heavily on dynamic dispatch: the cost of guards and of compiling many individual type-specialized traces could be detrimental to the program's performance.

In modern tiered JIT systems, how can we strike the right division of labor between a tracing JIT and a method JIT? Can the optimization of one JIT type be used as additional information for the other?

0 replies

mercure67 · 2025-11-15T18:14:19Z

mercure67
Nov 15, 2025

critique

It seems like this paper is a natural follow-on from the SELF paper, presenting a much more sophisticated version of the JIT features discussed in the SELF paper, while keeping stuff like the "property maps" used to obtain class information.

It's also very definitely an engineering-oriented paper (unsurprising considering it came out of Mozilla) --- the researchers are very interested in providing an implementation more than theory. Historically, I'm curious whether anybody had tried to JIT JS before, and whether V8 /JSCore lifted ideas from this paper.

There's also something of a comparch-esque concept in here of speculation and rollback --- they're essentially doing branch prediction for types (and also making use of it to accelerate loop code).

I'm a little sceptical of their evaluation --- small programs are fine, but I wonder if they're measuring results on programs that look like what real JS is used for. I suppose that will vary, but if (for example) extensive DOM manipulation is taking place in the JS, does the code performance improvement look as good?

questions
The authors briefly mention loops with varying types. Is this something that they only need to handle because of JS, or is it more broadly an issue with dynamic languages? Is there some way of designing the language to avoid these situations, and would it hurt productivity to do so?

One of the benchmarks is 25x faster than comparative implementations --- why?

How could the startup time / overhead of tracing be improved?

1 reply

sampsyo Nov 15, 2025
Maintainer

Sure, here's a little bit of historical framing around this:

> Historically, I'm curious whether anybody had tried to JIT JS before, and whether V8 /JSCore lifted ideas from this paper.

This paper was part of a moment in time when everyone suddenly started writing efficient JavaScript compilers all at once. The events are so close in time that it's hard to say exactly who was first. Some data points:

TraceMonkey seems to have been announced on August 23, 2008
SquirrelFish Extreme, which I believe was JavaScriptCore's first use of native code generation, was announced on September 18, 2008, after the just-an-efficient-interpreter SquirrelFish design had just recently been detailed on June 2 (I remember that blog post as igniting the whole chain of dominoes here)
Google Chrome, which at the time used WebKit but with a new JavaScript compiler called V8, was announced in a comic book (!) on approximately September 2, 2008
Microsoft took quite a bit longer: the Chakra JIT didn't ship until IE 9. Here's an announcement I found from March 18, 2010

So anyway, the point here is that something was in the water in the span of August–September 2008, and 3 JavaScript JITs emerged almost simultaneously. TraceMonkey was one of them. This simultaneous invention led to a roughly decade-long arms race among the 3 (and sometimes 4) JavaScript compilers. If you poke around, you can find many entertaining blog posts from the compiler engineers at the time eagerly comparing benchmarks as they leapfrogged each other. It was a fun time!

Jacqueline-Wen · 2025-11-17T21:22:58Z

Jacqueline-Wen
Nov 17, 2025

Critique

This paper introduces TraceMonkey, the first trace-based JIT compiler for JavaScript. TraceMonkey identifies hot paths through loops and compiles them into fast type-specialized machine code.

I found this paper pretty interesting to read. In particular, I found the solution to the nested-loop problem especially clever: independent trace trees, with outer trees calling inner ones, plus type map matching. I do appreciate that this paper isn’t pure theory, and that they actually implemented TraceMonkey to test the practicality of their approach. However, I felt that TraceMonkey is quite limited in handling JS language features. Notably, it doesn’t support recursion, which I feel is very widely used across most languages.

Questions

In which cases would method-based JIT compilation be better than trace-based JIT compilation? What are the tradeoffs?
Which modern language features might most benefit from trace-based JIT compilation?

0 replies

maheshejs · 2025-11-17T22:19:44Z

maheshejs
Nov 17, 2025

Critique
This was an interesting read, presenting a compilation technique for dynamically typed languages that records hot traces in loops at runtime and generates type-specialized native code. I like how detailed the paper is: they identify limitations of a basic tracing approach (nested loops, exponential trace duplication) and propose concrete solutions. I found the engineering effort to integrate all these ideas into a working, correct system really impressive. However, sometimes it felt like reading multiple papers in one. It would've helped to pull the engineered parameters, like the hot loop definition (2 iterations) or backoff threshold (32 failures), into a separate section and to explain how those values were chosen.

Question
Using this technique, they achieve the fastest performance on 9 of the 26 benchmarks. One limiting factor is that recursion is not handled, and I'm curious how modern trace-based VMs handle recursive functions. On a different note, I'm wondering why SunSpider, the industry-standard JavaScript benchmark suite at the time and the one they use, consists of short-running programs. Is it uncommon to evaluate VMs on long-running workloads?

0 replies

Mond45 · 2025-11-18T00:00:22Z

Mond45
Nov 18, 2025

Critique
I find the paper quite easy to follow, and I think the idea of a tracing compiler for dynamic languages like JavaScript is quite intuitive. I like that the authors thoroughly address multiple potential issues, such as duplicated trace trees or number mis-speculation. At the same time, the approach seems to require so many techniques to handle these pathological cases, for example, the oracle for integer/double mis-speculation or blacklisting, which makes the implementation feel unnecessarily complicated.
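To make the integer/double oracle mentioned above a bit more concrete, here is a minimal sketch, not TraceMonkey's actual code. It assumes a one-variable loop whose body is `var += step`, models JavaScript numbers as Python floats, and uses hypothetical names (`Oracle`, `choose_entry_type`, `record_trace`) invented for illustration. The idea it tries to show: the recorder speculates that an integer-valued number can be handled as an int, and when that guess turns out wrong at the loop edge, the oracle remembers the variable so the next recording starts with a double.

```python
class Oracle:
    """Remembers variables whose int speculation failed (hypothetical helper)."""

    def __init__(self):
        self._use_double = set()

    def mark_double(self, var):
        self._use_double.add(var)

    def should_use_double(self, var):
        return var in self._use_double


def choose_entry_type(oracle, var, value):
    # Speculate "int" only for integer-valued numbers the oracle has not flagged.
    if value.is_integer() and not oracle.should_use_double(var):
        return "int"
    return "double"


def record_trace(oracle, var, value, step):
    """Record one iteration of `var += step`; return (entry_type, type_stable)."""
    entry_type = choose_entry_type(oracle, var, value)
    exit_value = value + step
    if entry_type == "int" and not exit_value.is_integer():
        # Mis-speculation: the variable left the loop body as a double.
        oracle.mark_double(var)
        return entry_type, False
    return entry_type, True


oracle = Oracle()
first = record_trace(oracle, "x", 0.0, 0.5)   # speculates int, ends up unstable
second = record_trace(oracle, "x", 0.0, 0.5)  # oracle now advises double
print(first, second)
```

In this toy version the first recording is wasted work, which is roughly the cost the oracle is meant to avoid paying repeatedly.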

I also agree with the other comment regarding how the number 2 was chosen as the hotness threshold. Also, I wonder how valid the assumption is that most loops are type-invariant in real programs. Lastly, I do feel that the performance and implementation cost imposed by dynamic languages like JavaScript may not be worth the supposed ease of programming. In fact, many of the issues discussed in the paper could have been avoided just by having proper static typing.

Discussion Questions

Is tracing-based JIT worth it compared to method JITs for real programs? Are real programs' behaviors "regular" enough that each loop iteration is relatively invariant, making the cost of tracing compilers worthwhile?
The paper mentions that background recompilation on multicore systems could improve JIT performance. How much of this is incorporated into modern JITs nowadays?
Do modern JITs leverage typing information from TypeScript to help guide their compilation?

0 replies

YoruCathy · 2025-11-18T02:15:10Z

YoruCathy
Nov 18, 2025

CRITIQUE
The paper convincingly demonstrates large speedups on type-stable loops, but it provides limited analysis of cases where tracing performs poorly, particularly highly polymorphic, branch-heavy, or recursion-heavy workloads. These are common patterns in real JavaScript, yet the evaluation does not quantify how often guards fail, how frequently traces abort, or how much overhead nested tracing and blacklisting introduce. Without this, it is hard to judge the generality and robustness of the approach outside curated benchmarks.

QUESTION
How would TraceMonkey’s performance change on programs with frequent type instability or unpredictable control flow, and could more adaptive guard or trace-selection strategies reduce the number of aborts and blacklisted loops?

0 replies

natetyoung · 2025-11-18T02:41:12Z

natetyoung
Nov 18, 2025

Critique
This paper presents the design and implementation of TraceMonkey, Mozilla's trace-oriented JIT for JavaScript. It has a lot of detail, but I found the paper surprisingly easy to follow at least for the gist of the system: essentially, it detects hot loops (although it has a strange definition of "hot" -- just 2 iterations) and traces the next path (including type information) taken through the loop body, then compiles that trace so it can be called directly later. All later calls to the loop body attempt to use this compiled trace, and anything which makes it invalid (type mismatch, exception, different control flow, etc.) results in the compiled trace being stopped and the interpreter resuming, via guard statements placed in the compiled trace. TraceMonkey has special optimizations for jumps out of traces to other hot traces, and for nested hot loops.
Quick note: they describe their process of compiling a trace with whatever type information and control flow they see, with the hope that it will remain the same in other loop iterations, as "speculating," but as far as I can tell they do no actual speculative execution of instructions.
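The record-then-guard flow described above can be sketched in a few lines. This is a toy model under loud assumptions, not the paper's implementation: the "loop body" is a single `acc + x`, "compiling" just means building a closure specialized on the observed types, and the names (`interpret_add`, `compile_trace`, `run_loop`, `GuardFailure`) are made up for illustration. Only the hotness threshold of 2 comes from the discussion.

```python
HOT_THRESHOLD = 2  # iterations before a loop counts as hot, as quoted above


class GuardFailure(Exception):
    """Side exit: the trace saw types it was not specialized for."""


def interpret_add(acc, x):
    # Generic interpreter path: inspects dynamic types on every iteration.
    if isinstance(acc, str) or isinstance(x, str):
        return str(acc) + str(x)
    return acc + x


def compile_trace(acc_type, x_type):
    # "Compile" a straight-line body that is only valid for the recorded types.
    def trace(acc, x):
        if type(acc) is not acc_type or type(x) is not x_type:
            raise GuardFailure()
        return acc + x  # specialized fast path, no dispatch
    return trace


def run_loop(values, acc=0):
    trace = None
    stats = {"interpreted": 0, "traced": 0, "guard_failures": 0}
    for i, x in enumerate(values):
        if trace is not None:
            try:
                acc = trace(acc, x)
                stats["traced"] += 1
                continue
            except GuardFailure:
                stats["guard_failures"] += 1  # fall back to the interpreter
        acc = interpret_add(acc, x)
        stats["interpreted"] += 1
        if trace is None and i + 1 >= HOT_THRESHOLD:
            # Loop is hot: specialize on the types seen at this loop edge.
            trace = compile_trace(type(acc), type(x))
    return acc, stats


stable = run_loop([1, 2, 3, 4])
unstable = run_loop([1, 2, 3, "x", 4])
print(stable)
print(unstable)
```

Note that nothing here is executed speculatively and then undone: the guard is checked before the specialized work, so "speculation" only refers to betting at compile time that the recorded types will recur. In the unstable run the trace never becomes useful again once the accumulator turns into a string, which is the kind of case the critiques in this thread worry about.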

Questions

One really neat thing about this approach is that since they are only compiling single traces, all the programs they need to compile are straightline. This makes their compiler very simple, and presumably helps them with some optimizations as well. What are the tradeoffs here? What is the "right" amount of control flow to include in compiled snippets for a JIT?
More philosophically: with the hardware considered, where do JITs like these get their performance from? At a first glance, one answer could be that it does less control, but while compiled snippets do not have the control overhead of picking where values should come from and what to execute based on dynamic type, they do still have the control overhead of guarding the execution when something surprising happens, and the system as a whole still has to pick a snippet based on dynamic type. Is this less total control? Is some of this control being "accelerated" by pieces of the hardware which are better at it (branch predictor, cache...)? In other words, what (if anything) distinguishes JITs from heavily optimized interpreters in a way that matters to the hardware for performance purposes?

0 replies

Smubge · 2025-11-18T02:53:23Z

Smubge
Nov 18, 2025

Critique

This paper demonstrates the implementation of TraceMonkey. The paper is seemingly straightforward and easy to follow. Something I'm very interested in is the failure cases and blacklisting. If a trace repeatedly cannot be recorded, TraceMonkey blacklists it and replaces the loop header to avoid monitoring it again. Overall, I believe this is an ingenious paper that has had a lasting impact on the computer science world. On researching its impact, I found that it has directly influenced V8, SpiderMonkey, PyPy, and LuaJIT.
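Here is a rough sketch of how back-off plus blacklisting could be bookkept per loop header. The two constants are placeholders loosely borrowed from numbers quoted elsewhere in this thread, and their exact meaning in TraceMonkey is an assumption on my part; `LoopHeader` and `count_attempts` are hypothetical names.

```python
from dataclasses import dataclass

BACKOFF_PASSES = 32  # assumed: header passes to skip after a failed recording
MAX_FAILURES = 2     # assumed: failed recordings tolerated before blacklisting


@dataclass
class LoopHeader:
    failures: int = 0
    skip_remaining: int = 0
    blacklisted: bool = False

    def should_try_recording(self) -> bool:
        if self.blacklisted:
            return False  # never monitor this loop again
        if self.skip_remaining > 0:
            self.skip_remaining -= 1  # still backing off
            return False
        return True

    def recording_failed(self) -> None:
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            self.blacklisted = True
        else:
            self.skip_remaining = BACKOFF_PASSES


def count_attempts(passes: int) -> int:
    """Count recording attempts for a loop whose trace always aborts."""
    header = LoopHeader()
    attempts = 0
    for _ in range(passes):
        if header.should_try_recording():
            attempts += 1
            header.recording_failed()
    return attempts


print(count_attempts(1000))
```

With these placeholder values, a loop that never traces successfully costs two recording attempts no matter how long it runs, which is the point of the mechanism: bound the wasted work on untraceable loops.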

Questions

Why have some modern engines (SpiderMonkey, for instance) moved away from pure tracing JITs?
What happens if we use untyped SSA?

0 replies

magg1egao · 2025-11-18T04:21:25Z

magg1egao
Nov 18, 2025

Critique:
This paper discusses a trace-based compilation technique. I liked how the authors discussed a bit about the modern real-world scenarios. They brought up browser-based productivity applications, such as Google Mail and Google Docs, and how dynamic languages make more sense for this type of use case.

One assumption this paper makes is that hot loops are mostly type-stable. In my opinion, this is quite a big assumption to make. It also raises the question of how impactful these optimizations are for modern workloads. The paper brings up JavaScript as a dynamic language that is used frequently in the highly interactive web browser environment. I am curious whether this approach adapts well to modern JavaScript workloads where loops are likely less explicit and type stability is harder to take advantage of. It would be cool to see if such a tracing system could adapt to a kind of event-driven hot path that is not statically a loop-structured component.

Question:
Is a trace-based model still a relevant model in the modern world that is dominated by async event loops rather than nested loops?
The authors make assumptions such as stable types, predictable paths, and frequent loops. Are these assumptions too much when thinking about modern applications?

0 replies

zc579 · 2025-11-18T04:29:26Z

zc579
Nov 18, 2025

Critique
The paper presents an innovative and influential tracing-based JIT design that significantly improves performance on hot, predictable execution paths, but its effectiveness is limited because tracing breaks down on highly dynamic or irregular JavaScript workloads, leading to excessive guard failures and reduced practical applicability.
Question
Can a trace-based JIT approach remain competitive today, or is its usefulness fundamentally limited compared to SSA-based optimizing JITs and hybrid tiered architectures?

0 replies

SolidLao · 2025-11-18T06:41:53Z

SolidLao
Nov 18, 2025

Critique

Dynamic languages are hard to optimize at compile time because type information is unknown. This paper presents a technique that starts with a fast-starting interpreter, then identifies hot loop traces at runtime and generates specialized machine code for these traces.

Question

The paper optimizes individual loops based on two expectations: programs spend most of their time in hot loops, and hot loops are always type-stable. Do these two expectations always hold in practice? Is there any scenario where the two expectations fail?
How much overhead does tracing introduce? In which situations will the overhead exceed the benefits?

0 replies

SerenaYZhang · 2025-11-18T07:35:47Z

SerenaYZhang
Nov 18, 2025

Critique

This paper presents TraceMonkey, which is a trace-based JIT compiler designed to accelerate dynamic languages like JavaScript. However, despite its many optimizations, it seems that Mozilla eventually replaced TraceMonkey with JägerMonkey. Since TraceMonkey is a trace-based JIT, it struggles with programs with many branches and complex control flow, so JägerMonkey, a method-based JIT, became more favorable.

Questions

The paper explicitly notes recursion as a limitation and marks it as future work. What are the fundamental architectural challenges in making a trace-based system efficiently handle deep recursion? Would it require a completely different approach, like also incorporating a method-based JIT?

0 replies

NingWang0123 · 2025-11-20T06:56:26Z

NingWang0123
Nov 20, 2025

Critique
The trace-based approach nicely exploits the fact that many dynamic language programs spend most of their time in a few hot loops with stable types, and the guard-and-bailout mechanism is conceptually simple yet powerful. However, the design can struggle with highly polymorphic or unpredictable code, where traces fragment and guard failures become frequent, and it inherently favors loop-centric patterns over code whose hot paths don’t fit well into linear traces.

Questions
How robust is the tracing strategy on real-world workloads with lots of object polymorphism or highly data-dependent control flow?

0 replies

az275 · 2025-11-21T01:04:37Z

az275
Nov 21, 2025

Figured I'd leave a comment here given the schedule switch up this week.

I have touched very little JavaScript in my life and intend to keep it that way. That said, I appreciate that this paper was super concrete and super detailed, and I found the nested trace tree formation particularly cool.

My main critique here is that, as others have pointed out, the effectiveness of trace-based JIT is heavily dependent on whether programs adhere to assumptions. I suppose the effectiveness of any strategy is program-dependent, but it seems like there are a lot of cases where trace-based JIT doesn't work well, and this is somewhat unsatisfactorily addressed. They do give some reasons in the evaluation section on why speedups are smaller on some benchmarks, e.g. recursion; however, they say very little on whether it is feasible/what would have to be done to address these (pretend I phrased this as a question). The paper also employs a "blacklisting" strategy for programs that do not trace well; it would have been nice to see some more context on the characteristics of those programs. Do a lot of modern programs not trace well? What does that suggest about whether and where tracing JITs are relevant today?

0 replies

tf-mac · 2025-11-26T23:59:52Z

tf-mac
Nov 26, 2025

I was pretty delayed on this because I fell badly under the weather Monday and got busy with other work after that, so my apologies for the delay.

Critique:

I thought the paper was interesting if limited. The authors specifically focused on how to efficiently trace loops, and importantly loops that may be nested (which may avoid issues of outer loops failing to be traced because they are less "hot"). They did this by "separating" the CFG and detecting nested loops, letting outer loops "call" inner loops. The insight seems fairly simple, but I'm very impressed by the results. My only critique is it seems that the initial results may be a fraction of possible speedups due to the various limitations on what could be traced, but certainly this is a compliment to the paper and shows its potential.

Discussion Questions:

One of the limitations mentioned was that TraceMonkey does not compile paths with exceptions, under the assumption that exceptions are rare in JavaScript. Would there be any way to fix this and other limitations? For example, if an exception is always caught, could the try-catch block be traced as a whole, thus refactoring to avoid the issue?

0 replies
