
[rust-compiler] Carry uninspected AST subtrees as raw JSON text #36730

Open
poteto wants to merge 4 commits into react:main from poteto:lauren/pr-rawnode


poteto commented Jun 10, 2026

Collaborator

Stacked on #36729 (upstream rejects cross-fork base branches, so this targets main as a draft; the first commit belongs to the parent PR. Review the last three commits. Will rebase and mark ready when #36729 lands.)

Unmodeled AST subtrees (type annotations, class bodies, unknown statements) were stored as serde_json::Value trees: every object node allocated its own Map<String, Value>, and pass-through subtrees were repeatedly traversed by code that never looks inside them. They are now stored as RawNode, a newtype over Box<RawValue> holding the original JSON text verbatim.

Design notes, since two obvious alternatives fail:

  • Bare RawValue fields break under #[serde(tag = "type")] enums: internally-tagged deserialization buffers content into serde's private Content tree, which RawValue cannot read from. RawNode::deserialize instead streams whatever deserializer it is handed through serde_transcode into a fresh JSON string, which works behind tagged enums, flatten, and from_value alike.
  • Default-limit reparses break deep ASTs: internal RawNode reparse sites use from_json_str_unbounded (which disables serde_json's 128-level recursion limit, matching how the top-level parse is configured); this is regression-tested with a 400-deep statement chain.

parse_value fails loudly on malformed text rather than masking corruption with Value::Null; RawNode holds valid JSON by construction.

Size-neutral in the shipped binary; the win is structural (no speculative Value trees on the hot path, pass-through subtrees stay untouched text).

Verified on this exact tree: cargo workspace tests, both snap channels 1804/1804.

poteto and others added 4 commits June 9, 2026 18:38
compile_program serialized the compiled File to a JSON string (RawValue)
inside core, and the in-process Rust consumers immediately parsed it
back: the oxc and swc frontends each did from_str to a Value (to
deduplicate the "type" keys the tagged-enum serialization emits beside
BaseNode.node_type) and then from_value into File again. Return
CompileResult::Success.ast as Option<File> and consume it directly.
JSON now exists only at the napi edge, which serializes the whole
CompileResult as before with an identical wire shape.

Ports the typed-AST patch from the oxc-project fork of this compiler.

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>

The AST types held ~105 serde_json::Value fields for subtrees the
compiler mostly never inspects: type annotations, class bodies,
decorators, enum/interface bodies, parser extras. Replace them with
RawNode, an opaque wrapper over the original JSON text. Serialization
is verbatim pass-through; deserialization streams the subtree to text
via serde_transcode rather than capturing a RawValue, because most
nodes sit under #[serde(tag = "type")] enums whose content buffering
breaks RawValue's borrowing capture (and serde_json::from_value cannot
produce RawValue at all; from_value_via_text covers the few cold paths
that deserialize AST types from Values).

Consumers that genuinely inspect these subtrees parse on demand:
identifier indexing of class bodies, type-annotation lowering and the
props-annotation check, the as-const probe and module-interop reverse
conversion in the SWC frontend, and the unsupported-node codegen
discrimination. The UnknownStatement tolerant deserializer keeps its
semantics on the new representation.

Binary size is neutral (the transcode path monomorphizes about as much
as Value deserialization did); the wins are no retained Value trees in
the AST, no duplicate-key dedup dance for in-process consumers, and a
single opaque type marking exactly where core stops understanding the
tree, so frontends can hand over subtrees in whatever shape they parse.

Two fixes from a four-model adversarial review of the boundary commits,
plus one cleanup the review made provable.

The napi entrypoint deserializes arbitrarily deep ASTs with serde_json's
recursion limit disabled on a 64MB-stack thread, but the tolerant
statement path's internal reparses (known-statement dispatch, BaseNode
extraction, type_name probes, parse_value) used default-limit from_str,
so statements nested past ~128 levels that previously compiled would
fail during dispatch. Route every RawNode reparse through
from_json_str_unbounded; a regression test deserializes a 400-deep
statement on a napi-sized stack.

parse_value silently degraded malformed text to Value::Null, which
downstream analyses (class-body hook detection, props-annotation
classification, unknown-statement reference scans) would read as real
content, turning a broken invariant into quietly wrong compile
decisions. RawNode text is valid JSON by construction, so parse_value
now fails loudly instead.

Delete from_value_via_text: the transcode-based Deserialize works under
serde_json::from_value (verified by probe), so the helper and its
"every from-Value site must use this" contract were vestigial. Call
sites return to plain from_value.

is_null had no callers, and the doc comment wrongly claimed these subtrees
are never inspected: class bodies and type annotations are parsed on each
use, a cost worth caching at call sites if it ever shows up in profiles.