[rust-compiler] Carry uninspected AST subtrees as raw JSON text#36730
Open
poteto wants to merge 4 commits into
compile_program serialized the compiled File to a JSON string (RawValue) inside core, and the in-process Rust consumers immediately parsed it back: the oxc and swc frontends each did from_str to a Value (to deduplicate the "type" keys the tagged-enum serialization emits beside BaseNode.node_type) and then from_value into File again.

Return CompileResult::Success.ast as Option<File> and consume it directly. JSON now exists only at the napi edge, which serializes the whole CompileResult as before with an identical wire shape.

Ports the typed-AST patch from the oxc-project fork of this compiler.

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
The AST types held ~105 serde_json::Value fields for subtrees the compiler mostly never inspects: type annotations, class bodies, decorators, enum/interface bodies, parser extras. Replace them with RawNode, an opaque wrapper over the original JSON text.

Serialization is verbatim pass-through; deserialization streams the subtree to text via serde_transcode rather than capturing a RawValue, because most nodes sit under #[serde(tag = "type")] enums whose content buffering breaks RawValue's borrowing capture (and serde_json::from_value cannot produce RawValue at all; from_value_via_text covers the few cold paths that deserialize AST types from Values).

Consumers that genuinely inspect these subtrees parse on demand: identifier indexing of class bodies, type-annotation lowering and the props-annotation check, the as-const probe and module-interop reverse conversion in the SWC frontend, and the unsupported-node codegen discrimination. The UnknownStatement tolerant deserializer keeps its semantics on the new representation.

Binary size is neutral (the transcode path monomorphizes about as much as Value deserialization did); the wins are no retained Value trees in the AST, no duplicate-key dedup dance for in-process consumers, and a single opaque type marking exactly where core stops understanding the tree, so frontends can hand over subtrees in whatever shape they parse.
Two fixes from a four-model adversarial review of the boundary commits, plus one cleanup the review made provable.

The napi entrypoint deserializes arbitrarily deep ASTs with serde_json's recursion limit disabled on a 64MB-stack thread, but the tolerant statement path's internal reparses (known-statement dispatch, BaseNode extraction, type_name probes, parse_value) used default-limit from_str, so statements nested past ~128 levels that previously compiled would fail during dispatch. Route every RawNode reparse through from_json_str_unbounded; a regression test deserializes a 400-deep statement on a napi-sized stack.

parse_value silently degraded malformed text to Value::Null, which downstream analyses (class-body hook detection, props-annotation classification, unknown-statement reference scans) would read as real content, turning a broken invariant into quietly wrong compile decisions. RawNode text is valid JSON by construction, so parse_value now fails loudly instead.

Delete from_value_via_text: the transcode-based Deserialize works under serde_json::from_value (verified by probe), so the helper and its "every from-Value site must use this" contract were vestigial. Call sites return to plain from_value.
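The recursion-limit mismatch in this commit can be modelled without serde_json. The sketch below is a toy recursive-descent parser over nested brackets, not the compiler's code: `parse`, `parse_nested`, `parse_on_big_stack`, and the `limit` parameter are hypothetical names, with `Some(128)` standing in for a default-limit `from_str` and `None` for an unbounded reparse on a large-stack thread.

```rust
use std::thread;

/// Parse one `[ ... ]` group recursively and return the deepest nesting seen.
/// `limit` models a parser recursion limit; `None` disables the check.
fn parse_nested(
    bytes: &[u8],
    pos: &mut usize,
    depth: usize,
    limit: Option<usize>,
) -> Result<usize, String> {
    if let Some(max) = limit {
        if depth > max {
            return Err(format!("recursion limit exceeded at depth {depth}"));
        }
    }
    if bytes.get(*pos) != Some(&b'[') {
        return Err(format!("expected '[' at byte {}", *pos));
    }
    *pos += 1;
    let mut deepest = depth;
    // Each child group is one more level of real call-stack recursion.
    while bytes.get(*pos) == Some(&b'[') {
        deepest = deepest.max(parse_nested(bytes, pos, depth + 1, limit)?);
    }
    if bytes.get(*pos) != Some(&b']') {
        return Err(format!("expected ']' at byte {}", *pos));
    }
    *pos += 1;
    Ok(deepest)
}

/// Parse a whole input, rejecting trailing bytes.
fn parse(text: &str, limit: Option<usize>) -> Result<usize, String> {
    let mut pos = 0;
    let depth = parse_nested(text.as_bytes(), &mut pos, 1, limit)?;
    if pos != text.len() {
        return Err("trailing input".to_string());
    }
    Ok(depth)
}

/// Run the parse on a 64 MB stack, so that dropping the limit does not
/// trade a parse error for a stack overflow.
fn parse_on_big_stack(text: String, limit: Option<usize>) -> Result<usize, String> {
    thread::Builder::new()
        .stack_size(64 * 1024 * 1024)
        .spawn(move || parse(&text, limit))
        .map_err(|e| e.to_string())?
        .join()
        .map_err(|_| "parser thread panicked".to_string())?
}

fn main() {
    let deep = format!("{}{}", "[".repeat(400), "]".repeat(400));
    // A bounded reparse rejects input that the unbounded parse accepts.
    assert!(parse_on_big_stack(deep.clone(), Some(128)).is_err());
    assert_eq!(parse_on_big_stack(deep, None), Ok(400));
}
```

The point of the model is that every reparse must use the same limit as the top-level parse. Otherwise an input accepted at the entrypoint can still be rejected later by an inner, bounded reparse.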
is_null had no callers, and the doc comment claimed these subtrees are never inspected; class bodies and type annotations are parsed per use, a cost to cache at call sites if it ever shows in profiles.
This was referenced Jun 10, 2026
Stacked on #36729 (upstream rejects cross-fork base branches, so this targets main as a draft; the first commit belongs to the parent PR. Review the last three commits. Will rebase and mark ready when #36729 lands.)
Unmodeled AST subtrees (type annotations, class bodies, unknown statements) were stored as `serde_json::Value` trees: every node allocated through a `Map<String, Value>`, and pass-through subtrees were repeatedly traversed by code that never looks inside them. They are now `RawNode`, a newtype over `Box<RawValue>` holding the original JSON text verbatim.

Design notes, since two obvious alternatives fail:

- `RawValue` fields break under `#[serde(tag = "type")]` enums: internally-tagged deserialization buffers content into serde's private `Content` tree, which `RawValue` cannot read from. `RawNode::deserialize` instead streams whatever deserializer it is handed through `serde_transcode` into a fresh JSON string, which works behind tagged enums, `flatten`, and `from_value` alike.
- `RawNode` reparse sites use `from_json_str_unbounded` (disables serde_json's 128-level recursion limit, matching how the top-level parse is configured); regression-tested with a 400-deep statement chain.
- `parse_value` fails loudly on malformed text rather than masking corruption with `Value::Null`; RawNode holds valid JSON by construction.

Size-neutral in the shipped binary; the win is structural (no speculative `Value` trees on the hot path, pass-through subtrees stay untouched text).

Verified on this exact tree: cargo workspace tests, both snap channels 1804/1804.
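To illustrate why `parse_value` failing loudly matters, here is a minimal std-only sketch. Everything in it is hypothetical: the toy `Value` enum, `toy_parse`, `uses_hook`, and the two `parse_value_*` variants are stand-ins, not the PR's code. The strict variant is modelled as a panic, although the PR does not say whether the real function panics or returns an error.

```rust
use std::panic;

/// Toy value type; the real code uses serde_json::Value.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Text(String),
}

/// Toy parser: accepts `null` or a double-quoted string with no inner quotes.
fn toy_parse(text: &str) -> Result<Value, String> {
    let t = text.trim();
    if t == "null" {
        return Ok(Value::Null);
    }
    match t.strip_prefix('"').and_then(|rest| rest.strip_suffix('"')) {
        Some(inner) if !inner.contains('"') => Ok(Value::Text(inner.to_string())),
        _ => Err(format!("malformed text: {t}")),
    }
}

/// Old behaviour (modelled): corruption is indistinguishable from a real null.
fn parse_value_lenient(text: &str) -> Value {
    toy_parse(text).unwrap_or(Value::Null)
}

/// New behaviour (modelled): a broken invariant stops the pipeline.
fn parse_value_strict(text: &str) -> Value {
    toy_parse(text).unwrap_or_else(|e| panic!("RawNode invariant broken: {e}"))
}

/// Stand-in for a downstream analysis such as hook detection.
fn uses_hook(value: &Value) -> bool {
    matches!(value, Value::Text(name) if name.starts_with("use"))
}

fn main() {
    let corrupt = "\"useState"; // closing quote lost somewhere upstream

    // Lenient path: the analysis quietly answers "no hook here".
    assert!(!uses_hook(&parse_value_lenient(corrupt)));

    // Strict path: the same corruption surfaces immediately.
    assert!(panic::catch_unwind(|| parse_value_strict(corrupt)).is_err());

    // Well-formed text behaves identically on both paths.
    assert!(uses_hook(&parse_value_strict("\"useState\"")));
}
```

In the lenient path, the wrong answer from `uses_hook` looks exactly like a legitimate result. That is the "quietly wrong compile decision" the strict variant rules out.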