SPIKE: Pipeline trace coverage analysis (do not merge)#130
Draft
SPIKE: Pipeline trace coverage analysis (do not merge)#130
Conversation
Split the monolithic src/printer.rs (1700+ lines) into focused submodules: - mod.rs: entry points (collect_smir, emit_smir) - schema.rs: serialization types (SmirJson, Item, AllocMap, etc.) - collect.rs: collect_items, collect_and_analyze_items, assemble_smir - items.rs: mk_item, get_item_details, MonoItemKind construction - mir_visitor.rs: BodyAnalyzer (MirVisitor impl), collect_alloc, get_prov_ty - ty_visitor.rs: TyCollector (Visitor impl) - link_map.rs: LinkMap types, update_link_map, fn_inst_sym - types.rs: type helpers (fn_inst_for_ty, has_unresolved_ty, etc.) - util.rs: debug_enabled, debug_log_println, take_any Pure mechanical split; no behavioral changes.
Document the purpose and responsibilities of each submodule in the printer directory: mod.rs (overview table), collect.rs (three-phase pipeline), items.rs (Item construction), link_map.rs (function resolution), mir_visitor.rs (body traversal), schema.rs (data model), ty_visitor.rs (type collection), types.rs (TypeMetadata), and util.rs (helpers).
Replace if-let/else with unwrap_or_else, is_none/unwrap with let-else, remove unnecessary .clone() calls, pass offset by value in collect_alloc, extract opaque_placeholder_ty helper, and add doc comments to public types and key internal structures in schema.rs.
- Use eprintln! instead of println! on error paths to avoid corrupting
JSON output when emit_smir targets stdout
- Fix uninterpolated {path} in expect message; include actual path and
IO error via unwrap_or_else + panic!
- Remove redundant .as_str() in LinkMapKey serialization
- Replace alloc+sort with single linear scan in field_containing_offset
Concentrate all rustc-internal API calls into src/compat/ so that
toolchain upgrades touch one module (plus driver.rs) instead of
the whole codebase.
New module layout:
src/compat/mod.rs - centralized extern crate + re-exports
src/compat/bridge.rs - OpaqueInstanceKind, internal conversions
src/compat/mono_collect.rs - mono item collection and symbol naming
src/compat/spans.rs - span-to-source-location resolution
src/compat/types.rs - type queries (generics, fn sigs, attrs)
src/compat/output.rs - output filename resolution
Key change: replace middle::ty::InstanceKind<'tcx> with an owned
OpaqueInstanceKind { debug_repr, is_reify_shim }, eliminating the
'tcx lifetime parameter from SmirJson, LinkMapKey, FnSymInfo,
LinkMap, DerivedInfo, and SmirJsonDebugInfo.
After this, printer.rs has zero extern crate rustc_* declarations
and zero direct tcx.query() calls.
Add ensure_rustc_commit.sh: a helper sourced by both UI test scripts that reads the expected commit from rust-toolchain.toml [metadata] and ensures the rust checkout (regular or bare+worktree) is at that commit. Both run_ui_tests.sh and remake_ui_tests.sh now use RUST_SRC_DIR (set by ensure_rustc_commit.sh) instead of using the raw directory argument directly. CI workflow updated to use yq for TOML parsing. Cherry-picked from d021298 (spike/hex-rustc), adapted for the PR #126 test script rework on this branch.
Callers now go through the compat boundary directly instead of forwarding through local wrappers: - mono_collect: removed from collect.rs, callers import from compat - mono_item_name: removed from util.rs, callers import from compat - has_attr: removed from util.rs, re-exported from lib.rs via compat - def_id_to_inst: removed from util.rs, callers use mono_instance - GenericData newtype: inlined to Vec<(String, String)> - SourceData type alias: removed, callers use compat path directly Adapted from daa0db4 (spike/hex-rustc) for the printer/ directory structure on this branch.
Expand the module-level docs in compat/mod.rs to spell out the rule: nothing outside compat/ (and driver.rs) should touch rustc internals directly. Add a submodule reference table, explain the re-exports, and document each pub use item so cargo doc --open tells a coherent story. Also add a doc comment to SourceData in spans.rs.
Whitespace and import ordering changes from cargo fmt across lib.rs, items.rs, link_map.rs, and util.rs. No behavioral changes.
Add per-callback trace instrumentation to the printer pipeline. When TRACE=1 is set, a *.smir.trace.json file is emitted alongside the main output, capturing a chronological list of TraceEvent values that show how each entry ended up in the output: which item's body was being analyzed, which visitor callback fired, what it received, and what it produced. The tracer is threaded as Option<Tracer> through the pipeline, so it's zero-cost when disabled (None). TyCollector uses a deferred trace buffer (drained by BodyAnalyzer::visit_ty) to sidestep the double-&mut borrow problem between the type visitor and the tracer. New file: src/printer/tracer.rs (TraceEvent enum with 11 variants, Tracer struct, CollectionSnapshot, helper functions). Modified files: - schema.rs: AllocMap::len() for before/after snapshots - ty_visitor.rs: trace_buffer field, TypeCollected emission - mir_visitor.rs: tracer field on BodyAnalyzer, instruments all 5 visitor callbacks (span, terminator, rvalue, mir_const, ty) - collect.rs: threads Option<Tracer> through all three phases, splits collect_smir_traced (internal) from collect_smir (public) - mod.rs: module declaration, TRACE env var, trace file emission
Two new integration tests that close coverage gaps identified by the pipeline trace instrumentation. reify-fn-pointer.rs exercises ReifyFnPointer casts: coercing a function item into a fn(...) pointer via explicit type annotation. This triggers the Rvalue::Cast(ReifyFnPointer) path in visit_rvalue, which was previously unexercised by any test program. unevaluated-const.rs exercises associated constants in generic contexts and const generic parameters. On the current nightly (2024-11-29), the compiler eagerly evaluates all constants during monomorphization, so the ConstantKind::Unevaluated path in visit_mir_const is not triggered. The test still exercises const-adjacent pipeline paths and documents the gap.
Trace files are generated artifacts from TRACE=1 runs and should not be committed.
Each trace event from a MIR visitor callback now carries a SourceLocation with the full source range (file, line, col, end_line, end_col) of the MIR statement that triggered it. This gives visualization tools enough information to highlight the exact source region associated with each pipeline step. Seven TraceEvent variants gain a location field: FunctionCallResolved, DropGlueResolved, ReifyFnPointerResolved, AllocationCollected, FnDefAsValue, UnevaluatedConstDiscovered, and TypeCollected (as Option, stamped by the caller when draining from the TyCollector buffer). Events without a MIR location (ItemDiscovered, BodyWalkStarted/ Finished, SpanResolved, AssemblyStarted) are unchanged.
Two decks in docs/slides/ that walk through TRACE=1 output for the reify-fn-pointer and unevaluated-const test programs. reify-fn-pointer.md: shows how a ReifyFnPointer cast, function calls, and assert_eq! macro expansion each produce trace events with source locations. Demonstrates the side-by-side format (source on left, growing SMIR data on right). unevaluated-const.md: shows how monomorphization splits generic functions into concrete instances, how associated constants and const generics are eagerly evaluated before the pipeline sees them, and how the second monomorphization of the same function body adds zero new types or spans thanks to deduplication.
Both decks now open with a plain-language glossary defining
MIR, mono item, allocation, span, terminator, and (in the
reify deck) function item vs. function pointer. The const
deck defines monomorphization, associated constant, and
const generic.
Throughout both decks, compiler-internal event names
(FunctionCallResolved, AllocationCollected, etc.) are
replaced with conversational descriptions ("found a
function call," "found compile-time memory"). The event
names still appear in the trace JSON; the slides just
don't lead with them.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This lives on a spike branch, a few features ahead of what's currently merged. We'll get to those changes in due course; I wanted to document and share what came out of this particular spike before the context fades (things fade fast these days).
What this is (and isn't)
I added pipeline trace instrumentation (gated behind
TRACE=1) to thecompiler driver, which logs every event as it flows through the MIR visitor:
each call, allocation, type, and constant it encounters. The output is a
flat JSON array of
TraceEventvalues, one per observable step, written to*.smir.trace.jsonalongside the main output. The implementation lives onthe
spike/tracebranch.The point isn't to build a profiler or a debugger. It's to answer a specific
question: when I run the test suite, which visitor paths actually fire? And
(more importantly) which ones don't?
The tracer defines 11 event variants, one for each observable pipeline step:
ItemDiscoveredmono_collector unevaluated constBodyWalkStartedBodyWalkFinishedFunctionCallResolvedvisit_terminatorDropGlueResolvedvisit_terminatorReifyFnPointerResolvedvisit_rvalueRvalue::Cast(ReifyFnPointer)detectedAllocationCollectedvisit_mir_constConstantKind::Allocatedprovenance walkedFnDefAsValuevisit_mir_constConstantKind::ZeroSizedFnDef used as valueUnevaluatedConstDiscoveredvisit_mir_constConstantKind::UnevaluatedfoundTypeCollectedvisit_tyTyCollectorAssemblyStartedWhat the test suite actually exercises
I ran
TRACE=1against all 28 pre-existing integration tests and aggregatedthe event counts. Here's the full picture:
SpanResolvedTypeCollectedFnDefAsValueFunctionCallResolvedAllocationCollectedItemDiscoveredBodyWalkStartedBodyWalkFinishedDropGlueResolvedAssemblyStartedReifyFnPointerResolvedUnevaluatedConstDiscoveredTwo zeros. Two visitor paths that no integration test has ever exercised.
Gap 1: ReifyFnPointer casts (closed)
Path:
visit_rvaluehandlingRvalue::Cast(CastKind::PointerCoercion(ReifyFnPointer))This fires when a function item (which in Rust has a unique zero-sized type;
every function gets its own) is coerced into a
fn(...)pointer. It's acommon enough pattern in real code, but none of our test programs happened
to do it.
Straightforward to close.
reify-fn-pointer.rsuses an explicit typeannotation to force the cast:
And the trace confirms we hit the path:
{ "event": "ReifyFnPointerResolved", "item": "_ZN16reify_fn_pointer4main17h...E", "location": { "file": "tests/integration/programs/reify-fn-pointer.rs", "line": 14, "col": 34, "end_line": 14, "end_col": 37 }, "sym_kind": "normal", "sym_name": "_ZN16reify_fn_pointer3add17h...E" }Line 14, columns 34-37: that's the
addon the right-hand side of theassignment, exactly where the coercion happens.
Gap 2: unevaluated constants (open)
Path:
visit_mir_consthandlingConstantKind::UnevaluatedThis is trickier. The handler would fire when the compiler leaves a constant
unevaluated in the monomorphized MIR, i.e., it hasn't been resolved to a
concrete value by the time stable MIR sees it. Our handler resolves the
constant's DefId to a mono item and enqueues it for transitive discovery.
I wrote
unevaluated-const.rswith associated constants in generic contexts(
S::STEPvia trait impls) and const generic parameters(
make_array::<4>()), hoping to produce an unevaluated constant at the MIRlevel. On the pinned nightly (2024-11-29), the compiler eagerly evaluates
everything during monomorphization; constants show up as
AllocatedorTy(Value(...)), neverUnevaluated. The trace confirms: zeroUnevaluatedConstDiscoveredevents.The test exercises const-adjacent pipeline paths and documents the gap, but
doesn't close it.
The open question
ConstantKind::Unevaluatedis part of the stable MIR API; the rustc teamput it there, so it's a valid state that consumers must handle. Our handler
is correct: it resolves the constant's DefId to a mono item and enqueues it
for transitive discovery. The question isn't whether to keep it (of course we
keep it), but whether we can actually test it.
On the pinned nightly (2024-11-29), I haven't been able to produce an
Unevaluatedconstant in monomorphized MIR. Associated constants, constgenerics, const expressions in generic contexts: the compiler eagerly
evaluates all of them during monomorphization. I don't know the rustc
internals well enough to say whether there's a specific program shape,
compiler flag, or different nightly version that would leave a constant
unevaluated at this stage. If anyone on the team knows (or knows who to
ask), I'd like to close this gap properly.
In the meantime, the trace gives us a cheap way to track this across
toolchain bumps: run
TRACE=1against the test suite, aggregate events,check whether
UnevaluatedConstDiscoveredstarts appearing. If a futurenightly changes the evaluation behavior, we'll see it immediately and can
write a targeted test. Until then, the handler is tested against the API
contract (it compiles, it type-checks, it does the right thing with the
ConstantKind::Unevaluatedvariant), but we don't have an integration testthat actually reaches it end-to-end.