SPIKE: Pipeline trace coverage analysis (do not merge) by cds-amal · Pull Request #130 · runtimeverification/stable-mir-json

cds-amal · 2026-02-28T15:29:41Z

This lives on a spike branch, a few features ahead of what's currently merged. We'll get to those changes in due course; I wanted to document and share what came out of this particular spike before the context fades (things fade fast these days).

What this is (and isn't)

I added pipeline trace instrumentation (gated behind TRACE=1) to the
compiler driver, which logs every event as it flows through the MIR visitor:
each call, allocation, type, and constant it encounters. The output is a
flat JSON array of TraceEvent values, one per observable step, written to
*.smir.trace.json alongside the main output. The implementation lives on
the spike/trace branch.

The point isn't to build a profiler or a debugger. It's to answer a specific
question: when I run the test suite, which visitor paths actually fire? And
(more importantly) which ones don't?

The tracer defines 11 event variants, one for each observable pipeline step:

Event	Visitor callback	What it records
`ItemDiscovered`	Phase 1	Mono item found via `mono_collect` or unevaluated const
`BodyWalkStarted`	Phase 2	Bookend: starting body analysis (with collection snapshot)
`BodyWalkFinished`	Phase 2	Bookend: finished body analysis (with collection snapshot)
`FunctionCallResolved`	`visit_terminator`	Call terminator resolved to a function instance
`DropGlueResolved`	`visit_terminator`	Drop terminator resolved to drop glue
`ReifyFnPointerResolved`	`visit_rvalue`	`Rvalue::Cast(ReifyFnPointer)` detected
`AllocationCollected`	`visit_mir_const`	`ConstantKind::Allocated` provenance walked
`FnDefAsValue`	`visit_mir_const`	`ConstantKind::ZeroSized` FnDef used as value
`UnevaluatedConstDiscovered`	`visit_mir_const`	`ConstantKind::Unevaluated` found
`TypeCollected`	`visit_ty`	Type collected by `TyCollector`
`AssemblyStarted`	Phase 3	Summary: total items, functions, allocs, types, spans

What the test suite actually exercises

I ran TRACE=1 against all 28 pre-existing integration tests and aggregated
the event counts. Here's the full picture:

Count	Event
2121	`SpanResolved`
775	`TypeCollected`
308	`FnDefAsValue`
307	`FunctionCallResolved`
296	`AllocationCollected`
291	`ItemDiscovered`
291	`BodyWalkStarted`
291	`BodyWalkFinished`
59	`DropGlueResolved`
28	`AssemblyStarted`
0	`ReifyFnPointerResolved`
0	`UnevaluatedConstDiscovered`

Two zeros. Two visitor paths that no integration test has ever exercised.

Gap 1: ReifyFnPointer casts (closed)

Path: visit_rvalue handling
Rvalue::Cast(CastKind::PointerCoercion(ReifyFnPointer))

This fires when a function item (which in Rust has a unique zero-sized type;
every function gets its own) is coerced into a fn(...) pointer. It's a
common enough pattern in real code, but none of our test programs happened
to do it.

Straightforward to close. reify-fn-pointer.rs uses an explicit type
annotation to force the cast:

let f: fn(i32, i32) -> i32 = add;

And the trace confirms we hit the path:

{
  "event": "ReifyFnPointerResolved",
  "item": "_ZN16reify_fn_pointer4main17h...E",
  "location": {
    "file": "tests/integration/programs/reify-fn-pointer.rs",
    "line": 14, "col": 34, "end_line": 14, "end_col": 37
  },
  "sym_kind": "normal",
  "sym_name": "_ZN16reify_fn_pointer3add17h...E"
}

Line 14, columns 34-37: that's the add on the right-hand side of the
assignment, exactly where the coercion happens.

Gap 2: unevaluated constants (open)

Path: visit_mir_const handling ConstantKind::Unevaluated

This is trickier. The handler would fire when the compiler leaves a constant
unevaluated in the monomorphized MIR, i.e., it hasn't been resolved to a
concrete value by the time stable MIR sees it. Our handler resolves the
constant's DefId to a mono item and enqueues it for transitive discovery.

I wrote unevaluated-const.rs with associated constants in generic contexts
(S::STEP via trait impls) and const generic parameters
(make_array::<4>()), hoping to produce an unevaluated constant at the MIR
level. On the pinned nightly (2024-11-29), the compiler eagerly evaluates
everything during monomorphization; constants show up as Allocated or
Ty(Value(...)), never Unevaluated. The trace confirms: zero
UnevaluatedConstDiscovered events.

The test exercises const-adjacent pipeline paths and documents the gap, but
doesn't close it.

The open question

ConstantKind::Unevaluated is part of the stable MIR API; the rustc team
put it there, so it's a valid state that consumers must handle. Our handler
is correct: it resolves the constant's DefId to a mono item and enqueues it
for transitive discovery. The question isn't whether to keep it (of course we
keep it), but whether we can actually test it.

On the pinned nightly (2024-11-29), I haven't been able to produce an
Unevaluated constant in monomorphized MIR. Associated constants, const
generics, const expressions in generic contexts: the compiler eagerly
evaluates all of them during monomorphization. I don't know the rustc
internals well enough to say whether there's a specific program shape,
compiler flag, or different nightly version that would leave a constant
unevaluated at this stage. If anyone on the team knows (or knows who to
ask), I'd like to close this gap properly.

In the meantime, the trace gives us a cheap way to track this across
toolchain bumps: run TRACE=1 against the test suite, aggregate events,
check whether UnevaluatedConstDiscovered starts appearing. If a future
nightly changes the evaluation behavior, we'll see it immediately and can
write a targeted test. Until then, the handler is tested against the API
contract (it compiles, it type-checks, it does the right thing with the
ConstantKind::Unevaluated variant), but we don't have an integration test
that actually reaches it end-to-end.

Split the monolithic src/printer.rs (1700+ lines) into focused submodules: - mod.rs: entry points (collect_smir, emit_smir) - schema.rs: serialization types (SmirJson, Item, AllocMap, etc.) - collect.rs: collect_items, collect_and_analyze_items, assemble_smir - items.rs: mk_item, get_item_details, MonoItemKind construction - mir_visitor.rs: BodyAnalyzer (MirVisitor impl), collect_alloc, get_prov_ty - ty_visitor.rs: TyCollector (Visitor impl) - link_map.rs: LinkMap types, update_link_map, fn_inst_sym - types.rs: type helpers (fn_inst_for_ty, has_unresolved_ty, etc.) - util.rs: debug_enabled, debug_log_println, take_any Pure mechanical split; no behavioral changes.

Document the purpose and responsibilities of each submodule in the printer directory: mod.rs (overview table), collect.rs (three-phase pipeline), items.rs (Item construction), link_map.rs (function resolution), mir_visitor.rs (body traversal), schema.rs (data model), ty_visitor.rs (type collection), types.rs (TypeMetadata), and util.rs (helpers).

Replace if-let/else with unwrap_or_else, is_none/unwrap with let-else, remove unnecessary .clone() calls, pass offset by value in collect_alloc, extract opaque_placeholder_ty helper, and add doc comments to public types and key internal structures in schema.rs.

- Use eprintln! instead of println! on error paths to avoid corrupting JSON output when emit_smir targets stdout - Fix uninterpolated {path} in expect message; include actual path and IO error via unwrap_or_else + panic! - Remove redundant .as_str() in LinkMapKey serialization - Replace alloc+sort with single linear scan in field_containing_offset

Concentrate all rustc-internal API calls into src/compat/ so that toolchain upgrades touch one module (plus driver.rs) instead of the whole codebase. New module layout: src/compat/mod.rs - centralized extern crate + re-exports src/compat/bridge.rs - OpaqueInstanceKind, internal conversions src/compat/mono_collect.rs - mono item collection and symbol naming src/compat/spans.rs - span-to-source-location resolution src/compat/types.rs - type queries (generics, fn sigs, attrs) src/compat/output.rs - output filename resolution Key change: replace middle::ty::InstanceKind<'tcx> with an owned OpaqueInstanceKind { debug_repr, is_reify_shim }, eliminating the 'tcx lifetime parameter from SmirJson, LinkMapKey, FnSymInfo, LinkMap, DerivedInfo, and SmirJsonDebugInfo. After this, printer.rs has zero extern crate rustc_* declarations and zero direct tcx.query() calls.

Add ensure_rustc_commit.sh: a helper sourced by both UI test scripts that reads the expected commit from rust-toolchain.toml [metadata] and ensures the rust checkout (regular or bare+worktree) is at that commit. Both run_ui_tests.sh and remake_ui_tests.sh now use RUST_SRC_DIR (set by ensure_rustc_commit.sh) instead of using the raw directory argument directly. CI workflow updated to use yq for TOML parsing. Cherry-picked from d021298 (spike/hex-rustc), adapted for the PR #126 test script rework on this branch.

Callers now go through the compat boundary directly instead of forwarding through local wrappers: - mono_collect: removed from collect.rs, callers import from compat - mono_item_name: removed from util.rs, callers import from compat - has_attr: removed from util.rs, re-exported from lib.rs via compat - def_id_to_inst: removed from util.rs, callers use mono_instance - GenericData newtype: inlined to Vec<(String, String)> - SourceData type alias: removed, callers use compat path directly Adapted from daa0db4 (spike/hex-rustc) for the printer/ directory structure on this branch.

Expand the module-level docs in compat/mod.rs to spell out the rule: nothing outside compat/ (and driver.rs) should touch rustc internals directly. Add a submodule reference table, explain the re-exports, and document each pub use item so cargo doc --open tells a coherent story. Also add a doc comment to SourceData in spans.rs.

Whitespace and import ordering changes from cargo fmt across lib.rs, items.rs, link_map.rs, and util.rs. No behavioral changes.

Add per-callback trace instrumentation to the printer pipeline. When TRACE=1 is set, a *.smir.trace.json file is emitted alongside the main output, capturing a chronological list of TraceEvent values that show how each entry ended up in the output: which item's body was being analyzed, which visitor callback fired, what it received, and what it produced. The tracer is threaded as Option<Tracer> through the pipeline, so it's zero-cost when disabled (None). TyCollector uses a deferred trace buffer (drained by BodyAnalyzer::visit_ty) to sidestep the double-&mut borrow problem between the type visitor and the tracer. New file: src/printer/tracer.rs (TraceEvent enum with 11 variants, Tracer struct, CollectionSnapshot, helper functions). Modified files: - schema.rs: AllocMap::len() for before/after snapshots - ty_visitor.rs: trace_buffer field, TypeCollected emission - mir_visitor.rs: tracer field on BodyAnalyzer, instruments all 5 visitor callbacks (span, terminator, rvalue, mir_const, ty) - collect.rs: threads Option<Tracer> through all three phases, splits collect_smir_traced (internal) from collect_smir (public) - mod.rs: module declaration, TRACE env var, trace file emission

Two new integration tests that close coverage gaps identified by the pipeline trace instrumentation. reify-fn-pointer.rs exercises ReifyFnPointer casts: coercing a function item into a fn(...) pointer via explicit type annotation. This triggers the Rvalue::Cast(ReifyFnPointer) path in visit_rvalue, which was previously unexercised by any test program. unevaluated-const.rs exercises associated constants in generic contexts and const generic parameters. On the current nightly (2024-11-29), the compiler eagerly evaluates all constants during monomorphization, so the ConstantKind::Unevaluated path in visit_mir_const is not triggered. The test still exercises const-adjacent pipeline paths and documents the gap.

Trace files are generated artifacts from TRACE=1 runs and should not be committed.

Each trace event from a MIR visitor callback now carries a SourceLocation with the full source range (file, line, col, end_line, end_col) of the MIR statement that triggered it. This gives visualization tools enough information to highlight the exact source region associated with each pipeline step. Seven TraceEvent variants gain a location field: FunctionCallResolved, DropGlueResolved, ReifyFnPointerResolved, AllocationCollected, FnDefAsValue, UnevaluatedConstDiscovered, and TypeCollected (as Option, stamped by the caller when draining from the TyCollector buffer). Events without a MIR location (ItemDiscovered, BodyWalkStarted/ Finished, SpanResolved, AssemblyStarted) are unchanged.

Two decks in docs/slides/ that walk through TRACE=1 output for the reify-fn-pointer and unevaluated-const test programs. reify-fn-pointer.md: shows how a ReifyFnPointer cast, function calls, and assert_eq! macro expansion each produce trace events with source locations. Demonstrates the side-by-side format (source on left, growing SMIR data on right). unevaluated-const.md: shows how monomorphization splits generic functions into concrete instances, how associated constants and const generics are eagerly evaluated before the pipeline sees them, and how the second monomorphization of the same function body adds zero new types or spans thanks to deduplication.

Both decks now open with a plain-language glossary defining MIR, mono item, allocation, span, terminator, and (in the reify deck) function item vs. function pointer. The const deck defines monomorphization, associated constant, and const generic. Throughout both decks, compiler-internal event names (FunctionCallResolved, AllocationCollected, etc.) are replaced with conversational descriptions ("found a function call," "found compile-time memory"). The event names still appear in the trace JSON; the slides just don't lead with them.

cds-amal added 19 commits February 28, 2026 06:52

cargo fmt

e318f28

cargo fmt

0d9a773

Route mk_graph stable_mir imports through compat module

28b1604

Add rustc commit hash to rust-toolchain.toml for UI tests

e93de17

style: apply cargo fmt formatting fixes

ba4bef6

Whitespace and import ordering changes from cargo fmt across lib.rs, items.rs, link_map.rs, and util.rs. No behavioral changes.

chore: ignore *.smir.trace.json in .gitignore

df615cd

Trace files are generated artifacts from TRACE=1 runs and should not be committed.

cds-amal changed the title ~~Spike/trace~~ SPIKE: Pipeline trace coverage analysis (do not merge) Feb 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPIKE: Pipeline trace coverage analysis (do not merge)#130

SPIKE: Pipeline trace coverage analysis (do not merge)#130
cds-amal wants to merge 19 commits intomasterfrom
spike/trace

cds-amal commented Feb 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cds-amal commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this is (and isn't)

What the test suite actually exercises

Gap 1: ReifyFnPointer casts (closed)

Gap 2: unevaluated constants (open)

The open question

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cds-amal commented Feb 28, 2026 •

edited

Loading