Skip to content

Isolate rustc internals and make toolchain bumps less hurt#123

Draft
cds-amal wants to merge 11 commits intoruntimeverification:masterfrom
cds-rs:spike/hex-rustc
Draft

Isolate rustc internals and make toolchain bumps less hurt#123
cds-amal wants to merge 11 commits intoruntimeverification:masterfrom
cds-rs:spike/hex-rustc

Conversation

@cds-amal
Copy link
Collaborator

@cds-amal cds-amal commented Feb 21, 2026

This PR builds on #121 (the pipeline refactor) with two goals: make it structurally easy to absorb rustc API changes, and remove the manual bookkeeping around which rustc commit our tests need.

This also addresses Issue #90 and supersedes PR #91 (see "Why this supersedes #91" below).

The compat layer (src/compat/). All usage of rustc internal APIs (rustc_middle, rustc_smir, rustc_span, etc.) is now routed through a single module. When rustc changes its internals (which it does regularly), fixes concentrate in compat/ rather than scattering across the codebase. We stress-tested this against two toolchain bumps (6-month and 13-month jumps); the decision, boundary, and validation results are documented in ADR-003.

This also fixes a gap in mk_graph/, which was using extern crate stable_mir directly (bypassing the abstraction). Those imports now go through compat like everything else.

Single source of truth for the rustc commit. This PR adds a [metadata] section to rust-toolchain.toml with the rustc-commit that backs our pinned nightly:

[metadata]
rustc-commit = "a2545fd6fc66b4323f555223a860c451885d1d2b"

But that created a gap: now the commit hash lives in rust-toolchain.toml and is still hardcoded in .github/workflows/test.yml (the actions/checkout ref for rust-lang/rust). Two sources of truth, one of which will inevitably drift. The fix for this is to make the TOML the single source of truth and have everything else read from it.

(rustup silently ignores unknown keys, so the [metadata] section is safe; it's purely for our scripts and CI.)

What changes

CI (test.yml)

The ui-tests job now installs yq (via mikefarah/yq-action), reads the commit hash from rust-toolchain.toml, and passes it to actions/checkout for rust-lang/rust. No more hardcoded hash in the workflow. If the metadata key is missing or empty, the step fails loudly with a non-zero exit.

Local test scripts (run_ui_tests.sh, remake_ui_tests.sh)

Both scripts now source a shared helper (ensure_rustc_commit.sh) that:

  1. Reads metadata.rustc-commit from rust-toolchain.toml using yq
  2. Detects whether RUST_DIR_ROOT points to a regular clone or a bare repo
  3. For a regular clone: runs git checkout <commit> (or skips if already there)
  4. For a bare repo: creates a detached worktree at $RUST_DIR/<short-hash>/ (or reuses it if one already exists at the right commit)

This means you no longer have to manually check out the right commit before running make test-ui or make remake-ui-tests. Just point RUST_DIR_ROOT at your rust checkout and the scripts handle the rest.

yq (the Go version, mikefarah/yq) is now required for UI tests. If it's missing, the script tells you how to install it and exits.

A note on worktrees for the rust checkout

Rust's repo is enormous. When you inevitably have to switch between commits (toolchain bumps, bisecting, etc.), git checkout in a regular clone rewrites hundreds of thousands of files, blows away build caches, and generally makes you wait while incremental builds start over from scratch. Not great.

A bare repo with worktrees avoids all of this. Each worktree shares the same object store but keeps its own working tree and build artifacts. Switching between commits is just a cd; no files get rewritten, no caches get invalidated. If you've already built at one commit, you can hop to another worktree and come back without losing anything.

To convert an existing rust clone to bare+worktree:

cd ..
mv rust rust-tmp
git clone --bare git@github.com:rust-lang/rust.git rust
cd rust

# Create a worktree for the commit our tests need
git worktree add ./a2545fd6fc66 a2545fd6fc66b4323f555223a860c451885d1d2b --detach

Then point the test scripts at the repo root:

RUST_DIR_ROOT=~/oss/rust make test-ui

The scripts detect the bare repo and create (or reuse) the worktree automatically. When the toolchain bumps to a new nightly, a new worktree gets created at the new commit; the old one sticks around and you can prune it whenever you feel like it.

Why this supersedes PR #91

PR #91 and this PR both address Issue #90 (upgrading the rustc dependency to a recent nightly), but they take fundamentally different approaches.

The core difference: this PR creates an adapter API (src/compat/) that gates all rustc access through a single module, so that API churn is absorbed in one place. PR #91 patches every call site across the codebase directly.

When rustc changes its internal APIs, the cost of adapting depends on how many files touch those APIs. In the codebase as it existed when #91 was written, rustc internals were scattered everywhere: printer.rs, mk_graph/, driver.rs, and various helpers all had their own extern crate declarations and direct imports. Every file that directly calls a rustc API is a file that breaks when that API changes.

PR #91 updates the toolchain from nightly-2024-11-29 to nightly-2025-07-20 and fixes every call site that broke. The result works, but the changes scatter across printer.rs, mk_graph.rs, driver.rs, Cargo.toml, the jq filter, and every golden test file. The next toolchain bump would require the same scattershot patching, because nothing has changed about how the codebase interfaces with rustc.

This PR invests in the adapter layer first. To make the difference concrete, here's what the stable_mir to rustc_public crate rename looks like in each approach:

PR #91's approach: update every file that declares extern crate stable_mir or imports from it: printer.rs, all 5 mk_graph/ files, and driver.rs.

This PR's approach: change one line in compat/mod.rs:

// before
pub extern crate stable_mir;
// after
pub extern crate rustc_public as stable_mir;

Every consumer imports via use crate::compat::stable_mir, so nothing else changes.

This PR doesn't bump the toolchain yet. It lays the groundwork so that the bump, when it lands, is a contained change in compat/ and driver.rs rather than a codebase-wide patch. The spike branches (spike/toolchain-2025-06 and spike/toolchain-2026-01) demonstrate this on real toolchain jumps. PR #91 should be closed once this merges; the upgrade itself can follow as a focused PR that only touches the compat layer.

cds-amal and others added 9 commits February 19, 2026 18:45
The existing integration tests normalize away alloc_ids (via the jq
filter), so they cannot catch the class of bug fixed in f903f21: where
the analysis phase walks a different body than what gets stored in the
output, producing AllocIds that don't correspond to any collected
allocation. There is no red test for that fix.

AllocMap replaces the bare HashMap<AllocId, ...> type alias with a
newtype that, under #[cfg(debug_assertions)], tracks every insertion
and flags duplicates. After the collect/analyze phase completes,
verify_coherence walks every stored Item body with an AllocIdCollector
visitor and asserts that each referenced AllocId exists in the map.
This catches both "walked a stale body" (missing ids) and "walked the
same body twice" (duplicate insertions) at dev time, without any cost
in release builds.

collect_alloc is refactored from the entry() API to contains_key() +
insert() so all mutations flow through AllocMap::insert and get
tracked.

This commit is designed to be rebased before the fix so the assertion
fires on the broken code and goes green after each subsequent commit.
Before
------

The pipeline had three separate passes over MIR bodies, each with
access to the full TyCtxt (and therefore the ability to call
side-effecting rustc queries like inst.body()):

  1. mk_item: calls inst.body(), stores body in Item
  2. collect_unevaluated_constant_items: walks stored bodies to find
     unevaluated consts (separate UnevaluatedConstCollector visitor)
  3. collect_interned_values: walks stored bodies again to find allocs,
     types, spans, and calls (separate InternedValueCollector visitor)

Every phase could freely call inst.body() or other allocating rustc
queries, and nothing in the types prevented it. The alloc-id bug fixed
in the prior commit was a direct consequence: collect_interned_values
called inst.body() a second time, causing rustc to mint fresh AllocIds
that didn't match the ones already embedded in the stored Body. The
same pattern existed in two other places: get_item_details (debug mode)
called inst.body() again, and the MonoItem::Static arm in
collect_interned_values called inst.body() on statics that had no
pre-collected body.

collect_smir itself was a flat script that threaded mutable state
between these phases, including a full HashMap clone (items_clone) just
to remember which items were originally collected.

After
-----

Side-effecting rustc queries (inst.body(), eval_initializer()) are now
confined to a single function: mk_item. Everything downstream operates
on pre-collected data, and the phase boundary is enforced by the type
signatures:

  collect_and_analyze_items(&[Item]) -> (CollectedCrate, DerivedInfo)
  assemble_smir(CollectedCrate, DerivedInfo) -> SmirJson

CollectedCrate holds the items and unevaluated consts (output of rustc
interaction). DerivedInfo holds calls, allocs, types, and spans (output
of body analysis). assemble_smir takes both by value and does pure data
transformation; it structurally cannot call inst.body() because it has
no MonoItem or Instance to call it on.

The two body-walking visitors (InternedValueCollector and
UnevaluatedConstCollector) are merged into a single BodyAnalyzer that
walks each body exactly once. The fixpoint loop for transitive
unevaluated constant discovery is integrated: when BodyAnalyzer finds
an unevaluated const, it records it as a UnevalConstInfo; the outer
loop creates the new Item (the one place inst.body() is called) and
enqueues it.

Static items now store their body in MonoItemKind::MonoItemStatic
(collected once in mk_item), so the analysis phase never needs to go
back to rustc for static bodies either.

The full HashMap clone (items_clone) is replaced by a HashSet of
original item names, which is all the static fixup actually needs.

get_item_details now takes the pre-collected body as a parameter
instead of calling inst.body() independently.

Composability: the previous pipeline was a single monolithic function
where each phase's output was wired directly into the next via local
variables. The new structure exposes CollectedCrate and DerivedInfo as
independent values that can be inspected, tested, or transformed
separately.

Deleted: InternedValueCollector, UnevaluatedConstCollector,
collect_interned_values, collect_unevaluated_constant_items, the
InternedValues type alias, and the items_clone.

All 28 integration tests produce identical output.
Concentrate all rustc-internal API calls into src/compat/ so that
toolchain upgrades touch one module (plus driver.rs) instead of
the whole codebase.

New module layout:
  src/compat/mod.rs          - centralized extern crate + re-exports
  src/compat/bridge.rs       - OpaqueInstanceKind, internal conversions
  src/compat/mono_collect.rs - mono item collection and symbol naming
  src/compat/spans.rs        - span-to-source-location resolution
  src/compat/types.rs        - type queries (generics, fn sigs, attrs)
  src/compat/output.rs       - output filename resolution

Key change: replace middle::ty::InstanceKind<'tcx> with an owned
OpaqueInstanceKind { debug_repr, is_reify_shim }, eliminating the
'tcx lifetime parameter from SmirJson, LinkMapKey, FnSymInfo,
LinkMap, DerivedInfo, and SmirJsonDebugInfo.

After this, printer.rs has zero extern crate rustc_* declarations
and zero direct tcx.query() calls.
Replace the hardcoded commit hash in CI and manual checkout steps with
automated reads from rust-toolchain.toml's [metadata] section. Both
local test scripts and CI now use yq to parse the commit, and the
scripts handle bare repos (via worktrees) and regular clones.

Also converts lessons-learned.md into ADR-001 (compat layer decision).
@cds-amal
Copy link
Collaborator Author

@jberthold , would love your feedback if you have the time.

Copy link
Member

@jberthold jberthold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quite substantial refactoring, and definitely worth merging.
While possibly the updates to new rustc versions won't happen too frequently, the proposed code has much cleaner structure.
I did not go through the many details of the changes and it is still in draft but this LGTM (very good).

src/printer.rs Outdated
Comment on lines 162 to 163
fn mono_item_name(tcx: TyCtxt<'_>, item: &MonoItem) -> String {
if let MonoItem::GlobalAsm(data) = item {
hash(data).to_string()
} else {
mono_item_name_int(tcx, &rustc_internal::internal(tcx, item))
}
}

fn mono_item_name_int<'a>(tcx: TyCtxt<'a>, item: &rustc_middle::mir::mono::MonoItem<'a>) -> String {
item.symbol_name(tcx).name.into()
crate::compat::mono_collect::mono_item_name(tcx, item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read a few of these changes that replace a previous helper function with a call into a compat module, like fn blah(args..) -> .. { crate::compat::...::blah(args..) (sometimes with additional struct wrapper like the GenericData above).
Maybe these calls could be made directly through to compat::...?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great catch, @jberthold . I was focused on wrapping rustc calls behind a managed API and missed these, which are very much in the spirit of limiting the blast radius. Thanks for pointing it out.

Consider this the "Spirit of Radio" pass, tightening the signal and cutting the noise. 🎸

Jost rightly pointed out that several functions in printer.rs were just
forwarding to crate::compat::* with no additional logic: generic_data,
has_attr, mono_item_name, def_id_to_inst, and mono_collect. Callers can
(and should) go through the compat boundary directly instead.

Five wrappers removed; call sites now import from compat or call inline.
has_attr moves to a pub re-export from compat::types in lib.rs. Also
drops a stale comment referencing a lifetime-carrying GenericData<'a>
alternative that the compat refactor made irrelevant.
Expand the module-level docs in compat/mod.rs to spell out the rule:
nothing outside compat/ (and driver.rs) should touch rustc internals
directly. Add a submodule reference table, explain the re-exports, and
document each pub use item so cargo doc --open tells a coherent story.

Also add a doc comment to SourceData in spans.rs.
@dkcumming dkcumming self-assigned this Feb 26, 2026
@dkcumming
Copy link
Collaborator

After #121 this will need an update, but I think this is ready to go when that has happened. Looks like a great improvement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants