
Conversation

@arjunr2 (Collaborator) commented Nov 14, 2025

Notes

  • Support for all entrypoints through Func, TypedFunc, and ComponentFunc
  • Support for all component builtins
  • Support for all instantiation entrypoints
  • Support for both sync and async
  • Fully embedding-agnostic replay driver

Structure

  • All core capabilities are feature-gated behind the rr feature. The only thing exposed without the feature is rr::hooks, which offers convenient methods to hook recording/replaying mechanisms into the rest of the engine.
    As a result, rr without hooks should be able to land with no impact on existing Wasmtime whatsoever (a rough sketch of the gating follows).
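A minimal sketch of that gating, assuming an `rr` Cargo feature and module names patterned on the description above (`rr::hooks` always exposed, `rr::core` gated); the PR's actual layout and hook signatures may differ:

```rust
// Sketch only: hook names here are illustrative, not the PR's real API.
pub mod rr {
    /// Always compiled: the rest of the engine calls these hook points
    /// unconditionally, so with the `rr` feature off they have no effect
    /// on existing Wasmtime.
    pub mod hooks {
        #[inline]
        pub fn on_host_call(name: &str) {
            #[cfg(feature = "rr")]
            super::core::record_host_call(name);
            #[cfg(not(feature = "rr"))]
            let _ = name; // no-op without the feature
        }
    }

    /// Feature-gated record/replay core.
    #[cfg(feature = "rr")]
    pub mod core {
        pub fn record_host_call(_name: &str) {
            // append an event to the in-memory trace, etc.
        }
    }
}
```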

Observed overheads

Between 4% and 8%.

alexcrichton and others added 30 commits, starting August 6, 2025 13:55. Commit notes include:

  • Helps get some binaries to test with
  • Things left to complete: core Wasm support, improved error handling, Wasmtime CLI integration
@cfallin (Member) commented Dec 3, 2025

I left one comment in the reentrancy commit re: lifetimes (can we try harder to avoid the unsafe, basically; happy to help think through this). More generally though I'm not sure I understand the nested event loop thing in the return hook -- I mean, I understand how architecturally one could be forced into it, but ideally I'd want replay to be one loop at one level -- that also makes eventual trace seeking (which we'll need for reversible debugging) feasible, versus having to have a nested series of stack frames in certain states.

@arjunr2 (Collaborator, Author) commented Dec 4, 2025

Yeah, this is challenging; I thought quite a bit about how to get re-entrancy to work and had settled on this.

The unsafe doesn't really have to do with lifetimes. It's that, to accomplish re-entrancy, the replay stubs for the host functions need access to the ReplayInstance that created them. The way I accomplish this right now is by tying a special ReplayHostContext to the Store; it is fully opaque to users and only creatable by the ReplayInstance. This context lets us access instance state and run the appropriate re-entrancy call. So basically, if we replay, we can assume the data has to be a ReplayHostContext, which is what the unsafe is doing. The other option would be to have a field in the Store holding a weak pointer to a ReplayInstance and access things through that. Maybe this works better?

As for replay being one loop at one level, is it possible to do that? I'm not sure how that's even possible with re-entrancy, because there is no getting around actually invoking the wasm function from the return hook right? The "loop" in the return hook is just because we can have re-entrancy arbitrary levels deep. The only thing it expects is a balanced set of entry/return events.
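For concreteness, a minimal sketch of that second option (the weak-pointer field); everything beyond the names `Store`, `ReplayInstance`, and `ReplayHostContext` is hypothetical, and the real wiring would go through wasmtime internals:

```rust
use std::any::Any;
use std::sync::Weak;

struct ReplayInstance {
    // recorded trace, replay cursor, etc.
}

/// Opaque to embedders; only a ReplayInstance can construct one.
struct ReplayHostContext {
    instance: Weak<ReplayInstance>,
}

struct Store {
    /// Host-side data slot that the replay stubs read back during replay.
    host_data: Box<dyn Any>,
}

fn replay_host_stub(store: &Store) {
    // During replay the host data is assumed to be a ReplayHostContext;
    // `downcast_ref` checks that assumption rather than casting unsafely.
    if let Some(ctx) = store.host_data.downcast_ref::<ReplayHostContext>() {
        if let Some(instance) = ctx.instance.upgrade() {
            // Drive the re-entrant call back into Wasm via the ReplayInstance.
            let _ = instance;
        }
    }
}
```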

@cfallin (Member) commented Dec 4, 2025

FYI, mentioned this PR to Alex today in the Wasmtime biweekly and he will take a look sometime in the next few weeks -- cc @alexcrichton.

@cfallin (Member) commented Dec 4, 2025

> As for replay being one loop at one level, is it possible to do that? I'm not sure how that's even possible with re-entrancy, because there is no getting around actually invoking the wasm function from the return hook right? The "loop" in the return hook is just because we can have re-entrancy arbitrary levels deep. The only thing it expects is a balanced set of entry/return events.

Thinking more about this, it seems like a very core challenge for reversible debugging. We fundamentally need to snapshot and restore to earlier execution states. Our plan-of-record for the active stack frames has been to actually memcpy out the fiber stack (the active parts of it, down to saved SP) and copy it back in; if no pointers move, then everything is still valid.

But if we have native frames between Wasm activations, all that goes right out. There's no safe way at all to restore a set of native frames that core Wasm called out to, that called back into Wasm.
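For reference, a rough sketch of what that memcpy-style snapshot/restore looks like; names and layout are illustrative only, and Wasmtime's real fiber code is more involved:

```rust
// Copy the live region of a fiber stack (from the saved stack pointer up to
// the top of the stack) into a buffer, and later copy it back to the same
// addresses. This is only sound if the stack stays at the same address and
// no native host frames sit between Wasm activations, per the point above.
struct StackSnapshot {
    base: *mut u8,  // address the bytes must be restored to (no pointers move)
    bytes: Vec<u8>, // saved contents of [saved_sp, stack_top)
}

unsafe fn snapshot(saved_sp: *mut u8, stack_top: *mut u8) -> StackSnapshot {
    let len = stack_top as usize - saved_sp as usize;
    let mut bytes = vec![0u8; len];
    std::ptr::copy_nonoverlapping(saved_sp, bytes.as_mut_ptr(), len);
    StackSnapshot { base: saved_sp, bytes }
}

unsafe fn restore(snap: &StackSnapshot) {
    std::ptr::copy_nonoverlapping(snap.bytes.as_ptr(), snap.base, snap.bytes.len());
}
```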

Maybe we don't support reentrancy during record or replay; maybe the component model got this right. (Side-eyeing upcoming changes to support reentrancy that I've heard rumblings about -- I don't know any details.) In this case, we'll want to (I think) trap on attempted re-entry into Wasm when recording...

@arjunr2 (Collaborator, Author) commented Dec 4, 2025

Actually yeah, I just realized it's more than native frames: we'd actually need native snapshots between activations (which ends up being just like RR!), since Store/Instance state can be modified too.

@cfallin (Member) commented Dec 18, 2025

@arjunr2 I see your new commits here while waiting for review from Alex. Quick question: what's the story on the new name wasm-crimp?

@arjunr2 (Collaborator, Author) commented Dec 18, 2025

crimp (the name is a work in progress if you have suggestions :)) is now just a new crate that contains the record/replay interface specification. This was just a refactor that decouples the interface specification from wasmtime itself, so I can write replay interpreters and recording embedders against the same crimp spec in different engines. My plan is to write a replayer in Wizard now with this.

But essentially in terms of logic, it didn't change anything at all. It was purely a move of almost everything in rr/core/* within wasmtime to a separate crate.

@arjunr2 (Collaborator, Author) commented Dec 18, 2025

Also, crimp currently does depend on a few things in wasmtime-environ (for things like the checksum, component id, etc.). That's ok for now, and probably ok long term as well, but maybe it'd make more sense for it to live without any dependency on environ.

I also think this crate should live as a separate repo (under the Bytecode Alliance maybe, or my personal account), but I figure we can wait till review is complete to do that.

@cfallin (Member) commented Dec 19, 2025

Ah, I see. IMHO, whether we want to export a set of types that other engines also use is a bigger discussion -- that's a big new public API surface.

Would you mind reverting that for now on this branch at least? I want to try to keep the review target stable, without new development, until we land it.

Speaking of which -- @alexcrichton I know you're extremely busy and the holidays are coming up but do you have any estimate on when you might be able to give this PR a review?

@alexcrichton (Member) commented:

Oh no worries, I ended up not having the energy on planes but I was planning to get to this beginning of next week when things are quieter with other folks on vacation

@arjunr2 (Collaborator, Author) commented Dec 19, 2025

I'll move any additional development to a different branch henceforth, if necessary. But for now, do you want me to also revert the new internal rr/crimp crate and move it back into wasmtime? Almost nothing has changed code-wise within these files with the move, since it mostly didn't rely on wasmtime internals anyway.

@cfallin (Member) commented Dec 19, 2025

Yes; that's a big change to the public API (even if it's not much of a change to code structure) and I think we'll want to have a discussion about the way we handle this.

@arjunr2 (Collaborator, Author) commented Dec 19, 2025

Ok, it's moved back into wasmtime, leaving this branch stable for review and fixes.

@alexcrichton (Member) left a review:

Ok, I feel like I've read over this enough that, while I wouldn't claim encyclopedic knowledge, I've got a good high-level idea of what's going on. Overall my impression is that we need to figure out how to split this up to land it piecemeal and incrementally in Wasmtime. To me there's just too much changing here in the guts of the runtime to be able to effectively review.

I've got a lot of questions/comments about various abstractions/etc. added here and there. These range from things like "we should push harder on not storing WasmFuncOrigin" to "there's quite a lot of #[cfg] and I think it can be reduced" to "this API I think is too pub and wants to be pub(crate)" to "I don't think the new ValRaw API is safe" and things like that.

What I'd recommend personally is something like this:

  1. Merge this PR here in this fork, and continue the process of merging in main every so often here in the fork.
  2. Incrementally peel off chunks of this repo as PRs-to-wasmtime.
  3. After the PR on wasmtime merges, merge the new main branch of Wasmtime into this repo.
  4. Repeat until the diff between this repo and Wasmtime is small enough to be one final PR.

Chunks that land in Wasmtime are unlikely to be all that useful until the end, but given the complete picture in this repo it's easier to land untested/inactive chunks in Wasmtime, review them incrementally, and then have it all working in the end. This is what we did with component-model-async, for example, and it also helps with writing tests because each incremental PR can test what's possible at least, if not the full picture.

Chunks of this PR I can see landing incrementally would be:

  • Config knobs/validation
  • Generating a sha256 of the wasm file
  • Misc refactorings such as flat_types_storage_or_pointer and movement of FlatTypesStorage
  • Extending core functions with new parameters like RRWasmFuncType
  • Landing an rr_disabled.rs module, for example, which has all the hooks just as no-ops (see the sketch after this list)
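A rough idea of what such a disabled-hooks module could look like; the hook names here are made up, and the real set would mirror whatever hook surface the PR defines:

```rust
// rr_disabled.rs (sketch): every hook exists so call sites compile
// unconditionally, but each one is a no-op when record/replay is off.
#![allow(dead_code)]

pub struct RecordHooks;

impl RecordHooks {
    #[inline]
    pub fn host_call_enter(&self, _func_index: u32) {
        // no-op without record/replay
    }

    #[inline]
    pub fn host_call_return(&self, _results: &[u64]) {
        // no-op without record/replay
    }
}
```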

And my hope is that'd at least reduce this diff a fair amount to make it more manageable, and it'd be clearer how to land the rest at that point. Is this something that you'd be up for splitting up and landing?

One point I also want to clarify is that inevitably during review there will be comments that may change some fundamental design decisions here which require refactors/rebases in this repository itself. Personally I feel that's the review process "working as intended" but I mostly want to clarify that I don't mean for the goal of this process to be to land exactly this PR as-is a file-at-a-time for example. Instead I want to use the process of incremental PRs to give time and space to review each PR and each design decision on the way, possibly tweaking/updating them. This'll be much easier with a working implementation to validate ideas against (and reject ideas against), too.

@arjunr2 (Collaborator, Author) commented Dec 23, 2025

Yeah, okay, that's understandable. One thing though is that tests might be difficult for most of these PRs until we land the last one. Perhaps that's okay though because we have the working version here.

@cfallin (Member) commented Dec 23, 2025

It's probably good practice to write independent unit tests for each part (as much as feasible) anyway -- I've found that to be useful when doing the debugger implementation (for example) too.

@alexcrichton (Member) commented:

Yeah I understand that the first PRs won't be able to have like an end-to-end test and some PRs may not be testable at all. That's ok though because the full implementation lives here and has tests running/passing so I'm not too worried about that. The rough idea is to test what you can in incremental PRs and defer the rest to later (or also file issues for tests we want to write to get filled out later)
