
@alexcrichton
Member

This commit is an initial step towards resolving #11262 by having async
functions internally in Wasmtime actually be `async` instead of requiring
the use of fibers. This is expected to have a number of benefits:

  • The Rust compiler can be used to verify a future is `Send` instead of
    "please audit the whole codebase's stack-local variables".

  • Raw pointer workarounds during table/memory growth will no longer be
    required since the arguments can properly be a split borrow to data in
    the store (eventually leading to unblocking #11178, "Core WebAssembly
    libcalls are not sound in Wasmtime").

  • Less duplication inside of Wasmtime and clearer implementations
    internally. For example, GC bits prior to this PR had duplicated
    sync/async entrypoints (sometimes a few layers deep) which eventually
    bottomed out in `*_maybe_async` bits which were `unsafe` and required
    fiber bits to be set up. All of that is now gone, with the `async`
    functions being the "source of truth" and sync functions just calling
    them.

  • Fibers are not required for operations such as a GC, growing memory,
    etc.

The general idea is that the source of truth for the implementation of
Wasmtime's internals is a set of `async` functions. These functions are
callable from synchronous functions in the API with a documented panic
condition about avoiding them when `Config::async_support` is disabled.
When `async_support` is disabled it's known internally that there should
never be an `.await` point, meaning that we can poll the future of the
async version and assert that it's ready.
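A minimal sketch of this poll-once-and-assert pattern (a hypothetical stand-in for the internal helper, not Wasmtime's actual code):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Poll a future exactly once with a no-op waker and panic if it isn't
// already complete. Sound only when the callee is known to contain no
// reachable `.await` points, i.e. when `async_support` is disabled.
fn assert_ready<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable = RawWakerVTable::new(
        |_| RawWaker::new(std::ptr::null(), &VTABLE), // clone
        |_| {},                                       // wake
        |_| {},                                       // wake_by_ref
        |_| {},                                       // drop
    );
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("future not ready; was `Config::async_support` enabled?"),
    }
}

// The async "source of truth"; with no async limiter configured it never
// actually suspends, so polling once is guaranteed to finish it.
async fn grow(len: usize, delta: usize) -> usize {
    len + delta
}

fn main() {
    // The synchronous public API becomes a thin wrapper over the async fn.
    assert_eq!(assert_ready(grow(10, 5)), 15);
}
```

The sync entrypoints then become one-liners over their async counterparts, with the panic condition documented on the public API.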

This commit is not the full realization of plumbing `async` everywhere
internally in Wasmtime. Instead, all this does is plumb the async-ness of
`ResourceLimiterAsync` and that's it, i.e. memory and table growth are
now properly async. It turns out, though, that these limiters are
extremely deep within Wasmtime and thus necessitated many changes to get
this all working. In the end this ended up covering some of the trickier
parts of dealing with async and propagating that throughout the runtime.

Most changes in this commit are intended to be straightforward, but a
summary is:

  • Many more functions are `async` and `.await` their internals.

  • Some instances of run-a-closure-and-catch-the-error are now replaced
    with a type-with-`Drop`, as that's the equivalent in the async world.

  • Internal traits in Wasmtime are now `#[async_trait]` to be object
    safe. This has a performance impact detailed more below.

  • `vm::assert_ready` is used in synchronous contexts to assert that the
    async version is done immediately. This is intended to always be
    accompanied by an assert about `async_support` nearby.

  • `vm::one_poll` is used to test whether an asynchronous computation is
    ready yet, and is used in a few locations where a synchronous public
    API says it'll work in `async_support` mode but fails with an async
    resource limiter.

  • GC and other internals were simplified: `async` functions are now the
    "guts" and sync functions are thin veneers over these async functions.

  • As an example of new async functions, lazy GC store allocation and
    instance allocation are both async functions now.

  • In a small number of locations a conditional check of
    `store.async_support()` is done. For example, during GC, if
    `async_support` is enabled arbitrary yield points are injected. For
    libcalls, if it's enabled `block_on` is used; otherwise it's asserted
    to complete synchronously.

  • Previously `unsafe` functions requiring external fiber handling are
    now all safe and `async`.

  • Libcalls have a `block_on!` helper macro which should itself be a
    function taking an async closure but requires future Rust features to
    make it a function.
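The closure-to-`Drop` bullet above can be sketched like so; the guard and table types are illustrative, not Wasmtime's real ones:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal single-poll executor for futures known to complete immediately.
fn run<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| RawWaker::new(std::ptr::null(), &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future suspended unexpectedly"),
    }
}

// A guard type replaces the old run-a-closure-and-catch-the-error
// pattern: `Drop` fires on error returns *and* if the future itself is
// dropped at an `.await` point, which a closure wrapper cannot cover.
struct GrowGuard<'a> {
    table: &'a mut Vec<u32>,
    committed: bool,
}

impl Drop for GrowGuard<'_> {
    fn drop(&mut self) {
        if !self.committed {
            self.table.pop(); // roll back the speculative growth
        }
    }
}

async fn try_grow(table: &mut Vec<u32>, allow: bool) -> Result<usize, &'static str> {
    table.push(0); // speculative state change
    let mut guard = GrowGuard { table, committed: false };
    // An async limiter callback would be `.await`ed around here; if it
    // errors, or the whole future is dropped, the guard rolls back.
    std::future::ready(()).await;
    if !allow {
        return Err("limiter denied growth");
    }
    guard.committed = true;
    Ok(guard.table.len())
}

fn main() {
    let mut table = Vec::new();
    assert!(run(try_grow(&mut table, false)).is_err());
    assert_eq!(table.len(), 0); // rolled back by the guard
    assert_eq!(run(try_grow(&mut table, true)), Ok(1));
}
```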

A consequence of this refactoring is that instantiation is now slower
than before. For example from our `instantiation.rs` benchmark:

```
sequential/pooling/spidermonkey.wasm
                        time:   [2.6674 µs 2.6691 µs 2.6718 µs]
                        change: [+20.975% +21.039% +21.111%] (p = 0.00 < 0.05)
                        Performance has regressed.
```

Other benchmarks I've been looking at locally in `instantiation.rs` have
pretty wild swings, anywhere from a 10% performance improvement with this
PR to a 20% regression. This benchmark in particular, though, also one of
the more interesting ones, is consistently 20% slower with this commit.
Attempting to bottom out this performance difference, it looks like it's
largely "just async state machines vs not", where nothing else really
jumps out in the profile to me. In terms of absolute numbers the
time-to-instantiate is still in the single-digit-microsecond range with
`madvise` being the dominant function.

This is done in preparation for the next commit where `async` is plumbed
more pervasively throughout the internals of Wasmtime. In doing so it'll
require `dyn VMStore: Send` which will only be possible where `T: Send`
in `Store<T>`.
@alexcrichton
Member Author

I'm starting this as a draft for now while I sort out CI things, but I also want to have some discussion about this ideally before landing. I plan on bringing this up in tomorrow's Wasmtime meeting.

alexcrichton added a commit to bytecodealliance/meetings that referenced this pull request Aug 13, 2025
@tschneidereit
Member

Attempting to bottom out this performance difference it looks
like it's largely "just async state machines vs not" where nothing else
really jumps out in the profile to me.

IIUC, that means the overhead is a fixed cost that should be stable across different module types, as opposed to somehow scaling with something about the module type itself? If so, that seems not ideal but okay to me, given that we're talking about 0.5µs. Otherwise I'd like to understand the implications a bit more.

@fitzgen
Member

fitzgen commented Aug 14, 2025

For posterity, in today's Wasmtime meeting, we discussed this PR and ways to claw back some of the perf regression. The primary option we discussed was using Cranelift to compile a state-initialization function, which we have on file as #2639.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 15, 2025
Upon further refactoring and thinking about bytecodealliance#11430 I've realized that we
might be able to sidestep `T: Send` on the store entirely which would be
quite the boon if it can be pulled off. The realization I had is that
the main reason for this was `&mut dyn VMStore` on the stack, but that
itself is actually a bug in Wasmtime (bytecodealliance#11178) and shouldn't be done.
The functions which have this on the stack should actually ONLY have the
resource limiter, if configured. This means that while the
`ResourceLimiter{,Async}` traits need a `Send` supertrait that's
relatively easy to add without much impact. My hunch is that plumbing
this through to the end will enable all the benefits of bytecodealliance#11430 without
requiring adding `T: Send` to the store.

This commit starts out on this journey by making table growth a true
`async fn`. A new internal type is added to represent a store's limiter
which is plumbed to growth functions. This represents a hierarchy of
borrows that look like:

* `StoreInner<T>`
  * `StoreResourceLimiter<'_>`
  * `StoreOpaque`
    * `Pin<&mut Instance>`
      * `&mut vm::Table`

This notably, safely, allows operating on `vm::Table` with a
`StoreResourceLimiter` at the same time. This is exactly what's needed
and prevents needing to have `&mut dyn VMStore`, the previous argument,
on the stack.

This refactoring cleans up `unsafe` blocks in table growth which
manually used raw pointers to work around the borrow checker. No more
now!
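The split borrow that this hierarchy enables can be sketched with illustrative miniature types (none of these are Wasmtime's real definitions):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal single-poll executor for futures known to complete immediately.
fn run<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| RawWaker::new(std::ptr::null(), &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future suspended unexpectedly"),
    }
}

struct StoreResourceLimiter {
    max_elements: usize,
}

struct StoreOpaque {
    table: Vec<u32>, // stand-in for `vm::Table`
}

struct StoreInner {
    limiter: StoreResourceLimiter,
    opaque: StoreOpaque,
}

impl StoreInner {
    // The split borrow: two disjoint `&mut`s handed out simultaneously,
    // where a single `&mut dyn VMStore` previously forced raw-pointer
    // workarounds to touch the table and the limiter at the same time.
    fn split(&mut self) -> (&mut StoreResourceLimiter, &mut StoreOpaque) {
        (&mut self.limiter, &mut self.opaque)
    }
}

async fn table_grow(
    limiter: &mut StoreResourceLimiter,
    table: &mut Vec<u32>,
    delta: usize,
) -> bool {
    // A real `ResourceLimiterAsync` callback would be `.await`ed here.
    std::future::ready(()).await;
    if table.len() + delta > limiter.max_elements {
        return false;
    }
    table.resize(table.len() + delta, 0);
    true
}

fn main() {
    let mut store = StoreInner {
        limiter: StoreResourceLimiter { max_elements: 4 },
        opaque: StoreOpaque { table: Vec::new() },
    };
    let (limiter, opaque) = store.split();
    assert!(run(table_grow(limiter, &mut opaque.table, 3)));
    assert!(!run(table_grow(limiter, &mut opaque.table, 3))); // over the limit
    assert_eq!(store.opaque.table.len(), 3);
}
```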

I'll note as well that this is just an incremental step. What I plan on
doing next is handling other locations like memory growth, memory
allocation, and table allocation. Each of those will require further
refactorings to ensure that things like GC are correctly accounted for
so they're going to be split into separate PRs. Functionally though this
PR should have no impact other than a fiber is no longer required for
`Table::grow_async`.
@alexcrichton
Member Author

Ok I've done some more performance profiling and analysis of this. After more thinking and more optimizing, I think I've got an idea for a design that is cheaper at runtime and doesn't require `T: Send`. It'll require preparatory refactorings though, so I'm going to start things out in #11442 and we can go from there. I've got everything queued up in my head, I think, but it'll take some time to get it all into PRs. The other benefit of all of this is that it's going to resolve a number of issues related to `unsafe` code and unnecessary `unsafe`, e.g. #11442 handles an outstanding `unsafe` block in `table.rs`.

github-merge-queue bot pushed a commit that referenced this pull request Aug 18, 2025
* Make table growth a true `async fn`

* Remove #[cfg] gate
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 18, 2025
This commit is a step in preparation for bytecodealliance#11430, notably core instance
allocation, or `StoreOpaque::allocate_instance` is now an `async fn`.
This function does not actually use the `async`-ness just yet so it's a
noop from that point of view, but this propagates outwards to enough
locations that I wanted to split this off to make future changes more
digestible.

Notably some creation functions here such as making an `Instance`,
`Table`, or `Memory` are refactored internally to use this new `async`
function. Annotations of `assert_ready` or `one_poll` are used as
appropriate as well.

For reference this commit was benchmarked with our `instantiation.rs`
benchmark in the pooling allocator and shows no changes relative to the
original baseline from before-`async`-PRs.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 18, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
This is an analog of bytecodealliance#11442 but for memories. This had a little more
impact due to memories being hooked into GC operations. Further
refactoring of GC operations to make them safer/more-async is deferred
to a future PR and for now it's "no worse than before". This is another
step towards bytecodealliance#11430 and enables removing a longstanding `unsafe` block
in `runtime/memory.rs` which previously could not be removed.

One semantic change from this is that growth of a shared memory no
longer uses an async limiter. This is done to keep growth of a shared
memory consistent with creation of a shared memory where no limits are
applied. This is due to the cross-store nature of shared memories which
means that we can't tie growth to any one particular store. This
additionally fixes an issue where an rwlock write guard was otherwise
held across a `.await` point which creates a non-`Send` future, closing
a possible soundness hole in Wasmtime.
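Note that the fix here avoids the async limiter for shared memories entirely; the sketch below (with illustrative names) only demonstrates the `!Send` hazard mentioned above: std's `RwLockReadGuard` is `!Send`, so scoping it to drop before any `.await` keeps the future `Send`.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::RwLock;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal single-poll executor for futures known to complete immediately.
fn run<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| RawWaker::new(std::ptr::null(), &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future suspended unexpectedly"),
    }
}

// Compile-time check that a future is `Send`.
fn assert_send<F: Future + Send>(fut: F) -> F {
    fut
}

async fn grow_shared(memory: &RwLock<Vec<u8>>, delta: usize) -> usize {
    let new_len = {
        let guard = memory.read().unwrap();
        guard.len() + delta
    }; // guard dropped here, before the `.await` below
    std::future::ready(()).await; // e.g. some asynchronous check
    let mut guard = memory.write().unwrap();
    guard.resize(new_len, 0);
    new_len
}

fn main() {
    let memory = RwLock::new(Vec::new());
    // This line would fail to compile if a guard were held across `.await`.
    let fut = assert_send(grow_shared(&memory, 8));
    assert_eq!(run(fut), 8);
}
```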
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
Forgotten from bytecodealliance#11459 and extracted from bytecodealliance#11430, uses an RAII guard
instead of a closure to handle errors.
github-merge-queue bot pushed a commit that referenced this pull request Aug 19, 2025
* Make memory growth an `async` function


* Fix threads-disabled build

* Review comments
github-merge-queue bot pushed a commit that referenced this pull request Aug 19, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
This commit is extracted from bytecodealliance#11430 to accurately reflect how
const-expr evaluation is an async operation due to GC pauses that may
happen. The changes in this commit are:

* Const-expr evaluation is, at its core, now an `async` function.
* To leverage this new `async`-ness all internal operations are switched
  from `*_maybe_async` to `*_async` meaning all the `*_maybe_async`
  methods can be removed.
* Some libcalls using `*_maybe_async` are switched to using `*_async` plus
  the `block_on!` utility to help jettison more `*_maybe_async` methods.
* Instance initialization is now an `async` function. This is
  temporarily handled with `block_on` during instance initialization to
  avoid propagating the `async`-ness further upwards. This `block_on`
  will get deleted in future refactorings.
* Const-expr evaluation has been refactored slightly to enable having a
  fast path in global initialization which skips an `await` point
  entirely, achieving performance-parity in benchmarks prior to this commit.

This ended up fixing a niche issue with GC: if a wasm execution was
suspended during `table.init`, for example while a const-expr
evaluation triggered a GC, and the execution was then cancelled, the
host would panic. This panic was because the GC operation returned
`Result` but it was `unwrap`'d as part of the const-expr evaluation,
which can fail not only due to invalid-ness but also due to "computation
is cancelled" traps.
@alexcrichton
Member Author

Further work/investigation on #11468 revealed an optimization opportunity I was not aware of, but makes sense in retrospect: in an async function, if an `.await` point is dynamically not executed then the function will execute faster. This makes sense to me because it avoids updating a state machine and/or spilling locals and execution continues as "normal", so hot-path/fast-path optimizations only need to ensure, dynamically, that the `.await` isn't executed rather than modeling that statically.
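A sketch of this observation, with an `Option` standing in for "is an async limiter configured" (illustrative names, not Wasmtime's real fast path):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal single-poll executor for futures known to complete immediately.
fn run<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| RawWaker::new(std::ptr::null(), &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future suspended unexpectedly"),
    }
}

// When the `.await` sits behind a dynamic check, the common path never
// touches the suspension machinery at all: no state-machine update, no
// spilled locals.
async fn check_limit(requested: usize, limit: Option<usize>) -> bool {
    match limit {
        // Fast path: no limiter configured, the `.await` below is never reached.
        None => true,
        Some(max) => {
            // Slow path: an async limiter callback would be `.await`ed here.
            std::future::ready(()).await;
            requested <= max
        }
    }
}

fn main() {
    assert!(run(check_limit(100, None)));     // fast path
    assert!(run(check_limit(4, Some(10))));   // slow path, allowed
    assert!(!run(check_limit(40, Some(10)))); // slow path, denied
}
```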

With #11468 there's no performance regression currently. That's not the complete story, but I'm growing confident we can land this PR without `T: Send` and without a performance regression. Basically we get to have our cake and eat it too.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
github-merge-queue bot pushed a commit that referenced this pull request Aug 20, 2025
* Make const-expr evaluation `async`


* Fix configured build

* Undo rebase mistake
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 20, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 20, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 20, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 20, 2025
This commit is extracted from bytecodealliance#11430 to accurately reflect how
const-expr evaluation is an async operation due to GC pauses that may
happen. The changes in this commit are:

* Const-expr evaluation is, at its core, now an `async` function.
* To leverage this new `async`-ness all internal operations are switched
  from `*_maybe_async` to `*_async` meaning all the `*_maybe_async`
  methods can be removed.
* Some libcalls using `*_maybe_async` are switch to using `*_async` plus
  the `block_on!` utility to help jettison more `*_maybe_async` methods.
* Instance initialization is now an `async` function. This is
  temporarily handled with `block_on` during instance initialization to
  avoid propagating the `async`-ness further upwards. This `block_on`
  will get deleted in future refactorings.
* Const-expr evaluation has been refactored slightly to enable having a
  fast path in global initialization which skips an `await` point
  entirely, maintaining performance parity with benchmarks from before this commit.

This ended up fixing a niche issue with GC where if a wasm execution was
suspended during `table.init`, for example, during a const-expr
evaluation triggering a GC then if the wasm execution was cancelled it
would panic the host. This panic was because the GC operation returned
`Result` but it was `unwrap`'d as part of the const-expr evaluation
`Result` but it was `unwrap`'d as part of the const-expr evaluation,
which can fail not only due to invalidity but also due to "computation is
cancelled" traps.
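The `block_on!` utility mentioned in this commit message is not shown in the thread. A minimal sketch of the general idea, with all names hypothetical, is below; Wasmtime's real utility integrates with fibers and the wrapping executor, whereas this busy-poll stand-in only suits futures that make progress on every poll:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    const VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Hypothetical shape of a `block_on!`-style utility: drive a future to
// completion synchronously.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => std::thread::yield_now(),
        }
    }
}

// A future that suspends once before completing, to exercise the loop.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```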
github-merge-queue bot pushed a commit that referenced this pull request Aug 21, 2025
* Make core instance allocation an `async` function

This commit is a step in preparation for #11430, notably core instance
allocation, or `StoreOpaque::allocate_instance` is now an `async fn`.
This function does not actually use the `async`-ness just yet so it's a
noop from that point of view, but this propagates outwards to enough
locations that I wanted to split this off to make future changes more
digestible.

Notably some creation functions here such as making an `Instance`,
`Table`, or `Memory` are refactored internally to use this new `async`
function. Annotations of `assert_ready` or `one_poll` are used as
appropriate as well.

For reference this commit was benchmarked with our `instantiation.rs`
benchmark in the pooling allocator and shows no changes relative to the
original baseline from before-`async`-PRs.

* Make table/memory creation `async` functions

This commit is a large-ish refactor which is made possible by the many
previous refactorings to internals w.r.t. async-in-Wasmtime. The end
goal of this change is that table and memory allocation are both `async`
functions. Achieving this, however, required some refactoring to enable
it to work:

* To work with `Send` neither function can close over `dyn VMStore`.
  This required changing their `Option<&mut dyn VMStore>` argument to
  `Option<&mut StoreResourceLimiter<'_>>`
* Somehow a `StoreResourceLimiter` needed to be acquired from an
  `InstanceAllocationRequest`. Previously the store was stored here as
  an unsafe raw pointer, but I've refactored this now so
  `InstanceAllocationRequest` directly stores `&StoreOpaque` and
  `Option<&mut StoreResourceLimiter>` meaning it's trivial to acquire
  them. This additionally means no more `unsafe` access of the store
  during instance allocation (yay!).
* Now-redundant fields of `InstanceAllocationRequest` were removed since
  they can be safely inferred from `&StoreOpaque`. For example passing
  around `&Tunables` is now all gone.
* Methods upwards from table/memory allocation to the
  `InstanceAllocator` trait needed to be made `async`. This includes new
  `#[async_trait]` methods for example.
* `StoreOpaque::ensure_gc_store` is now an `async` function. This
  internally carries a new `unsafe` block carried over from before with
  the raw pointer passed around in `InstanceAllocationRequest`. A future
  PR will delete this `unsafe` block; it's just temporary.

I attempted a few times to split this PR up into separate commits but
everything is relatively intertwined here so this is the smallest
"atomic" unit I could manage to land these changes and refactorings.

* Shuffle `async-trait` dep

* Fix configured build
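The `InstanceAllocationRequest` refactor described above can be sketched roughly as follows. These are heavily simplified, hypothetical shapes (the real Wasmtime types carry far more state); the point is the borrow structure: the request holds a shared borrow of the store plus an optional exclusive borrow of the limiter, so no raw store pointer (and no `unsafe`) is needed during allocation, and formerly-duplicated fields like `&Tunables` are inferred from the store borrow:

```rust
// Hypothetical, heavily simplified shapes, not Wasmtime's real types.
struct Tunables {
    table_reservation: usize,
}
struct StoreOpaque {
    tunables: Tunables,
}
struct StoreResourceLimiter<'a> {
    remaining_elements: &'a mut usize,
}

struct InstanceAllocationRequest<'a, 'b> {
    store: &'a StoreOpaque,
    limiter: Option<&'a mut StoreResourceLimiter<'b>>,
}

impl InstanceAllocationRequest<'_, '_> {
    fn table_growth_allowed(&mut self, delta: usize) -> bool {
        // `Tunables` no longer needs to be a separate request field; it
        // is reached through the store borrow.
        let _reservation = self.store.tunables.table_reservation;
        match &mut self.limiter {
            Some(limiter) => {
                if *limiter.remaining_elements < delta {
                    return false;
                }
                *limiter.remaining_elements -= delta;
                true
            }
            None => true,
        }
    }
}
```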
@alexcrichton
Member Author

Ok through all the various PRs above this PR is now entirely obsolete. All the benefits of this are on main, yay!

There's a 5% performance regression on main relative to when I started this work which is due to #[async_trait] making boxed futures. Otherwise though I think it all worked out well!

@tschneidereit
Member

There's a 5% performance regression on main relative to when I started this work which is due to #[async_trait] making boxed futures.

Can you say more about what kinds of things regressed? Or is this just "everything is pretty uniformly 5% slower"?

And separately, is there anything we can do to claw this back? And if so, can we track that somewhere?

@alexcrichton
Member Author

Throughout this work I was watching the sequential/pooling/(spidermonkey|wasi).wasm benchmark defined in benches/instantiation.rs in this repo. I copied spidermonkey.wasm from Sightglass; the benchmark repeatedly instantiates these wasm modules in a loop. The 5% regression was in time-to-instantiate-and-tear-down-the-store as measured by Criterion. Numbers were in the ~2us range for both modules and the 5% regression was on that number as well.

#11470 was the cause of this change, and in profiling and analyzing it my conclusion was that it's more-or-less entirely due to #[async_trait]. Previously we had only dynamic dispatch; now we have dynamic dispatch plus heap-allocated futures. The extra heap allocation was the main difference from before showing up in the profile. Effectively each table and memory being allocated now requires a heap-allocated future to track the state of progressing through the allocation.

I don't really know of a great way to claw back this performance easily. One option is to wait for dyn-compatible async traits in Rust, but that's likely to take a while. Another option is to possibly have both an async and a sync trait method and dynamically select which one depending on the resource limiter that's been configured. For the small wins here though I'd say that's probably not worth it, personally. Given the scale of the numbers here and the micro-benchmark nature I also wasn't planning on tracking this since we generally just try to get instantiation as fast as possible as opposed to "must be below this threshold at all times". In that sense it's a larger constant factor than before, but that's naturally going to fluctuate over time IMO
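For reference, the kind of expansion `#[async_trait]` performs looks roughly like the sketch below. Trait and method names here are illustrative, not Wasmtime's actual limiter API: the trait's `async fn` becomes a method returning a boxed, pinned future, and that per-call `Box::pin` is the extra heap allocation being discussed:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Illustrative trait: roughly what `#[async_trait]` expands an
// `async fn memory_growing(...)` into.
trait LimiterAsync: Send {
    fn memory_growing<'a>(
        &'a mut self,
        desired: usize,
    ) -> Pin<Box<dyn Future<Output = bool> + Send + 'a>>;
}

struct Unlimited;

impl LimiterAsync for Unlimited {
    fn memory_growing<'a>(
        &'a mut self,
        _desired: usize,
    ) -> Pin<Box<dyn Future<Output = bool> + Send + 'a>> {
        Box::pin(async move { true }) // one heap allocation on every call
    }
}

// Resolve an already-ready boxed future with a single poll.
fn resolve(mut fut: Pin<Box<dyn Future<Output = bool> + Send + '_>>) -> bool {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    const VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => unreachable!("example future is immediately ready"),
    }
}
```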

@tschneidereit
Member

Thank you, that's very helpful. I was mildly concerned because I thought you were talking about everything being 5% slower. If it's just instantiation (and I now remember you mentioning this earlier), not e.g. execution throughput, then that's much less concerning. I think that all seems fine, then.

bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Make table growth a true `async fn`

Upon further refactoring and thinking about bytecodealliance#11430 I've realized that we
might be able to sidestep `T: Send` on the store entirely which would be
quite the boon if it can be pulled off. The realization I had is that
the main reason for this was `&mut dyn VMStore` on the stack, but that
itself is actually a bug in Wasmtime (bytecodealliance#11178) and shouldn't be done.
The functions which have this on the stack should actually ONLY have the
resource limiter, if configured. This means that while the
`ResourceLimiter{,Async}` traits need a `Send` supertrait that's
relatively easy to add without much impact. My hunch is that plumbing
this through to the end will enable all the benefits of bytecodealliance#11430 without
requiring adding `T: Send` to the store.

This commit starts out on this journey by making table growth a true
`async fn`. A new internal type is added to represent a store's limiter
which is plumbed to growth functions. This represents a hierarchy of
borrows that look like:

* `StoreInner<T>`
  * `StoreResourceLimiter<'_>`
  * `StoreOpaque`
    * `Pin<&mut Instance>`
      * `&mut vm::Table`

This notably, safely, allows operating on `vm::Table` with a
`StoreResourceLimiter` at the same time. This is exactly what's needed
and prevents needing to have `&mut dyn VMStore`, the previous argument,
on the stack.

This refactoring cleans up `unsafe` blocks in table growth which
previously used raw pointers manually to work around the borrow checker.
No more now!
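The borrow hierarchy above can be sketched as follows. Illustrative only: names follow the commit message but the real types are far richer. The point is the split borrow, handing out the resource limiter and the rest of the store as two disjoint `&mut` borrows, which is what lets table growth consult the limiter while mutating the table without raw pointers:

```rust
// Hypothetical, simplified shapes mirroring the commit message.
struct Table {
    elements: Vec<u32>,
}
struct StoreOpaque {
    table: Table,
}
struct StoreResourceLimiter<'a> {
    remaining: &'a mut usize,
}
struct StoreInner {
    limiter_budget: usize,
    opaque: StoreOpaque,
}

impl StoreInner {
    // Split borrow: two simultaneous `&mut` into disjoint fields.
    fn split(&mut self) -> (StoreResourceLimiter<'_>, &mut StoreOpaque) {
        (
            StoreResourceLimiter { remaining: &mut self.limiter_budget },
            &mut self.opaque,
        )
    }
}

fn grow_table(limiter: &mut StoreResourceLimiter<'_>, table: &mut Table, delta: usize) -> bool {
    if *limiter.remaining < delta {
        return false; // growth denied by the limiter
    }
    *limiter.remaining -= delta;
    table.elements.extend(std::iter::repeat(0).take(delta));
    true
}
```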

I'll note as well that this is just an incremental step. What I plan on
doing next is handling other locations like memory growth, memory
allocation, and table allocation. Each of those will require further
refactorings to ensure that things like GC are correctly accounted for
so they're going to be split into separate PRs. Functionally though this
PR should have no impact other than a fiber is no longer required for
`Table::grow_async`.

* Remove #[cfg] gate
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Make memory growth an `async` function

This is an analog of bytecodealliance#11442 but for memories. This had a little more
impact due to memories being hooked into GC operations. Further
refactoring of GC operations to make them safer/more-async is deferred
to a future PR and for now it's "no worse than before". This is another
step towards bytecodealliance#11430 and enables removing a longstanding `unsafe` block
in `runtime/memory.rs` which previously could not be removed.

One semantic change from this is that growth of a shared memory no
longer uses an async limiter. This is done to keep growth of a shared
memory consistent with creation of a shared memory where no limits are
applied. This is due to the cross-store nature of shared memories which
means that we can't tie growth to any one particular store. This
additionally fixes an issue where an rwlock write guard was otherwise
held across a `.await` point which creates a non-`Send` future, closing
a possible soundness hole in Wasmtime.
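The rwlock-across-`.await` hazard mentioned above can be sketched as below; names are illustrative, not Wasmtime's actual shared-memory internals. A `std::sync::RwLockWriteGuard` is not `Send`, so a future holding one across an `.await` is not `Send` either; the fix pattern is to finish all work under the lock and drop the guard before any await point:

```rust
use std::future::Future;
use std::sync::{Arc, RwLock};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical grow operation: the write guard is scoped so it is
// dropped before the await point, keeping the future `Send`.
async fn grow_shared(memory: Arc<RwLock<Vec<u8>>>, delta: usize) -> usize {
    let new_len = {
        let mut mem = memory.write().unwrap(); // guard lives only in this block
        let new_len = mem.len() + delta;
        mem.resize(new_len, 0);
        new_len
    }; // guard dropped here, before the await below
    after_growth_hook().await;
    new_len
}

async fn after_growth_hook() { /* stand-in for real async work */ }

// Compile-time check that the future is `Send`; with the guard held
// across the `.await`, this call would not type-check.
fn assert_send<F: Future + Send>(fut: F) -> F {
    fut
}

// Minimal synchronous driver for an immediately-completing future.
fn run<F: Future>(fut: F) -> F::Output {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    const VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```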

* Fix threads-disabled build

* Review comments
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
Forgotten from bytecodealliance#11459 and extracted from bytecodealliance#11430, uses an RAII guard
instead of a closure to handle errors.
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Make const-expr evaluation `async`

This commit is extracted from bytecodealliance#11430 to accurately reflect how
const-expr evaluation is an async operation due to GC pauses that may
happen. The changes in this commit are:

* Const-expr evaluation is, at its core, now an `async` function.
* To leverage this new `async`-ness all internal operations are switched
  from `*_maybe_async` to `*_async` meaning all the `*_maybe_async`
  methods can be removed.
* Some libcalls using `*_maybe_async` are switched to using `*_async` plus
  the `block_on!` utility to help jettison more `*_maybe_async` methods.
* Instance initialization is now an `async` function. This is
  temporarily handled with `block_on` during instance initialization to
  avoid propagating the `async`-ness further upwards. This `block_on`
  will get deleted in future refactorings.
* Const-expr evaluation has been refactored slightly to enable having a
  fast path in global initialization which skips an `await` point
  entirely, maintaining performance parity with benchmarks from before this commit.

This ended up fixing a niche issue with GC where if a wasm execution was
suspended during `table.init`, for example, during a const-expr
evaluation triggering a GC then if the wasm execution was cancelled it
would panic the host. This panic was because the GC operation returned
`Result` but it was `unwrap`'d as part of the const-expr evaluation
which can fail not only due to invalidity but also due to "computation is
cancelled" traps.

* Fix configured build

* Undo rebase mistake
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Make core instance allocation an `async` function

This commit is a step in preparation for bytecodealliance#11430, notably core instance
allocation, or `StoreOpaque::allocate_instance` is now an `async fn`.
This function does not actually use the `async`-ness just yet so it's a
noop from that point of view, but this propagates outwards to enough
locations that I wanted to split this off to make future changes more
digestible.

Notably some creation functions here such as making an `Instance`,
`Table`, or `Memory` are refactored internally to use this new `async`
function. Annotations of `assert_ready` or `one_poll` are used as
appropriate as well.

For reference this commit was benchmarked with our `instantiation.rs`
benchmark in the pooling allocator and shows no changes relative to the
original baseline from before-`async`-PRs.

* Make table/memory creation `async` functions

This commit is a large-ish refactor which is made possible by the many
previous refactorings to internals w.r.t. async-in-Wasmtime. The end
goal of this change is that table and memory allocation are both `async`
functions. Achieving this, however, required some refactoring to enable
it to work:

* To work with `Send` neither function can close over `dyn VMStore`.
  This required changing their `Option<&mut dyn VMStore>` argument to
  `Option<&mut StoreResourceLimiter<'_>>`
* Somehow a `StoreResourceLimiter` needed to be acquired from an
  `InstanceAllocationRequest`. Previously the store was stored here as
  an unsafe raw pointer, but I've refactored this now so
  `InstanceAllocationRequest` directly stores `&StoreOpaque` and
  `Option<&mut StoreResourceLimiter>` meaning it's trivial to acquire
  them. This additionally means no more `unsafe` access of the store
  during instance allocation (yay!).
* Now-redundant fields of `InstanceAllocationRequest` were removed since
  they can be safely inferred from `&StoreOpaque`. For example passing
  around `&Tunables` is now all gone.
* Methods upwards from table/memory allocation to the
  `InstanceAllocator` trait needed to be made `async`. This includes new
  `#[async_trait]` methods for example.
* `StoreOpaque::ensure_gc_store` is now an `async` function. This
  internally carries a new `unsafe` block carried over from before with
  the raw pointer passed around in `InstanceAllocationRequest`. A future
  PR will delete this `unsafe` block; it's just temporary.

I attempted a few times to split this PR up into separate commits but
everything is relatively intertwined here so this is the smallest
"atomic" unit I could manage to land these changes and refactorings.

* Shuffle `async-trait` dep

* Fix configured build