
@alexcrichton
Member

This commit is an initial step towards resolving #11262 by having async
functions internally in Wasmtime actually be `async` instead of requiring
the use of fibers. This is expected to have a number of benefits:

  • The Rust compiler can be used to verify a future is `Send` instead of
    "please audit the whole codebase's stack-local variables".

  • Raw pointer workarounds during table/memory growth will no longer be
    required since the arguments can properly be a split borrow to data in
    the store (eventually leading to unblocking #11178, "Core WebAssembly
    libcalls are not sound in Wasmtime").

  • Less duplication inside of Wasmtime and clearer implementations
    internally. For example, GC bits prior to this PR had duplicated
    sync/async entrypoints (sometimes a few layers deep) which eventually
    bottomed out in `*_maybe_async` bits which were `unsafe` and required
    fiber bits to be set up. All of that is now gone, with the `async`
    functions being the "source of truth" and sync functions just calling
    them.

  • Fibers are not required for operations such as a GC, growing memory,
    etc.

The general idea is that the source of truth for the implementation of
Wasmtime's internals is a set of `async` functions. These functions are
callable from synchronous functions in the API with a documented panic
condition about avoiding them when `Config::async_support` is disabled.
When `async_support` is disabled it's known internally that there should
never be an `.await` point, meaning that we can poll the future of the
async version and assert that it's ready.
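A minimal sketch of this poll-once-and-assert pattern (a hypothetical stand-in for the internal helper, not Wasmtime's actual code):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Poll a future exactly once with a no-op waker and panic if it isn't
// already complete. Sound only when the callee is known to contain no
// reachable `.await` points, i.e. when `async_support` is disabled.
fn assert_ready<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable = RawWakerVTable::new(
        |_| RawWaker::new(std::ptr::null(), &VTABLE), // clone
        |_| {},                                       // wake
        |_| {},                                       // wake_by_ref
        |_| {},                                       // drop
    );
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("future not ready; was `Config::async_support` enabled?"),
    }
}

// The async "source of truth"; with no async limiter configured it never
// actually suspends, so polling once is guaranteed to finish it.
async fn grow(len: usize, delta: usize) -> usize {
    len + delta
}

fn main() {
    // The synchronous public API becomes a thin wrapper over the async fn.
    assert_eq!(assert_ready(grow(10, 5)), 15);
}
```

The sync entrypoints then become one-liners over their async counterparts, with the panic condition documented on the public API.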

This commit is not the full realization of plumbing `async` everywhere
internally in Wasmtime. Instead, all this does is plumb the async-ness of
`ResourceLimiterAsync` and that's it, i.e. memory and table growth are
now properly async. It turns out, though, that these limiters are
extremely deep within Wasmtime and thus necessitated many changes to get
this all working. In the end this ended up covering some of the trickier
parts of dealing with async and propagating that throughout the runtime.

Most changes in this commit are intended to be straightforward, but a
summary is:

  • Many more functions are `async` and `.await` their internals.

  • Some instances of run-a-closure-and-catch-the-error are now replaced
    with a type-with-`Drop`, as that's the equivalent in the async world.

  • Internal traits in Wasmtime are now `#[async_trait]` to be object
    safe. This has a performance impact detailed more below.

  • `vm::assert_ready` is used in synchronous contexts to assert that the
    async version is done immediately. This is intended to always be
    accompanied by an assert about `async_support` nearby.

  • `vm::one_poll` is used to test whether an asynchronous computation is
    ready yet, and is used in a few locations where a synchronous public
    API says it'll work in `async_support` mode but fails with an async
    resource limiter.

  • GC and other internals were simplified: `async` functions are now the
    "guts" and sync functions are thin veneers over these async functions.

  • As an example of new async functions, lazy GC store allocation and
    instance allocation are both async functions now.

  • In a small number of locations a conditional check of
    `store.async_support()` is done. For example, during GC, if
    `async_support` is enabled arbitrary yield points are injected. For
    libcalls, if it's enabled `block_on` is used; otherwise it's asserted
    to complete synchronously.

  • Previously `unsafe` functions requiring external fiber handling are
    now all safe and `async`.

  • Libcalls have a `block_on!` helper macro which should itself be a
    function taking an async closure but requires future Rust features to
    make it a function.
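The closure-to-`Drop` bullet above can be sketched like so; the guard and table types are illustrative, not Wasmtime's real ones:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal single-poll executor for futures known to complete immediately.
fn run<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| RawWaker::new(std::ptr::null(), &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future suspended unexpectedly"),
    }
}

// A guard type replaces the old run-a-closure-and-catch-the-error
// pattern: `Drop` fires on error returns *and* if the future itself is
// dropped at an `.await` point, which a closure wrapper cannot cover.
struct GrowGuard<'a> {
    table: &'a mut Vec<u32>,
    committed: bool,
}

impl Drop for GrowGuard<'_> {
    fn drop(&mut self) {
        if !self.committed {
            self.table.pop(); // roll back the speculative growth
        }
    }
}

async fn try_grow(table: &mut Vec<u32>, allow: bool) -> Result<usize, &'static str> {
    table.push(0); // speculative state change
    let mut guard = GrowGuard { table, committed: false };
    // An async limiter callback would be `.await`ed around here; if it
    // errors, or the whole future is dropped, the guard rolls back.
    std::future::ready(()).await;
    if !allow {
        return Err("limiter denied growth");
    }
    guard.committed = true;
    Ok(guard.table.len())
}

fn main() {
    let mut table = Vec::new();
    assert!(run(try_grow(&mut table, false)).is_err());
    assert_eq!(table.len(), 0); // rolled back by the guard
    assert_eq!(run(try_grow(&mut table, true)), Ok(1));
}
```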

A consequence of this refactoring is that instantiation is now slower
than before. For example from our `instantiation.rs` benchmark:

```
sequential/pooling/spidermonkey.wasm
                        time:   [2.6674 µs 2.6691 µs 2.6718 µs]
                        change: [+20.975% +21.039% +21.111%] (p = 0.00 < 0.05)
                        Performance has regressed.
```

Other benchmarks I've been looking at locally in `instantiation.rs` have
pretty wild swings, anywhere from a 10% performance improvement with this
PR to a 20% regression. This benchmark in particular, though, also one of
the more interesting ones, is consistently 20% slower with this commit.
Attempting to bottom out this performance difference, it looks like it's
largely "just async state machines vs not", where nothing else really
jumps out in the profile to me. In terms of absolute numbers the
time-to-instantiate is still in the single-digit-microsecond range with
`madvise` being the dominant function.

This is done in preparation for the next commit where `async` is plumbed
more pervasively throughout the internals of Wasmtime. In doing so it'll
require `dyn VMStore: Send` which will only be possible where `T: Send`
in `Store<T>`.
@alexcrichton
Member Author

I'm starting this as a draft for now while I sort out CI things, but I also want to have some discussion about this ideally before landing. I plan on bringing this up in tomorrow's Wasmtime meeting.

alexcrichton added a commit to bytecodealliance/meetings that referenced this pull request Aug 13, 2025
@tschneidereit
Member

Attempting to bottom out this performance difference it looks
like it's largely "just async state machines vs not" where nothing else
really jumps out in the profile to me.

IIUC, that means the overhead is a fixed cost that should be stable across different module types, as opposed to somehow scaling with something about the module type itself? If so, that seems not ideal but okay to me, given that we're talking about 0.5µs. Otherwise I'd like to understand the implications a bit more.

@fitzgen
Member

fitzgen commented Aug 14, 2025

For posterity, in today's Wasmtime meeting, we discussed this PR and ways to claw back some of the perf regression. The primary option we discussed was using Cranelift to compile a state-initialization function, which we have on file as #2639.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 15, 2025
Upon further refactoring and thinking about bytecodealliance#11430 I've realized that we
might be able to sidestep `T: Send` on the store entirely which would be
quite the boon if it can be pulled off. The realization I had is that
the main reason for this was `&mut dyn VMStore` on the stack, but that
itself is actually a bug in Wasmtime (bytecodealliance#11178) and shouldn't be done.
The functions which have this on the stack should actually ONLY have the
resource limiter, if configured. This means that while the
`ResourceLimiter{,Async}` traits need a `Send` supertrait that's
relatively easy to add without much impact. My hunch is that plumbing
this through to the end will enable all the benefits of bytecodealliance#11430 without
requiring adding `T: Send` to the store.

This commit starts out on this journey by making table growth a true
`async fn`. A new internal type is added to represent a store's limiter
which is plumbed to growth functions. This represents a hierarchy of
borrows that look like:

* `StoreInner<T>`
  * `StoreResourceLimiter<'_>`
  * `StoreOpaque`
    * `Pin<&mut Instance>`
      * `&mut vm::Table`

This notably, safely, allows operating on `vm::Table` with a
`StoreResourceLimiter` at the same time. This is exactly what's needed
and prevents needing to have `&mut dyn VMStore`, the previous argument,
on the stack.

This refactoring cleans up `unsafe` blocks in table growth which
manually used raw pointers to work around the borrow checker. No more
now!
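The split borrow that this hierarchy enables can be sketched with illustrative miniature types (none of these are Wasmtime's real definitions):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal single-poll executor for futures known to complete immediately.
fn run<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| RawWaker::new(std::ptr::null(), &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future suspended unexpectedly"),
    }
}

struct StoreResourceLimiter {
    max_elements: usize,
}

struct StoreOpaque {
    table: Vec<u32>, // stand-in for `vm::Table`
}

struct StoreInner {
    limiter: StoreResourceLimiter,
    opaque: StoreOpaque,
}

impl StoreInner {
    // The split borrow: two disjoint `&mut`s handed out simultaneously,
    // where a single `&mut dyn VMStore` previously forced raw-pointer
    // workarounds to touch the table and the limiter at the same time.
    fn split(&mut self) -> (&mut StoreResourceLimiter, &mut StoreOpaque) {
        (&mut self.limiter, &mut self.opaque)
    }
}

async fn table_grow(
    limiter: &mut StoreResourceLimiter,
    table: &mut Vec<u32>,
    delta: usize,
) -> bool {
    // A real `ResourceLimiterAsync` callback would be `.await`ed here.
    std::future::ready(()).await;
    if table.len() + delta > limiter.max_elements {
        return false;
    }
    table.resize(table.len() + delta, 0);
    true
}

fn main() {
    let mut store = StoreInner {
        limiter: StoreResourceLimiter { max_elements: 4 },
        opaque: StoreOpaque { table: Vec::new() },
    };
    let (limiter, opaque) = store.split();
    assert!(run(table_grow(limiter, &mut opaque.table, 3)));
    assert!(!run(table_grow(limiter, &mut opaque.table, 3))); // over the limit
    assert_eq!(store.opaque.table.len(), 3);
}
```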

I'll note as well that this is just an incremental step. What I plan on
doing next is handling other locations like memory growth, memory
allocation, and table allocation. Each of those will require further
refactorings to ensure that things like GC are correctly accounted for
so they're going to be split into separate PRs. Functionally though this
PR should have no impact other than a fiber is no longer required for
`Table::grow_async`.
@alexcrichton
Member Author

Ok I've done some more performance profiling and analysis of this. After more thinking and more optimizing, I think I've got an idea for a design that is cheaper at runtime and doesn't require `T: Send`. It'll require preparatory refactorings though, so I'm going to start things out in #11442 and we can go from there. I've got everything queued up in my head, I think, but it'll take some time to get it all into PRs. The other benefit of all of this is that it's going to resolve a number of issues related to `unsafe` code and unnecessary `unsafe`, e.g. #11442 handles an outstanding `unsafe` block in `table.rs`.

github-merge-queue bot pushed a commit that referenced this pull request Aug 18, 2025
* Make table growth a true `async fn`

* Remove #[cfg] gate
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 18, 2025
This commit is a step in preparation for bytecodealliance#11430, notably core instance
allocation, or `StoreOpaque::allocate_instance` is now an `async fn`.
This function does not actually use the `async`-ness just yet so it's a
noop from that point of view, but this propagates outwards to enough
locations that I wanted to split this off to make future changes more
digestible.

Notably some creation functions here such as making an `Instance`,
`Table`, or `Memory` are refactored internally to use this new `async`
function. Annotations of `assert_ready` or `one_poll` are used as
appropriate as well.

For reference this commit was benchmarked with our `instantiation.rs`
benchmark in the pooling allocator and shows no changes relative to the
original baseline from before-`async`-PRs.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 18, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
This is an analog of bytecodealliance#11442 but for memories. This had a little more
impact due to memories being hooked into GC operations. Further
refactoring of GC operations to make them safer/more-async is deferred
to a future PR and for now it's "no worse than before". This is another
step towards bytecodealliance#11430 and enables removing a longstanding `unsafe` block
in `runtime/memory.rs` which previously could not be removed.

One semantic change from this is that growth of a shared memory no
longer uses an async limiter. This is done to keep growth of a shared
memory consistent with creation of a shared memory where no limits are
applied. This is due to the cross-store nature of shared memories which
means that we can't tie growth to any one particular store. This
additionally fixes an issue where an rwlock write guard was otherwise
held across a `.await` point which creates a non-`Send` future, closing
a possible soundness hole in Wasmtime.
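Note that the fix here avoids the async limiter for shared memories entirely; the sketch below (with illustrative names) only demonstrates the `!Send` hazard mentioned above: std's `RwLockReadGuard` is `!Send`, so scoping it to drop before any `.await` keeps the future `Send`.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::RwLock;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal single-poll executor for futures known to complete immediately.
fn run<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| RawWaker::new(std::ptr::null(), &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future suspended unexpectedly"),
    }
}

// Compile-time check that a future is `Send`.
fn assert_send<F: Future + Send>(fut: F) -> F {
    fut
}

async fn grow_shared(memory: &RwLock<Vec<u8>>, delta: usize) -> usize {
    let new_len = {
        let guard = memory.read().unwrap();
        guard.len() + delta
    }; // guard dropped here, before the `.await` below
    std::future::ready(()).await; // e.g. some asynchronous check
    let mut guard = memory.write().unwrap();
    guard.resize(new_len, 0);
    new_len
}

fn main() {
    let memory = RwLock::new(Vec::new());
    // This line would fail to compile if a guard were held across `.await`.
    let fut = assert_send(grow_shared(&memory, 8));
    assert_eq!(run(fut), 8);
}
```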
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
Forgotten from bytecodealliance#11459 and extracted from bytecodealliance#11430, uses an RAII guard
instead of a closure to handle errors.
github-merge-queue bot pushed a commit that referenced this pull request Aug 19, 2025
* Make memory growth an `async` function


* Fix threads-disabled build

* Review comments
github-merge-queue bot pushed a commit that referenced this pull request Aug 19, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
This commit is extracted from bytecodealliance#11430 to accurately reflect how
const-expr evaluation is an async operation due to GC pauses that may
happen. The changes in this commit are:

* Const-expr evaluation is, at its core, now an `async` function.
* To leverage this new `async`-ness all internal operations are switched
  from `*_maybe_async` to `*_async` meaning all the `*_maybe_async`
  methods can be removed.
* Some libcalls using `*_maybe_async` are switched to using `*_async` plus
  the `block_on!` utility to help jettison more `*_maybe_async` methods.
* Instance initialization is now an `async` function. This is
  temporarily handled with `block_on` during instance initialization to
  avoid propagating the `async`-ness further upwards. This `block_on`
  will get deleted in future refactorings.
* Const-expr evaluation has been refactored slightly to enable having a
  fast path in global initialization which skips an `await` point
  entirely, achieving performance-parity in benchmarks prior to this commit.

This ended up fixing a niche issue with GC: if a wasm execution was
suspended during `table.init`, for example while a const-expr
evaluation triggered a GC, and the execution was then cancelled, the
host would panic. This panic was because the GC operation returned
`Result` but it was `unwrap`'d as part of the const-expr evaluation,
which can fail not only due to invalid-ness but also due to "computation
is cancelled" traps.
@alexcrichton
Member Author

Further work/investigation on #11468 revealed an optimization opportunity I was not aware of, but makes sense in retrospect: in an async function, if an `.await` point is dynamically not executed then the function will execute faster. This makes sense to me because it avoids updating a state machine and/or spilling locals and execution continues as "normal", so hot-path/fast-path optimizations only need to ensure, dynamically, that the `.await` isn't executed rather than modeling that statically.
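A sketch of this observation, with an `Option` standing in for "is an async limiter configured" (illustrative names, not Wasmtime's real fast path):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal single-poll executor for futures known to complete immediately.
fn run<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| RawWaker::new(std::ptr::null(), &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future suspended unexpectedly"),
    }
}

// When the `.await` sits behind a dynamic check, the common path never
// touches the suspension machinery at all: no state-machine update, no
// spilled locals.
async fn check_limit(requested: usize, limit: Option<usize>) -> bool {
    match limit {
        // Fast path: no limiter configured, the `.await` below is never reached.
        None => true,
        Some(max) => {
            // Slow path: an async limiter callback would be `.await`ed here.
            std::future::ready(()).await;
            requested <= max
        }
    }
}

fn main() {
    assert!(run(check_limit(100, None)));     // fast path
    assert!(run(check_limit(4, Some(10))));   // slow path, allowed
    assert!(!run(check_limit(40, Some(10)))); // slow path, denied
}
```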

With #11468 there's no performance regression currently. That's not the complete story, but I'm growing confident we can land this PR without `T: Send` and without a performance regression. Basically we get to have our cake and eat it too.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 19, 2025
github-merge-queue bot pushed a commit that referenced this pull request Aug 20, 2025
* Make const-expr evaluation `async`


* Fix configured build

* Undo rebase mistake
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 20, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 20, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 20, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 20, 2025
This commit is extracted from bytecodealliance#11430 to accurately reflect how
const-expr evaluation is an async operation due to GC pauses that may
happen. The changes in this commit are:

* Const-expr evaluation is, at its core, now an `async` function.
* To leverage this new `async`-ness all internal operations are switched
  from `*_maybe_async` to `*_async` meaning all the `*_maybe_async`
  methods can be removed.
* Some libcalls using `*_maybe_async` are switch to using `*_async` plus
  the `block_on!` utility to help jettison more `*_maybe_async` methods.
* Instance initialization is now an `async` function. This is
  temporarily handled with `block_on` during instance initialization to
  avoid propagating the `async`-ness further upwards. This `block_on`
  will get deleted in future refactorings.
* Const-expr evaluation has been refactored slightly to enable having a
  fast path in global initialization which skips an `await` point
  entirely, maintaining performance parity with benchmarks from before this commit.

This ended up fixing a niche issue with GC where if a wasm execution was
suspended during `table.init`, for example, during a const-expr
evaluation triggering a GC then if the wasm execution was cancelled it
would panic the host. This panic was because the GC operation returned
`Result` but it was `unwrap`'d as part of the const-expr evaluation
`Result` but it was `unwrap`'d as part of the const-expr evaluation,
which can fail not only due to invalidity but also due to "computation is
cancelled" traps.
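The `block_on!` utility mentioned in this commit message is not shown in the thread. A minimal sketch of the general idea, with all names hypothetical, is below; Wasmtime's real utility integrates with fibers and the wrapping executor, whereas this busy-poll stand-in only suits futures that make progress on every poll:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    const VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Hypothetical shape of a `block_on!`-style utility: drive a future to
// completion synchronously.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => std::thread::yield_now(),
        }
    }
}

// A future that suspends once before completing, to exercise the loop.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```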
github-merge-queue bot pushed a commit that referenced this pull request Aug 21, 2025
* Make core instance allocation an `async` function

This commit is a step in preparation for #11430, notably core instance
allocation, or `StoreOpaque::allocate_instance` is now an `async fn`.
This function does not actually use the `async`-ness just yet so it's a
noop from that point of view, but this propagates outwards to enough
locations that I wanted to split this off to make future changes more
digestible.

Notably some creation functions here such as making an `Instance`,
`Table`, or `Memory` are refactored internally to use this new `async`
function. Annotations of `assert_ready` or `one_poll` are used as
appropriate as well.

For reference this commit was benchmarked with our `instantiation.rs`
benchmark in the pooling allocator and shows no changes relative to the
original baseline from before-`async`-PRs.

* Make table/memory creation `async` functions

This commit is a large-ish refactor which is made possible by the many
previous refactorings to internals w.r.t. async-in-Wasmtime. The end
goal of this change is that table and memory allocation are both `async`
functions. Achieving this, however, required some refactoring to enable
it to work:

* To work with `Send` neither function can close over `dyn VMStore`.
  This required changing their `Option<&mut dyn VMStore>` argument to
  `Option<&mut StoreResourceLimiter<'_>>`
* Somehow a `StoreResourceLimiter` needed to be acquired from an
  `InstanceAllocationRequest`. Previously the store was stored here as
  an unsafe raw pointer, but I've refactored this now so
  `InstanceAllocationRequest` directly stores `&StoreOpaque` and
  `Option<&mut StoreResourceLimiter>` meaning it's trivial to acquire
  them. This additionally means no more `unsafe` access of the store
  during instance allocation (yay!).
* Now-redundant fields of `InstanceAllocationRequest` were removed since
  they can be safely inferred from `&StoreOpaque`. For example passing
  around `&Tunables` is now all gone.
* Methods upwards from table/memory allocation to the
  `InstanceAllocator` trait needed to be made `async`. This includes new
  `#[async_trait]` methods for example.
* `StoreOpaque::ensure_gc_store` is now an `async` function. This
  internally carries a new `unsafe` block carried over from before with
  the raw pointer passed around in `InstanceAllocationRequest`. A future
  PR will delete this `unsafe` block; it's just temporary.

I attempted a few times to split this PR up into separate commits but
everything is relatively intertwined here so this is the smallest
"atomic" unit I could manage to land these changes and refactorings.

* Shuffle `async-trait` dep

* Fix configured build
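The `InstanceAllocationRequest` refactor described above can be sketched roughly as follows. These are heavily simplified, hypothetical shapes (the real Wasmtime types carry far more state); the point is the borrow structure: the request holds a shared borrow of the store plus an optional exclusive borrow of the limiter, so no raw store pointer (and no `unsafe`) is needed during allocation, and formerly-duplicated fields like `&Tunables` are inferred from the store borrow:

```rust
// Hypothetical, heavily simplified shapes, not Wasmtime's real types.
struct Tunables {
    table_reservation: usize,
}
struct StoreOpaque {
    tunables: Tunables,
}
struct StoreResourceLimiter<'a> {
    remaining_elements: &'a mut usize,
}

struct InstanceAllocationRequest<'a, 'b> {
    store: &'a StoreOpaque,
    limiter: Option<&'a mut StoreResourceLimiter<'b>>,
}

impl InstanceAllocationRequest<'_, '_> {
    fn table_growth_allowed(&mut self, delta: usize) -> bool {
        // `Tunables` no longer needs to be a separate request field; it
        // is reached through the store borrow.
        let _reservation = self.store.tunables.table_reservation;
        match &mut self.limiter {
            Some(limiter) => {
                if *limiter.remaining_elements < delta {
                    return false;
                }
                *limiter.remaining_elements -= delta;
                true
            }
            None => true,
        }
    }
}
```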
@alexcrichton
Member Author

Ok through all the various PRs above this PR is now entirely obsolete. All the benefits of this are on main, yay!

There's a 5% performance regression on main relative to when I started this work which is due to #[async_trait] making boxed futures. Otherwise though I think it all worked out well!

@tschneidereit
Member

There's a 5% performance regression on main relative to when I started this work which is due to #[async_trait] making boxed futures.

Can you say more about what kinds of things regressed? Or is this just "everything is pretty uniformly 5% slower"?

And separately, is there anything we can do to claw this back? And if so, can we track that somewhere?

@alexcrichton
Member Author

Throughout this work I was watching the sequential/pooling/(spidermonkey|wasi).wasm benchmark defined in benches/instantiation.rs in this repo. I copied spidermonkey.wasm from Sightglass; the benchmark repeatedly instantiates these wasm modules in a loop. The 5% regression was in time-to-instantiate-and-tear-down-the-store as measured by Criterion. Numbers were in the ~2us range for both modules and the 5% regression was on that number as well.

#11470 was the cause of this change, and in profiling and analyzing it my conclusion was that it's more-or-less entirely due to #[async_trait]. Previously we had only dynamic dispatch; now we have dynamic dispatch plus heap-allocated futures. The extra heap allocation was the main difference from before showing up in the profile. Effectively each table and memory being allocated now requires a heap-allocated future to track the state of progressing through the allocation.

I don't really know of a great way to claw back this performance easily. One option is to wait for dyn-compatible async traits in Rust, but that's likely to take a while. Another option is to possibly have both an async and a sync trait method and dynamically select which one depending on the resource limiter that's been configured. For the small wins here though I'd say that's probably not worth it, personally. Given the scale of the numbers here and the micro-benchmark nature I also wasn't planning on tracking this since we generally just try to get instantiation as fast as possible as opposed to "must be below this threshold at all times". In that sense it's a larger constant factor than before, but that's naturally going to fluctuate over time IMO
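For reference, the kind of expansion `#[async_trait]` performs looks roughly like the sketch below. Trait and method names here are illustrative, not Wasmtime's actual limiter API: the trait's `async fn` becomes a method returning a boxed, pinned future, and that per-call `Box::pin` is the extra heap allocation being discussed:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Illustrative trait: roughly what `#[async_trait]` expands an
// `async fn memory_growing(...)` into.
trait LimiterAsync: Send {
    fn memory_growing<'a>(
        &'a mut self,
        desired: usize,
    ) -> Pin<Box<dyn Future<Output = bool> + Send + 'a>>;
}

struct Unlimited;

impl LimiterAsync for Unlimited {
    fn memory_growing<'a>(
        &'a mut self,
        _desired: usize,
    ) -> Pin<Box<dyn Future<Output = bool> + Send + 'a>> {
        Box::pin(async move { true }) // one heap allocation on every call
    }
}

// Resolve an already-ready boxed future with a single poll.
fn resolve(mut fut: Pin<Box<dyn Future<Output = bool> + Send + '_>>) -> bool {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    const VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => unreachable!("example future is immediately ready"),
    }
}
```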

@tschneidereit
Member

Thank you, that's very helpful. I was mildly concerned because I thought you were talking about everything being 5% slower. If it's just instantiation (and I now remember you mentioning this earlier), not e.g. execution throughput, then that's much less concerning. I think that all seems fine, then.

bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Make table growth a true `async fn`

Upon further refactoring and thinking about bytecodealliance#11430 I've realized that we
might be able to sidestep `T: Send` on the store entirely which would be
quite the boon if it can be pulled off. The realization I had is that
the main reason for this was `&mut dyn VMStore` on the stack, but that
itself is actually a bug in Wasmtime (bytecodealliance#11178) and shouldn't be done.
The functions which have this on the stack should actually ONLY have the
resource limiter, if configured. This means that while the
`ResourceLimiter{,Async}` traits need a `Send` supertrait that's
relatively easy to add without much impact. My hunch is that plumbing
this through to the end will enable all the benefits of bytecodealliance#11430 without
requiring adding `T: Send` to the store.

This commit starts out on this journey by making table growth a true
`async fn`. A new internal type is added to represent a store's limiter
which is plumbed to growth functions. This represents a hierarchy of
borrows that look like:

* `StoreInner<T>`
  * `StoreResourceLimiter<'_>`
  * `StoreOpaque`
    * `Pin<&mut Instance>`
      * `&mut vm::Table`

This notably, safely, allows operating on `vm::Table` with a
`StoreResourceLimiter` at the same time. This is exactly what's needed
and prevents needing to have `&mut dyn VMStore`, the previous argument,
on the stack.

This refactoring cleans up `unsafe` blocks in table growth which
previously used raw pointers manually to work around the borrow checker.
No more now!
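The borrow hierarchy above can be sketched as follows. Illustrative only: names follow the commit message but the real types are far richer. The point is the split borrow, handing out the resource limiter and the rest of the store as two disjoint `&mut` borrows, which is what lets table growth consult the limiter while mutating the table without raw pointers:

```rust
// Hypothetical, simplified shapes mirroring the commit message.
struct Table {
    elements: Vec<u32>,
}
struct StoreOpaque {
    table: Table,
}
struct StoreResourceLimiter<'a> {
    remaining: &'a mut usize,
}
struct StoreInner {
    limiter_budget: usize,
    opaque: StoreOpaque,
}

impl StoreInner {
    // Split borrow: two simultaneous `&mut` into disjoint fields.
    fn split(&mut self) -> (StoreResourceLimiter<'_>, &mut StoreOpaque) {
        (
            StoreResourceLimiter { remaining: &mut self.limiter_budget },
            &mut self.opaque,
        )
    }
}

fn grow_table(limiter: &mut StoreResourceLimiter<'_>, table: &mut Table, delta: usize) -> bool {
    if *limiter.remaining < delta {
        return false; // growth denied by the limiter
    }
    *limiter.remaining -= delta;
    table.elements.extend(std::iter::repeat(0).take(delta));
    true
}
```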

I'll note as well that this is just an incremental step. What I plan on
doing next is handling other locations like memory growth, memory
allocation, and table allocation. Each of those will require further
refactorings to ensure that things like GC are correctly accounted for
so they're going to be split into separate PRs. Functionally though this
PR should have no impact other than a fiber is no longer required for
`Table::grow_async`.

* Remove #[cfg] gate
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Make memory growth an `async` function

This is an analog of bytecodealliance#11442 but for memories. This had a little more
impact due to memories being hooked into GC operations. Further
refactoring of GC operations to make them safer/more-async is deferred
to a future PR and for now it's "no worse than before". This is another
step towards bytecodealliance#11430 and enables removing a longstanding `unsafe` block
in `runtime/memory.rs` which previously could not be removed.

One semantic change from this is that growth of a shared memory no
longer uses an async limiter. This is done to keep growth of a shared
memory consistent with creation of a shared memory where no limits are
applied. This is due to the cross-store nature of shared memories which
means that we can't tie growth to any one particular store. This
additionally fixes an issue where an rwlock write guard was otherwise
held across a `.await` point which creates a non-`Send` future, closing
a possible soundness hole in Wasmtime.
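The rwlock-across-`.await` hazard mentioned above can be sketched as below; names are illustrative, not Wasmtime's actual shared-memory internals. A `std::sync::RwLockWriteGuard` is not `Send`, so a future holding one across an `.await` is not `Send` either; the fix pattern is to finish all work under the lock and drop the guard before any await point:

```rust
use std::future::Future;
use std::sync::{Arc, RwLock};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical grow operation: the write guard is scoped so it is
// dropped before the await point, keeping the future `Send`.
async fn grow_shared(memory: Arc<RwLock<Vec<u8>>>, delta: usize) -> usize {
    let new_len = {
        let mut mem = memory.write().unwrap(); // guard lives only in this block
        let new_len = mem.len() + delta;
        mem.resize(new_len, 0);
        new_len
    }; // guard dropped here, before the await below
    after_growth_hook().await;
    new_len
}

async fn after_growth_hook() { /* stand-in for real async work */ }

// Compile-time check that the future is `Send`; with the guard held
// across the `.await`, this call would not type-check.
fn assert_send<F: Future + Send>(fut: F) -> F {
    fut
}

// Minimal synchronous driver for an immediately-completing future.
fn run<F: Future>(fut: F) -> F::Output {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    const VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```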

* Fix threads-disabled build

* Review comments
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
Forgotten from bytecodealliance#11459 and extracted from bytecodealliance#11430, uses an RAII guard
instead of a closure to handle errors.
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Make const-expr evaluation `async`

This commit is extracted from bytecodealliance#11430 to accurately reflect how
const-expr evaluation is an async operation due to GC pauses that may
happen. The changes in this commit are:

* Const-expr evaluation is, at its core, now an `async` function.
* To leverage this new `async`-ness all internal operations are switched
  from `*_maybe_async` to `*_async` meaning all the `*_maybe_async`
  methods can be removed.
* Some libcalls using `*_maybe_async` are switched to using `*_async` plus
  the `block_on!` utility to help jettison more `*_maybe_async` methods.
* Instance initialization is now an `async` function. This is
  temporarily handled with `block_on` during instance initialization to
  avoid propagating the `async`-ness further upwards. This `block_on`
  will get deleted in future refactorings.
* Const-expr evaluation has been refactored slightly to enable having a
  fast path in global initialization which skips an `await` point
  entirely, maintaining performance parity with benchmarks from before this commit.

This ended up fixing a niche issue with GC where if a wasm execution was
suspended during `table.init`, for example, during a const-expr
evaluation triggering a GC then if the wasm execution was cancelled it
would panic the host. This panic was because the GC operation returned
`Result` but it was `unwrap`'d as part of the const-expr evaluation
which can fail not only due to invalidity but also due to "computation is
cancelled" traps.

* Fix configured build

* Undo rebase mistake
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Make core instance allocation an `async` function

This commit is a step in preparation for bytecodealliance#11430, notably core instance
allocation, or `StoreOpaque::allocate_instance` is now an `async fn`.
This function does not actually use the `async`-ness just yet so it's a
noop from that point of view, but this propagates outwards to enough
locations that I wanted to split this off to make future changes more
digestible.

Notably some creation functions here such as making an `Instance`,
`Table`, or `Memory` are refactored internally to use this new `async`
function. Annotations of `assert_ready` or `one_poll` are used as
appropriate as well.

For reference this commit was benchmarked with our `instantiation.rs`
benchmark in the pooling allocator and shows no changes relative to the
original baseline from before-`async`-PRs.

* Make table/memory creation `async` functions

This commit is a large-ish refactor which is made possible by the many
previous refactorings to internals w.r.t. async-in-Wasmtime. The end
goal of this change is that table and memory allocation are both `async`
functions. Achieving this, however, required some refactoring to enable
it to work:

* To work with `Send` neither function can close over `dyn VMStore`.
  This required changing their `Option<&mut dyn VMStore>` argument to
  `Option<&mut StoreResourceLimiter<'_>>`
* Somehow a `StoreResourceLimiter` needed to be acquired from an
  `InstanceAllocationRequest`. Previously the store was stored here as
  an unsafe raw pointer, but I've refactored this now so
  `InstanceAllocationRequest` directly stores `&StoreOpaque` and
  `Option<&mut StoreResourceLimiter>` meaning it's trivial to acquire
  them. This additionally means no more `unsafe` access of the store
  during instance allocation (yay!).
* Now-redundant fields of `InstanceAllocationRequest` were removed since
  they can be safely inferred from `&StoreOpaque`. For example passing
  around `&Tunables` is now all gone.
* Methods upwards from table/memory allocation to the
  `InstanceAllocator` trait needed to be made `async`. This includes new
  `#[async_trait]` methods for example.
* `StoreOpaque::ensure_gc_store` is now an `async` function. This
  internally carries a new `unsafe` block carried over from before with
  the raw pointer passed around in `InstanceAllocationRequest`. A future
  PR will delete this `unsafe` block; it's just temporary.

I attempted a few times to split this PR up into separate commits but
everything is relatively intertwined here so this is the smallest
"atomic" unit I could manage to land these changes and refactorings.

* Shuffle `async-trait` dep

* Fix configured build