Use async fn internally within Wasmtime
#11430
Conversation
This is done in preparation for the next commit where `async` is plumbed more pervasively throughout the internals of Wasmtime. In doing so it'll require `dyn VMStore: Send` which will only be possible where `T: Send` in `Store<T>`.
This commit is an initial step towards resolving bytecodealliance#11262 by having async functions internal to Wasmtime actually be `async` instead of requiring the use of fibers. This is expected to have a number of benefits:

* The Rust compiler can be used to verify that a future is `Send` instead of "please audit the whole codebase's stack-local variables".
* Raw pointer workarounds during table/memory growth will no longer be required, since the arguments can properly be a split borrow of data in the store (eventually unblocking bytecodealliance#11178).
* Less duplication inside of Wasmtime and clearer implementations internally. For example, prior to this PR the GC bits had duplicated sync/async entrypoints (sometimes a few layers deep) which eventually bottomed out in `*_maybe_async` functions that were `unsafe` and required fiber state to be set up. All of that is now gone: the `async` functions are the "source of truth" and the sync functions just call them.
* Fibers are not required for operations such as a GC, growing memory, etc.

The general idea is that the source of truth for the implementation of Wasmtime internals is all `async` functions. These functions are callable from synchronous functions in the API with a documented panic condition about avoiding them when `Config::async_support` is disabled. When `async_support` is disabled it's known internally that there should never be an `.await` point, meaning we can poll the future of the async version and assert that it's ready.

This commit is not the full realization of plumbing `async` everywhere internally in Wasmtime. Instead, all this does is plumb through the async-ness of `ResourceLimiterAsync`; that is, memory and table growth are now properly async. It turns out, though, that these limiters are extremely deep within Wasmtime and thus necessitated many changes to get this all working.
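The poll-once-and-assert idea described above can be sketched in plain Rust. This is a minimal stand-in, not Wasmtime's actual `vm::assert_ready` implementation: it builds a noop waker, polls the future a single time, and panics if the future would suspend, which by construction cannot happen when no `.await` point is live.

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing: all vtable functions are no-ops, so it is safe
// to construct from a null data pointer.
static VTABLE: RawWakerVTable =
    RawWakerVTable::new(|p| RawWaker::new(p, &VTABLE), |_| {}, |_| {}, |_| {});

fn noop_waker() -> Waker {
    // Safety: the vtable above never dereferences the data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Poll `fut` exactly once and panic unless it is immediately ready.
/// Callable from sync entrypoints when async support is disabled, where no
/// `.await` point can actually suspend.
fn assert_ready<F: Future>(fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match pin!(fut).poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("future unexpectedly suspended; is async_support enabled?"),
    }
}

fn main() {
    // An async block with no await points resolves on its first poll.
    let grown = assert_ready(async { 2 + 2 });
    assert_eq!(grown, 4);
    println!("{grown}");
}
```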
In the end this ended up covering some of the trickier parts of dealing with async and propagating that throughout the runtime. Most changes in this commit are intended to be straightforward, but a summary is:

* Many more functions are `async` and `.await` their internals.
* Some instances of run-a-closure-and-catch-the-error are replaced with a type implementing `Drop`, as that's the equivalent in the async world.
* Internal traits in Wasmtime are now `#[async_trait]` to be object safe. This has a performance impact detailed more below.
* `vm::assert_ready` is used in synchronous contexts to assert that the async version completes immediately. This is intended to always be accompanied by a nearby assertion about `async_support`.
* `vm::one_poll` tests whether an asynchronous computation is ready yet, and is used in a few locations where a synchronous public API says it'll work in `async_support` mode but fails with an async resource limiter.
* GC and other internals were simplified: `async` functions are now the "guts" and sync functions are thin veneers over them.
* As examples of new async functions, lazy GC store allocation and instance allocation are both async now.
* In a small number of locations a conditional check of `store.async_support()` is done. For example, during GC, if `async_support` is enabled then arbitrary yield points are injected. For libcalls, if it's enabled then `block_on` is used; otherwise the operation is asserted to complete synchronously.
* Functions that were previously `unsafe` because they required external fiber handling are now all safe and `async`.
* Libcalls have a `block_on!` helper macro which should itself be a function taking an async closure, but that requires future Rust features.

A consequence of this refactoring is that instantiation is now slower than before.
For example, from our `instantiation.rs` benchmark:

```
sequential/pooling/spidermonkey.wasm
    time:   [2.6674 µs 2.6691 µs 2.6718 µs]
    change: [+20.975% +21.039% +21.111%] (p = 0.00 < 0.05)
    Performance has regressed.
```

Other benchmarks I've been looking at locally in `instantiation.rs` have pretty wild swings, from a 10% performance improvement with this PR to a 20% regression. This benchmark in particular, also one of the more interesting ones, is consistently 20% slower with this commit. Attempting to bottom out this performance difference, it looks like it's largely "just async state machines vs not"; nothing else really jumps out in the profile to me. In terms of absolute numbers the time-to-instantiate is still in the single-digit-microsecond range with `madvise` being the dominant function.

prtest:full
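The run-a-closure-and-catch-the-error to type-with-`Drop` migration listed in the summary above can be sketched as follows. `RollbackGuard` is a hypothetical name, not a Wasmtime type: it runs its cleanup on drop (the error or unwind path) unless explicitly disarmed on success, and unlike a closure wrapper it keeps working when the protected region contains `.await` points.

```rust
use std::cell::Cell;

// An RAII guard holding a cleanup action: runs on drop unless disarmed.
struct RollbackGuard<F: FnOnce()> {
    cleanup: Option<F>,
}

impl<F: FnOnce()> RollbackGuard<F> {
    fn new(cleanup: F) -> Self {
        Self { cleanup: Some(cleanup) }
    }

    // Success path: forget the cleanup so drop does nothing.
    fn disarm(mut self) {
        self.cleanup = None;
    }
}

impl<F: FnOnce()> Drop for RollbackGuard<F> {
    fn drop(&mut self) {
        if let Some(f) = self.cleanup.take() {
            f();
        }
    }
}

fn main() {
    let rollbacks = Cell::new(0);

    // Error path: the guard is dropped while still armed, so cleanup runs.
    {
        let _guard = RollbackGuard::new(|| rollbacks.set(rollbacks.get() + 1));
        // ... fallible (possibly async) work would happen here ...
    }
    assert_eq!(rollbacks.get(), 1);

    // Success path: disarming skips the cleanup.
    let guard = RollbackGuard::new(|| rollbacks.set(rollbacks.get() + 1));
    guard.disarm();
    assert_eq!(rollbacks.get(), 1);
    println!("rollbacks: {}", rollbacks.get());
}
```

Because the cleanup lives in a value rather than a closure scope, it survives across suspension points in an `async fn`, which a run-and-catch wrapper cannot.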
I'm starting this as a draft for now while I sort out CI things, but I also want to have some discussion about this ideally before landing. I plan on bringing this up in tomorrow's Wasmtime meeting.
Adding bytecodealliance/wasmtime#11430 as a discussion point
IIUC, that means the overhead is a fixed cost that should be stable across different module types, as opposed to somehow scaling with something about the module type itself? If so, that seems not ideal but okay to me, given that we're talking about 0.5µs. Otherwise I'd like to understand the implications a bit more.
For posterity: in today's Wasmtime meeting we discussed this PR and ways to claw back some of the perf regression. The primary option we discussed was using Cranelift to compile a state-initialization function, which we have on file as #2639.
Upon further refactoring and thinking about bytecodealliance#11430 I've realized that we might be able to sidestep `T: Send` on the store entirely, which would be quite the boon if it can be pulled off. The realization I had is that the main reason for this was `&mut dyn VMStore` on the stack, but that itself is actually a bug in Wasmtime (bytecodealliance#11178) and shouldn't be done. The functions which have this on the stack should actually ONLY have the resource limiter, if configured. This means that while the `ResourceLimiter{,Async}` traits need a `Send` supertrait, that's relatively easy to add without much impact. My hunch is that plumbing this through to the end will enable all the benefits of bytecodealliance#11430 without requiring adding `T: Send` to the store.

This commit starts out on this journey by making table growth a true `async fn`. A new internal type is added to represent a store's limiter, which is plumbed to growth functions. This represents a hierarchy of borrows that looks like:

* `StoreInner<T>`
* `StoreResourceLimiter<'_>`
* `StoreOpaque`
* `Pin<&mut Instance>`
* `&mut vm::Table`

This notably, and safely, allows operating on a `vm::Table` and a `StoreResourceLimiter` at the same time. This is exactly what's needed and avoids having `&mut dyn VMStore`, the previous argument, on the stack. This refactoring cleans up `unsafe` blocks in table growth which previously used raw pointers to work around the borrow checker. No more now!

I'll note as well that this is just an incremental step. What I plan on doing next is handling other locations like memory growth, memory allocation, and table allocation. Each of those will require further refactorings to ensure that things like GC are correctly accounted for, so they're going to be split into separate PRs. Functionally, though, this PR should have no impact other than that a fiber is no longer required for `Table::grow_async`.
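The borrow hierarchy above can be illustrated with split borrows. These are hypothetical stand-in types, not Wasmtime's real definitions: the point is that destructuring one `&mut Store` yields simultaneous, disjoint `&mut` borrows of the limiter and the table, so growth needs no raw-pointer workaround.

```rust
// Hypothetical stand-ins for the store, limiter, and table.
struct StoreResourceLimiter {
    max_elements: usize,
}

struct Table {
    elements: Vec<u64>,
}

struct Store {
    limiter: StoreResourceLimiter,
    table: Table,
}

// Takes the table and the limiter at the same time: a split borrow.
// Returns the old size on success, `None` if the limiter vetoes growth.
fn table_grow(
    table: &mut Table,
    limiter: &mut StoreResourceLimiter,
    delta: usize,
) -> Option<usize> {
    let old = table.elements.len();
    if old + delta > limiter.max_elements {
        return None; // limiter vetoed the growth
    }
    table.elements.resize(old + delta, 0);
    Some(old)
}

fn main() {
    let mut store = Store {
        limiter: StoreResourceLimiter { max_elements: 4 },
        table: Table { elements: Vec::new() },
    };
    // One `&mut Store` split into two disjoint `&mut` borrows: the borrow
    // checker accepts this because the fields don't overlap.
    let Store { limiter, table } = &mut store;
    assert_eq!(table_grow(table, limiter, 3), Some(0));
    assert_eq!(table_grow(table, limiter, 3), None);
    println!("len = {}", store.table.elements.len());
}
```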
Ok, I've done some more performance profiling and analysis of this. After more thinking and more optimizing, I think I've got an idea for a design that is cheaper at runtime and doesn't require `T: Send`.
This commit is a step in preparation for bytecodealliance#11430: notably, core instance allocation, i.e. `StoreOpaque::allocate_instance`, is now an `async fn`. This function does not actually use the `async`-ness just yet, so it's a noop from that point of view, but it propagates outwards to enough locations that I wanted to split this off to make future changes more digestible. Notably, some creation functions, such as making an `Instance`, `Table`, or `Memory`, are refactored internally to use this new `async` function. Annotations of `assert_ready` or `one_poll` are used as appropriate as well. For reference, this commit was benchmarked with our `instantiation.rs` benchmark in the pooling allocator and shows no changes relative to the original baseline from before the `async` PRs.
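The `one_poll` annotation mentioned above can be sketched as a sibling of `assert_ready`: poll the future a single time, but report `None` instead of panicking if it would suspend. This is a minimal stand-in, not Wasmtime's actual implementation.

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing; enough for a future whose readiness we only
// want to observe once.
static VTABLE: RawWakerVTable =
    RawWakerVTable::new(|p| RawWaker::new(p, &VTABLE), |_| {}, |_| {}, |_| {});

/// Poll `fut` a single time: `Some(output)` if it completed immediately,
/// `None` if it would need to suspend (e.g. an async resource limiter
/// reached through a synchronous public API).
fn one_poll<F: Future>(fut: F) -> Option<F::Output> {
    // Safety: the vtable above never dereferences the data pointer.
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    match pin!(fut).poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

fn main() {
    // No await points: ready on the first poll.
    assert_eq!(one_poll(async { 1 + 1 }), Some(2));
    // A future that never resolves reports `None` instead of blocking.
    assert_eq!(one_poll(std::future::pending::<i32>()), None);
    println!("ok");
}
```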
This is an analog of bytecodealliance#11442 but for memories. This had a little more impact due to memories being hooked into GC operations. Further refactoring of GC operations to make them safer/more-async is deferred to a future PR; for now it's "no worse than before". This is another step towards bytecodealliance#11430 and enables removing a longstanding `unsafe` block in `runtime/memory.rs` which previously could not be removed.

One semantic change from this is that growth of a shared memory no longer uses an async limiter. This is done to keep growth of a shared memory consistent with creation of a shared memory, where no limits are applied. This is due to the cross-store nature of shared memories, which means that we can't tie growth to any one particular store.

This additionally fixes an issue where an rwlock write guard was otherwise held across an `.await` point, which creates a non-`Send` future, closing a possible soundness hole in Wasmtime.
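The rwlock-across-`.await` fix can be sketched as below, with hypothetical names: finish all work under the write lock in an inner scope so the guard is dropped before the await. `std::sync` lock guards are not `Send`, so a guard held across an await would make the whole future non-`Send`; the `require_send` helper turns that into a compile-time check.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::RwLock;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for a real suspension point (GC yield, limiter callback, ...).
async fn yield_point() {}

// The fix: the write guard lives in an inner scope and is dropped before
// the `.await`, so the future stays `Send`. Returns the old length.
async fn grow_shared(memory: &RwLock<Vec<u8>>, delta: usize) -> usize {
    let old_len = {
        let mut guard = memory.write().unwrap();
        let old_len = guard.len();
        guard.resize(old_len + delta, 0);
        old_len
    }; // guard dropped here, before any suspension
    yield_point().await;
    old_len
}

// Compile-time check: this only type-checks because no guard lives across
// the await point.
fn require_send<T: Send>(value: T) -> T {
    value
}

// Minimal driver with a noop waker; enough for this example's futures.
fn drive<F: Future>(fut: F) -> F::Output {
    static VT: RawWakerVTable =
        RawWakerVTable::new(|p| RawWaker::new(p, &VT), |_| {}, |_| {}, |_| {});
    // Safety: the vtable above never dereferences the data pointer.
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VT)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    let memory = RwLock::new(vec![0u8; 2]);
    let old = drive(require_send(grow_shared(&memory, 4)));
    assert_eq!(old, 2);
    assert_eq!(memory.read().unwrap().len(), 6);
    println!("grew from {old} to {}", memory.read().unwrap().len());
}
```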
Forgotten from bytecodealliance#11459 and extracted from bytecodealliance#11430, uses an RAII guard instead of a closure to handle errors.
This commit is extracted from bytecodealliance#11430 to accurately reflect how const-expr evaluation is an async operation, due to GC pauses that may happen. The changes in this commit are:

* Const-expr evaluation is, at its core, now an `async` function.
* To leverage this new `async`-ness all internal operations are switched from `*_maybe_async` to `*_async`, meaning all the `*_maybe_async` methods can be removed.
* Some libcalls using `*_maybe_async` are switched to using `*_async` plus the `block_on!` utility to help jettison more `*_maybe_async` methods.
* Instance initialization is now an `async` function. This is temporarily handled with `block_on` during instance initialization to avoid propagating the `async`-ness further upwards. This `block_on` will get deleted in future refactorings.
* Const-expr evaluation has been refactored slightly to enable a fast path in global initialization which skips an `await` point entirely, achieving performance parity with benchmarks prior to this commit.

This ended up fixing a niche issue with GC: if a wasm execution was suspended during a const-expr evaluation that triggered a GC (during `table.init`, for example) and the execution was then cancelled, it would panic the host. This panic happened because the GC operation returned a `Result` that was `unwrap`'d as part of const-expr evaluation, and that `Result` can represent not only invalidity but also "computation is cancelled" traps.
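The "fast path that skips an `await` point entirely" can be sketched as: check for the trivially-evaluable case before ever constructing the async state machine. All names here (`ConstOp`, `eval_const_async`, `eval_global_init`) are hypothetical, not Wasmtime's real API.

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical const-expr shape: most global initializers are constants.
enum ConstOp {
    I64Const(i64),
    GlobalGet(usize), // stand-in for cases that may trigger GC, etc.
}

// The async "source of truth" evaluator.
async fn eval_const_async(op: &ConstOp, globals: &[i64]) -> i64 {
    match op {
        ConstOp::I64Const(v) => *v,
        // Real code might await a GC or a yield point in this branch.
        ConstOp::GlobalGet(i) => globals[*i],
    }
}

// Fast path: constants never build or poll the future at all.
fn eval_global_init(op: &ConstOp, globals: &[i64]) -> i64 {
    if let ConstOp::I64Const(v) = op {
        return *v; // no async state machine constructed
    }
    poll_assuming_ready(eval_const_async(op, globals))
}

// Minimal single-poll driver (noop waker); panics if the future suspends.
fn poll_assuming_ready<F: Future>(fut: F) -> F::Output {
    static VT: RawWakerVTable =
        RawWakerVTable::new(|p| RawWaker::new(p, &VT), |_| {}, |_| {}, |_| {});
    // Safety: the vtable above never dereferences the data pointer.
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VT)) };
    let mut cx = Context::from_waker(&waker);
    match pin!(fut).poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("const-expr evaluation suspended"),
    }
}

fn main() {
    let globals = [7, 8, 9];
    assert_eq!(eval_global_init(&ConstOp::I64Const(5), &globals), 5);
    assert_eq!(eval_global_init(&ConstOp::GlobalGet(2), &globals), 9);
    println!("ok");
}
```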
Further work/investigation on #11468 revealed an optimization opportunity I was not aware of, but one that makes sense in retrospect. With #11468 there's no performance regression currently. That's not the complete story, but I'm growing confident we can land this PR.
* Make core instance allocation an `async` function

  This commit is a step in preparation for #11430: notably, core instance allocation, i.e. `StoreOpaque::allocate_instance`, is now an `async fn`. This function does not actually use the `async`-ness just yet, so it's a noop from that point of view, but it propagates outwards to enough locations that I wanted to split this off to make future changes more digestible. Notably, some creation functions, such as making an `Instance`, `Table`, or `Memory`, are refactored internally to use this new `async` function. Annotations of `assert_ready` or `one_poll` are used as appropriate as well. For reference, this commit was benchmarked with our `instantiation.rs` benchmark in the pooling allocator and shows no changes relative to the original baseline from before the `async` PRs.

* Make table/memory creation `async` functions

  This commit is a large-ish refactor made possible by the many previous refactorings of Wasmtime's internals w.r.t. async. The end goal of this change is that table and memory allocation are both `async` functions. Achieving this, however, required some refactoring to enable it to work:

  * To work with `Send`, neither function can close over `dyn VMStore`. This required changing their `Option<&mut dyn VMStore>` argument to `Option<&mut StoreResourceLimiter<'_>>`.
  * Somehow a `StoreResourceLimiter` needed to be acquired from an `InstanceAllocationRequest`. Previously the store was stored here as an unsafe raw pointer, but I've refactored this so `InstanceAllocationRequest` directly stores `&StoreOpaque` and `Option<&mut StoreResourceLimiter>`, meaning it's trivial to acquire them. This additionally means no more `unsafe` access of the store during instance allocation (yay!).
  * Now-redundant fields of `InstanceAllocationRequest` were removed since they can be safely inferred from `&StoreOpaque`. For example, passing around `&Tunables` is now all gone.
  * Methods upwards from table/memory allocation to the `InstanceAllocator` trait needed to be made `async`. This includes new `#[async_trait]` methods, for example.
  * `StoreOpaque::ensure_gc_store` is now an `async` function. This internally carries a new `unsafe` block carried over from before, with the raw pointer passed around in `InstanceAllocationRequest`. A future PR will delete this `unsafe` block; it's just temporary.

  I attempted a few times to split this PR up into separate commits but everything is relatively intertwined here, so this is the smallest "atomic" unit I could manage to land these changes and refactorings.

* Shuffle `async-trait` dep
* Fix configured build
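The raw-pointer-to-references change described above can be sketched with hypothetical stand-in types (these are not Wasmtime's real definitions): storing lifetime-bound references instead of a raw store pointer lets the borrow checker, rather than `unsafe` code, enforce that the store is alive during allocation.

```rust
// Hypothetical stand-ins for the store and limiter.
struct StoreOpaque {
    engine_name: &'static str,
}

struct StoreResourceLimiter {
    remaining: usize,
}

// After the refactor: plain borrows instead of a raw `*mut` store pointer,
// so no `unsafe` access is needed during allocation.
struct InstanceAllocationRequest<'a> {
    store: &'a StoreOpaque,
    limiter: Option<&'a mut StoreResourceLimiter>,
}

// Allocation consults the limiter if one is configured; anything that was
// a redundant field (e.g. tunables) can be read off the store instead.
fn allocate(req: &mut InstanceAllocationRequest<'_>) -> bool {
    let _which_engine = req.store.engine_name;
    match req.limiter.as_deref_mut() {
        Some(limiter) if limiter.remaining == 0 => false, // limiter veto
        Some(limiter) => {
            limiter.remaining -= 1;
            true
        }
        None => true, // no limiter configured: always allowed
    }
}

fn main() {
    let store = StoreOpaque { engine_name: "demo" };
    let mut limiter = StoreResourceLimiter { remaining: 1 };
    let mut req = InstanceAllocationRequest {
        store: &store,
        limiter: Some(&mut limiter),
    };
    assert!(allocate(&mut req)); // first allocation allowed
    assert!(!allocate(&mut req)); // limiter exhausted
    println!("ok");
}
```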
Ok, through all the various PRs above, this PR is now entirely obsolete. All the benefits of this have already landed through those PRs. There's a 5% performance regression remaining.
Can you say more about what kinds of things regressed? Or is this just "everything is pretty uniformly 5% slower"? And separately, is there anything we can do to claw this back? And if so, can we track that somewhere?
Throughout this work I was watching the instantiation benchmark. #11470 was the cause of this change, and in profiling and analyzing that, my conclusion was that it's more-or-less entirely due to the async state machines themselves. I don't really know of a great way to claw back this performance easily. One option is to wait for the Cranelift-compiled state-initialization function discussed above (#2639).
Thank you, that's very helpful. I was mildly concerned because I thought you were talking about everything being 5% slower. If it's just instantiation (and I now remember you mentioning this earlier), not e.g. execution throughput, then that's much less concerning. I think that all seems fine, then.
* Make table growth a true `async fn` Upon further refactoring and thinking about bytecodealliance#11430 I've realized that we might be able to sidestep `T: Send` on the store entirely which would be quite the boon if it can be pulled off. The realization I had is that the main reason for this was `&mut dyn VMStore` on the stack, but that itself is actually a bug in Wasmtime (bytecodealliance#11178) and shouldn't be done. The functions which have this on the stack should actually ONLY have the resource limiter, if configured. This means that while the `ResourceLimiter{,Async}` traits need a `Send` supertrait that's relatively easy to add without much impact. My hunch is that plumbing this through to the end will enable all the benefits of bytecodealliance#11430 without requiring adding `T: Send` to the store. This commit starts out on this journey by making table growth a true `async fn`. A new internal type is added to represent a store's limiter which is plumbed to growth functions. This represents a hierarchy of borrows that look like: * `StoreInner<T>` * `StoreResourceLimiter<'_>` * `StoreOpaque` * `Pin<&mut Instance>` * `&mut vm::Table` This notably, safely, allows operating on `vm::Table` with a `StoreResourceLimiter` at the same time. This is exactly what's needed and prevents needing to have `&mut dyn VMStore`, the previous argument, on the stack. This refactoring cleans up `unsafe` blocks in table growth right now which manually uses raw pointers to work around the borrow checker. No more now! I'll note as well that this is just an incremental step. What I plan on doing next is handling other locations like memory growth, memory allocation, and table allocation. Each of those will require further refactorings to ensure that things like GC are correctly accounted for so they're going to be split into separate PRs. Functionally though this PR should have no impact other than a fiber is no longer required for `Table::grow_async`. * Remove #[cfg] gate
* Make memory growth an `async` function

This is an analog of bytecodealliance#11442 but for memories. This had a little more impact due to memories being hooked into GC operations. Further refactoring of GC operations to make them safer/more-async is deferred to a future PR and for now it's "no worse than before". This is another step towards bytecodealliance#11430 and enables removing a longstanding `unsafe` block in `runtime/memory.rs` which previously could not be removed.

One semantic change from this is that growth of a shared memory no longer uses an async limiter. This is done to keep growth of a shared memory consistent with creation of a shared memory, where no limits are applied. This is due to the cross-store nature of shared memories which means that we can't tie growth to any one particular store.

This additionally fixes an issue where an rwlock write guard was otherwise held across a `.await` point, which creates a non-`Send` future, closing a possible soundness hole in Wasmtime.

* Fix threads-disabled build

* Review comments
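The rwlock-guard-across-`.await` problem mentioned at the end is worth seeing concretely. Below is a minimal sketch (not the actual Wasmtime code; all names are hypothetical) of why dropping the guard before the await point keeps the future `Send`:

```rust
use std::sync::RwLock;

// Stand-in for an async resource-limiter callback.
async fn async_limiter_check(_new_len: usize) {}

// A std `RwLockWriteGuard` is not `Send`, so keeping one alive across an
// `.await` makes the entire future non-`Send`. The fix: finish the locked
// work and drop the guard before any await point.
async fn grow_shared(len: &RwLock<usize>, delta: usize) -> usize {
    let new_len = {
        let mut guard = len.write().unwrap();
        *guard += delta;
        *guard
    }; // guard dropped here, before the `.await` below
    async_limiter_check(new_len).await;
    new_len
}

// Compile-time check that a future is `Send`.
fn require_send<F: std::future::Future + Send>(f: F) -> F {
    f
}

fn main() {
    let len = RwLock::new(4);
    // This compiles only because the write guard is not live across the
    // `.await`; moving the `.await` inside the block above would make
    // this a compile error.
    let _fut = require_send(grow_shared(&len, 4));
    assert_eq!(*len.read().unwrap(), 4); // future not polled yet
}
```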
Forgotten from bytecodealliance#11459 and extracted from bytecodealliance#11430, uses an RAII guard instead of a closure to handle errors.
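The guard-instead-of-closure pattern can be sketched like this (hypothetical names, not the code from the PR itself): cleanup lives in `Drop`, so it runs on every error path, including `?` early returns, and is disarmed on success.

```rust
// Illustrative RAII guard: rolls back a partial allocation on error
// instead of wrapping the fallible work in a closure.
struct DeallocateOnError<'a> {
    slots: &'a mut Vec<usize>,
    slot: usize,
    armed: bool,
}

impl Drop for DeallocateOnError<'_> {
    fn drop(&mut self) {
        if self.armed {
            // Undo the partial allocation; runs on any early return.
            self.slots.retain(|&s| s != self.slot);
        }
    }
}

fn allocate(slots: &mut Vec<usize>, slot: usize, fail: bool) -> Result<(), String> {
    slots.push(slot);
    let mut guard = DeallocateOnError { slots, slot, armed: true };
    if fail {
        return Err("initialization failed".to_string()); // guard cleans up
    }
    guard.armed = false; // success: keep the allocation
    Ok(())
}

fn main() {
    let mut slots = Vec::new();
    assert!(allocate(&mut slots, 7, false).is_ok());
    assert_eq!(slots, vec![7]);
    assert!(allocate(&mut slots, 8, true).is_err());
    assert_eq!(slots, vec![7]); // failed allocation was rolled back
}
```

Unlike a closure-based `run_and_catch` helper, the guard also composes with `async` bodies, since `Drop` fires regardless of where the function suspends or bails out.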
* Make const-expr evaluation `async`

This commit is extracted from bytecodealliance#11430 to accurately reflect how const-expr evaluation is an async operation due to GC pauses that may happen. The changes in this commit are:

* Const-expr evaluation is, at its core, now an `async` function.
* To leverage this new `async`-ness all internal operations are switched from `*_maybe_async` to `*_async`, meaning all the `*_maybe_async` methods can be removed.
* Some libcalls using `*_maybe_async` are switched to using `*_async` plus the `block_on!` utility to help jettison more `*_maybe_async` methods.
* Instance initialization is now an `async` function. This is temporarily handled with `block_on` during instance initialization to avoid propagating the `async`-ness further upwards. This `block_on` will get deleted in future refactorings.
* Const-expr evaluation has been refactored slightly to enable having a fast path in global initialization which skips an `await` point entirely, achieving performance parity with benchmarks prior to this commit.

This ended up fixing a niche issue with GC where, if a wasm execution was suspended during `table.init`, for example, while a const-expr evaluation triggered a GC, then cancelling the wasm execution would panic the host. This panic was because the GC operation returned `Result` but it was `unwrap`'d as part of the const-expr evaluation, which can fail not only due to invalid-ness but also due to "computation is cancelled" traps.

* Fix configured build

* Undo rebase mistake
* Make core instance allocation an `async` function

This commit is a step in preparation for bytecodealliance#11430: notably core instance allocation, or `StoreOpaque::allocate_instance`, is now an `async fn`. This function does not actually use the `async`-ness just yet so it's a noop from that point of view, but this propagates outwards to enough locations that I wanted to split this off to make future changes more digestible.

Notably some creation functions here, such as making an `Instance`, `Table`, or `Memory`, are refactored internally to use this new `async` function. Annotations of `assert_ready` or `one_poll` are used as appropriate as well.

For reference this commit was benchmarked with our `instantiation.rs` benchmark in the pooling allocator and shows no changes relative to the original baseline from before the `async` PRs.

* Make table/memory creation `async` functions

This commit is a large-ish refactor which is made possible by the many previous refactorings to internals w.r.t. async-in-Wasmtime. The end goal of this change is that table and memory allocation are both `async` functions. Achieving this, however, required some refactoring to enable it to work:

* To work with `Send` neither function can close over `dyn VMStore`. This required changing their `Option<&mut dyn VMStore>` argument to `Option<&mut StoreResourceLimiter<'_>>`.
* Somehow a `StoreResourceLimiter` needed to be acquired from an `InstanceAllocationRequest`. Previously the store was stored here as an unsafe raw pointer, but I've refactored this now so `InstanceAllocationRequest` directly stores `&StoreOpaque` and `Option<&mut StoreResourceLimiter>`, meaning it's trivial to acquire them. This additionally means no more `unsafe` access of the store during instance allocation (yay!).
* Now-redundant fields of `InstanceAllocationRequest` were removed since they can be safely inferred from `&StoreOpaque`. For example passing around `&Tunables` is now all gone.
* Methods upwards from table/memory allocation to the `InstanceAllocator` trait needed to be made `async`. This includes new `#[async_trait]` methods for example.
* `StoreOpaque::ensure_gc_store` is now an `async` function. This internally carries a new `unsafe` block carried over from before, with the raw pointer passed around in `InstanceAllocationRequest`. A future PR will delete this `unsafe` block; it's just temporary.

I attempted a few times to split this PR up into separate commits but everything is relatively intertwined here, so this is the smallest "atomic" unit I could manage to land these changes and refactorings.

* Shuffle `async-trait` dep

* Fix configured build
This commit is an initial step towards resolving #11262 by having async
functions internally within Wasmtime actually be `async` instead of
requiring the use of fibers. This is expected to have a number of
benefits:

* The Rust compiler can be used to verify a future is `Send` instead of
  "please audit the whole codebase's stack-local variables".
* Raw pointer workarounds during table/memory growth will no longer be
  required since the arguments can properly be a split borrow to data in
  the store (eventually leading to unblocking #11178).
* Less duplication inside of Wasmtime and clearer implementations
  internally. For example GC bits prior to this PR had duplicated
  sync/async entrypoints (sometimes a few layers deep) which eventually
  bottomed out in `*_maybe_async` bits which were `unsafe` and required
  fiber bits to be set up. All of that is now gone, with the `async`
  functions being the "source of truth" and sync functions just calling
  them.
* Fibers are not required for operations such as a GC, growing memory,
  etc.
The general idea is that the source of truth for the implementation of
Wasmtime internals are all `async` functions. These functions are
callable from synchronous functions in the API with a documented panic
condition about avoiding them when `Config::async_support` is disabled.
When `async_support` is disabled it's known internally there should
never be an `.await` point, meaning that we can poll the future of the
async version and assert that it's ready.
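The poll-once-and-assert trick can be sketched with a no-op waker. The `assert_ready` below is a hypothetical stand-in mirroring the idea, not Wasmtime's internal helper, and `grow_async`/`grow` are illustrative names:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing; it is never actually woken because the
// future must complete on the very first poll.
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

unsafe fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}
unsafe fn noop(_: *const ()) {}

// Poll the future once and require it to be immediately ready.
fn assert_ready<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    match pin!(fut).poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("future suspended without async support enabled"),
    }
}

// The `async` function is the single source of truth; with async support
// disabled, none of its `.await` points are reachable.
async fn grow_async(len: &mut usize, delta: usize) -> usize {
    *len += delta;
    *len
}

// The synchronous API is a thin veneer over the async one.
fn grow(len: &mut usize, delta: usize) -> usize {
    assert_ready(grow_async(len, delta))
}

fn main() {
    let mut len = 1;
    assert_eq!(grow(&mut len, 2), 3);
}
```

If the async body did reach a real suspension point (say, an async resource limiter), the first poll would return `Poll::Pending` and the panic would fire, matching the documented panic condition described above.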
This commit is not the full realization of plumbing `async` everywhere
internally in Wasmtime. Instead all this does is plumb the async-ness of
`ResourceLimiterAsync` and that's it, aka memory and table growth are
now properly async. It turns out though that these limiters are
extremely deep within Wasmtime and thus necessitated many changes to get
this all working. In the end this ended up covering some of the trickier
parts of dealing with async and propagating that throughout the runtime.

Most changes in this commit are intended to be straightforward, but a
summary is:
* Many more functions are `async` and `.await` their internals.
* Some instances of run-a-closure-and-catch-the-error are now replaced
  with a type-with-`Drop`, as that's the equivalent in the async world.
* Internal traits in Wasmtime are now `#[async_trait]` to be object
  safe. This has a performance impact detailed more below.
* `vm::assert_ready` is used in synchronous contexts to assert that the
  async version is done immediately. This is intended to always be
  accompanied by an assert about `async_support` nearby.
* `vm::one_poll` is used to test whether an asynchronous computation is
  ready yet and is used in a few locations where a synchronous public
  API says it'll work in `async_support` mode but fails with an async
  resource limiter.
* GC and other internals were simplified where `async` functions are now
  the "guts" and sync functions are thin veneers over these `async`
  functions.
* An example of new async functions is that lazy GC store allocation and
  instance allocation are both async functions now.
* In a small number of locations a conditional check of
  `store.async_support()` is done. For example during GC, if
  `async_support` is enabled, arbitrary yield points are injected. For
  libcalls, if it's enabled `block_on` is used, otherwise completion is
  asserted to be synchronous.
* Previously `unsafe` functions requiring external fiber handling are
  now all safe and `async`.
* Libcalls have a `block_on!` helper macro which should itself be a
  function-taking-async-closure but requires future Rust features to
  make it a function.
A consequence of this refactoring is that instantiation is now slower
than before. For example from our `instantiation.rs` benchmark:

Other benchmarks I've been looking at locally in `instantiation.rs` have
pretty wild swings, from a performance improvement of 10% with this PR
to a regression of 20%. This benchmark in particular though, also one of
the more interesting ones, is consistently 20% slower with this commit.
Attempting to bottom out this performance difference, it looks like it's
largely "just async state machines vs not" where nothing else really
jumps out in the profile to me. In terms of absolute numbers the
time-to-instantiate is still in the single-digit-microsecond range with
`madvise` being the dominant function.