-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
In the spirit of making Wasmtime more sound internally over time this is a major thorn in our side right now. Many functions in Wasmtime are synchronous but they're actually asynchronous in nature due to the possibility that something could be context-switched away at any moment. These context switches are required to happen on a fiber and while I'm not aware of any issues internally within Wasmtime, this is, in my opinion, unnecessarily surprising. I would ideally like to move towards a world where Wasmtime actually uses async internally, not just for the async feature of wasm itself, and then in synchronous mode it asserts the future is always ready since there aren't any sources of blocking async.
Concretely sources of surprising async-ness are:
- Async resource limiters
- Async GC
- Async call hooks
- ... maybe more? this isn't intended to be exhaustive
Concretely one location of unsoundness has to do with growth of tables and memories. For example memory growth requires a split borrow both of and within the store. The same is true for tables. This is due to the fact that the methods of growth require a &mut self on the runtime data structure but also require, effectively, &mut StoreOpaque to suspend the world. Given that memories/tables live within &mut StoreOpaque, this is not a sound way to model this problem.
Another concrete location of unsoundness is that growth of a shared memory suspends the stack with an rwlock write lock on the stack. This means that unsafe impl Send for FiberFuture {} is a lie because that's foundationally built on the premise that Wasmtime's own stack-local variables are all Send. This is effectively something that's impossible to guarantee and this hypothetically can result in locking an rwlock on one thread and unlocking it on another (technically UB, tested on glibc and other platforms to at least not behave badly for now). I was able to trivially detect this by migrating to async fn, however, as the compiler checks the send-ness of local variables.
If hypothetically growth were to use an async fn I believe this would be sound. For example growth still requires a limiter, but there's no need to borrow the full store. This is morally a borrow of separate fields in the store, for example (a) the data/limiter and (b) the memory itself. That's sound to model in Wasmtime (albeit clunky).
There are additional downsides to the way this is all implemented today such as async table/memory growth requiring a fiber. This doesn't actually execute any WebAssembly code yet we nonetheless require a fiber to satisfy how suspension works in an async context. This is, in my opinion, unnecessarily unsafe where we're suspending Rust frames on a stack when we don't really need to if things were actually modeled as an async function.
I'll note that this issue is not going to be easy to solve. We can't "simply go sprinkle some async" as there are some fundamental blockers to that. The main one I ran into while attempting this is that VMStore is not Send. This is a trait object backed by any Store<T> for any T meaning that we don't know it's Send ahead of time. While everything in an async context is Send we don't know that for a sync context. I know of no fix for this other than adding Store<T: Send> everywhere, but that's a very heavy hammer and is not one we should reach for lightly.