
The JSPI call stack sandwich problem #26758

@juj

This is something we have discussed on video calls before, but I couldn't find a written-down ticket about it.

Consider the following Emscripten JS library code:

lib.js

mergeInto(LibraryManager.library, {
  // Convenience wrapper to convert setTimeout() to a Promise API.
  $setTimeout_promise: function(timeout) {
    return new Promise(resolve => {
      setTimeout(() => {
        return resolve();
      }, timeout);
    });
  },

  // Perform an async task (setTimeout) and continue executing WebAssembly
  // after the timeout elapses.
  call_async_function_via_jspi__async: true,
  call_async_function_via_jspi__deps: ['$setTimeout_promise'],
  call_async_function_via_jspi: async function(funcptr) {
    return await setTimeout_promise(1500).then(() => {
      return WebAssembly.promising(getWasmTableEntry(funcptr))();
    });
  },

  // Synchronously call back into WebAssembly.
  call_sync_function_via_js: function(funcptr) {
    return getWasmTableEntry(funcptr)();
  }
});

and this slightly print-heavy main program:

main.cpp

#include <stdio.h>

extern "C" int call_async_function_via_jspi(int (*func)(void));
extern "C" int call_sync_function_via_js(int (*func)(void));

int quux()
{
  printf("... ... ... ... ... 6. quux\n");
  return -15;
}

int qux()
{
  printf("... ... ... ... 5. qux, before quux\n");
  int ret = call_sync_function_via_js(&quux);
  printf("... ... ... ... 7. qux, after quux, which returned %d\n", ret);
  return 111;
}

int baz()
{
  printf("... ... ... 4. baz, before qux\n");
  int ret = call_async_function_via_jspi(&qux);
  printf("... ... ... 8. baz, after qux, which returned %d\n", ret);
  return 9000;
}

int bar()
{
  printf("... ... 3. bar, before baz\n");
  int ret = call_sync_function_via_js(&baz);
  printf("... ... 9. bar, after baz, which returned %d\n", ret);
  return 67;
}

int foo()
{
  printf("... 2. foo, before bar\n");
  int ret = call_async_function_via_jspi(&bar);
  printf("... 10. foo, after bar, which returned %d\n", ret);
  return 42;
}

int main()
{
  printf("1. main, before foo\n");
  int ret = call_sync_function_via_js(&foo);
  printf("11. main, after foo, which returned %d\n", ret);

  printf("1 -> 2 -> ... -> 11. ALL DONE!\n");
}

This program calls back and forth between Wasm -> JS -> Wasm -> JS -> Wasm several times, alternating between JSPI-asyncified calls and traditional synchronous Wasm->JS->Wasm calls.

The expected program flow is that it prints:

1. main, before foo
... 2. foo, before bar
... ... 3. bar, before baz
... ... ... 4. baz, before qux
... ... ... ... 5. qux, before quux
... ... ... ... ... 6. quux
... ... ... ... 7. qux, after quux, which returned -15
... ... ... 8. baz, after qux, which returned 111
... ... 9. bar, after baz, which returned 9000
... 10. foo, after bar, which returned 67
11. main, after foo, which returned 42
1 -> 2 -> ... -> 11. ALL DONE!

Due to the way that JSPI currently works, this does not happen. Instead, the program throws SuspendError: trying to suspend JS frames after printing lines 1. and 2., on its very first attempt to suspend, when foo is about to call bar via call_async_function_via_jspi().

The reason is that there is a JS frame (call_sync_function_via_js) on the call stack between the most recent WebAssembly.promising(...)() call and the point of suspension.

Naively, that SuspendError can be avoided by changing the call_sync_function_via_js function definition to

call_sync_function_via_js: function(funcptr) {
  return WebAssembly.promising(getWasmTableEntry(funcptr))();
}

... but then the produced program flow order is not correct, and the program will print the items out of order:

1. main, before foo
... 2. foo, before bar
11. main, after foo, which returned 0
1 -> 2 -> ... -> 11. ALL DONE!
... ... 3. bar, before baz
... ... ... 4. baz, before qux
... ... 9. bar, after baz, which returned 0
... 10. foo, after bar, which returned 0
... ... ... ... 5. qux, before quux
... ... ... ... ... 6. quux
... ... ... ... 7. qux, after quux, which returned 0
... ... ... 8. baz, after qux, which returned 111

since the JSPI unwind mechanism only suspends up to the most recent WebAssembly.promising() call.
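To make the reordering above easier to follow, here is a minimal plain-JS model of it. It involves no WebAssembly at all, and every name in it (sleep, promisedCallee, syncCaller) is hypothetical. It assumes that a Promise handed back to a caller expecting an int is coerced to 0, which matches the "returned 0" lines above but is my reading of the behavior, not something verified against the JSPI implementation.

```javascript
// Plain-JS model only; no WebAssembly or JSPI is used. All names are hypothetical.
const log = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stands in for WebAssembly.promising(wasmFunc)(): it runs synchronously up to
// the first suspension point, then hands a Promise back to its caller.
async function promisedCallee() {
  log.push('callee: before suspend');
  await sleep(10);
  log.push('callee: after suspend');
  return 111;
}

// Stands in for the synchronous Wasm caller. It expects an int, so the Promise
// is coerced to a number (assumption: modeled here with `| 0`).
function syncCaller() {
  log.push('caller: before callee');
  const ret = promisedCallee() | 0; // Promise -> NaN -> 0
  log.push(`caller: after callee, which returned ${ret}`);
  return ret;
}

syncCaller();
console.log(log.join('\n'));

// The callee's continuation only runs later, after the caller has moved on.
setTimeout(() => console.log(log[log.length - 1]), 50);
```

The caller's "after" line is logged before the callee's "after suspend" line, and the caller sees 0 rather than 111. Only the callee was suspended, not the frames above it.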

The above program might seem a bit contrived, but in the Unity3D engine, for example, this type of control flow can occur in practice. Users do occasionally implement synchronous Wasm->JS->Wasm callbacks (i.e. call_sync_function_via_js() type calls) that do not always originate from JS event callbacks, and asynchronous JS->Wasm JSPI suspends (call_async_function_via_jspi()) are expected to occur from various sources, such as pthread creation, filesystem/network read operations, and WebGPU buffer mapping.

We can see situations where we start computing a game frame, user logic calls into JS and then back to Wasm via a callback, and then the game engine needs to suspend Wasm for pthread creation, or the renderer needs to suspend for a WebGPU buffer map. But this suspend would not work, since there are JS frames "sandwiched" in the call stack.

As another example, in the emgc garbage collector, I use sync JS callbacks with a try-finally to ensure that any JS exception thrown inside a GC fence is not catastrophic and cannot deadlock the whole multithreaded program. Similar exception guard constructs also exist in the Unity engine.
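For readers unfamiliar with this kind of guard, the pattern looks roughly like the sketch below. This is not emgc's or Unity's actual code. The names fenceDepth and callWithFence are hypothetical, and the callback stands in for what would be a call back into Wasm.

```javascript
// Hypothetical exception guard; not taken from emgc or Unity.
let fenceDepth = 0;

function callWithFence(callback) {
  fenceDepth++;        // enter the fence
  try {
    return callback(); // synchronous call back into (what would be) Wasm
  } finally {
    fenceDepth--;      // always leave the fence, even if callback throws
  }
}

const ok = callWithFence(() => 42);

let caught = null;
try {
  callWithFence(() => { throw new Error('boom'); });
} catch (e) {
  caught = e.message;
}

console.log(ok, caught, fenceDepth); // prints: 42 boom 0
```

The guard relies on the callback returning or throwing synchronously. That is why such a wrapper ends up as a JS frame sandwiched between Wasm frames.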

The question here then is: what kind of mechanisms might exist to address this challenge of suspending and resuming sandwiched Wasm->JS->Wasm->async JS callstacks, when one wants to suspend the whole thing at once?

It seems that currently, one can have a call stack with arbitrarily many call_async_function_via_jspi() Wasm->JS->Wasm frames in it, or arbitrarily many traditional call_sync_function_via_js() frames in it, but the two cannot be mixed?

CC @dschuff @rossberg @tlively @sunfishcode
