Intro
We're improving the support for interacting with the GPU in an asynchronous manner. This meta issue outlines the different steps.
Why
The main purpose is that it allows for increased performance; you can do CPU work while the GPU is doing its thing. But also the GPU can do multiple things at once (e.g. render a new frame while downloading the previous frame).
One very specific use-case is improved performance for offscreen rendering (including the 'bitmap' present mode in rendercanvas). But there are other use-cases, e.g. better interaction with compute, or preparing data while the scene is still rendering smoothly.
Effect on API
- Async methods return a promise-like object that supports `p.sync_wait()`, `p.then(..)`, and `await p`.
- We'll probably add a simple way to init (asynchronously) a wgpu application, probably using a decorator.
- Somehow wgpu must be made aware of the rendercanvas `loop` instance. But this may also be a custom (non-rendercanvas) loop. Still figuring out what the API will be.
- Some methods that use `sync_wait` internally will be deprecated, or be made async: `queue.read_buffer()` and `queue.read_texture()`.
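To make the promise-like object concrete, here is a minimal toy sketch of how an object could support all three resolution styles (`sync_wait()`, `then()`, and `await`). This is an assumption about the design, not wgpu's actual implementation; the `poll` callable stands in for polling wgpu-native.

```python
import asyncio


class GPUPromise:
    """Toy promise-like object (hypothetical sketch, not wgpu's real class)."""

    def __init__(self, poll):
        self._poll = poll  # stand-in for polling the GPU for the result
        self._result = None
        self._done = False

    def _resolve(self):
        # Poll (once) for the result and cache it.
        if not self._done:
            self._result = self._poll()
            self._done = True
        return self._result

    def sync_wait(self):
        # Block until the result is available.
        return self._resolve()

    def then(self, callback):
        # Invoke the callback with the result (a real impl would schedule
        # this with the loop instead of calling it immediately).
        callback(self._resolve())

    def __await__(self):
        # Allow ``await promise`` inside a coroutine: yield to the event
        # loop once, then produce the result.
        yield from asyncio.sleep(0).__await__()
        return self._resolve()


# All three styles produce the same result:
promise = GPUPromise(lambda: 42)
print(promise.sync_wait())  # 42


async def main():
    return await GPUPromise(lambda: 42)


print(asyncio.run(main()))  # 42
```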
Technical challenges
There are a lot of aspects that play a role in this work: async, event loops (with multiple different backends), threading, the upcoming Pyodide backend, inaccuracy in timers on Windows, etc.
Tasks
- The promise object. Support to schedule stuff with the loop, and also poll directly.
- Something like `@wgpu.request_device()`?
- Getting the loop. Explicit or automatic for rendercanvas?
- Polling wgpu-native. Via a thread, a task, or something else?
- Adjusting canvas context (and rendercanvas?) to present bitmaps in an async way.
- Deprecate or make async `queue.read_buffer()` and `queue.read_texture()`.
- More documentation on async and where wgpu applies backpressure.
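For the decorator idea, a rough sketch of how `@wgpu.request_device()` could work. Everything here is hypothetical (the API is still being designed); `fake_request_device` stands in for the real async adapter/device request.

```python
import asyncio
import functools


async def fake_request_device():
    # Placeholder for requesting an adapter and device asynchronously;
    # the real thing would call request_adapter_async / request_device_async.
    return "device"


def request_device():
    """Hypothetical decorator that injects a device into an async entry point."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            device = await fake_request_device()
            return await func(device, *args, **kwargs)

        return wrapper

    return decorator


@request_device()
async def main(device):
    return f"got {device}"


print(asyncio.run(main()))  # got device
```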
Details
Async methods
Methods that are async:
- `gpu.request_adapter_async()` -> To init an application
- `gpu.enumerate_adapters_async()`
- `adapter.request_device_async()` -> To init an application
- `device.get_lost_async()`
- `shadermodule.get_compilation_info_async()`
- `buffer.map_async()` -> Needed for offscreen rendering
- `queue.on_submitted_work_done_async()` -> Maybe convenient for scheduling
Maybe we'll add:
- `queue.read_buffer_async()` -> uses `buffer.map_async()`
- `queue.read_texture_async()` -> uses `buffer.map_async()`
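A rough sketch of how `queue.read_buffer_async()` could be built on top of `buffer.map_async()`. The `FakeBuffer` class below is purely illustrative (it stands in for a real `GPUBuffer` so the example runs without a GPU); the method names are assumptions modeled on the WebGPU-style API.

```python
import asyncio


class FakeBuffer:
    """Stand-in for a GPUBuffer, for illustration only."""

    def __init__(self, data: bytes):
        self._data = data

    async def map_async(self, mode="READ"):
        pass  # the real call would wait for the GPU to finish with the buffer

    def read_mapped(self, offset, size):
        return self._data[offset : offset + size]

    def unmap(self):
        pass


async def read_buffer_async(buffer, size):
    # Sketch: map asynchronously, copy the mapped range, unmap.
    await buffer.map_async("READ")
    data = bytes(buffer.read_mapped(0, size))
    buffer.unmap()
    return data


print(asyncio.run(read_buffer_async(FakeBuffer(b"abcd"), 4)))  # b'abcd'
```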
Blocking API calls
There are also a few places where wgpu blocks.
WebGPU is a queue-based API; the cpu tells the GPU what work to do, literally by submitting it to a queue. Then the GPU does the work at its own pace. If you check the async methods above, none of these (have to) play a part in a normal render loop. So it looks like it should be possible for the CPU to schedule new work faster than the GPU can handle it. But there are some measures in place to prevent this.
- `wgpuSurfacePresent()` can briefly block due to the vsync, and (depending on the backend) may also wait until rendering is ready.
- `wgpuSurfaceGetCurrentTexture()` can block (depending on the backend) if the texture is still being used as an attachment.
- Usually a surface has 2 or 3 internal textures, so it can have a few in-flight frames.
- `queue.submit()` does not currently block. It just means you create yourself a memory problem if you never wait-poll or await `onSubmittedWorkDone`.
So when rendering to screen, depending on the backend, either wgpuSurfacePresent() or wgpuSurfaceGetCurrentTexture() blocks to prevent the CPU from getting too far ahead of the GPU.
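The backpressure mechanism above can be modeled with a semaphore: with 2 or 3 internal surface textures, acquiring a texture blocks once that many frames are in flight, and each completed frame frees a slot. The sketch below is a toy asyncio model of that behavior, not wgpu code.

```python
import asyncio

MAX_IN_FLIGHT = 3  # mimics a surface with 3 internal textures


async def gpu_work(sem):
    # Pretend the GPU renders the frame, then hand the "texture" back.
    await asyncio.sleep(0)
    sem.release()


async def render_loop(n_frames):
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    tasks = []
    for _ in range(n_frames):
        # Like wgpuSurfaceGetCurrentTexture(): blocks when all internal
        # textures are still in flight, so the CPU cannot run ahead.
        await sem.acquire()
        tasks.append(asyncio.ensure_future(gpu_work(sem)))
    await asyncio.gather(*tasks)
    return n_frames


print(asyncio.run(render_loop(10)))  # 10
```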
In the browser, the mechanism is somewhat different. There is no real swapchain; instead the browser hands out a texture (upon `getCurrentTexture()`), which is then used by the compositor to render the result with the rest of the browser content. However, for the compositor to use the texture, it has to wait for the rendering to be done. So if the rendering takes a long time, it affects the scheduling of animation frames (i.e. FPS goes down). This is (my theory of) how the throttling works in the browser/WebGPU.
So in WebGPU, `getCurrentTexture()` is non-blocking; it simply gives the texture that matches the current animation frame. And there is no present call (that happens under the hood).
We can mimic that (in offscreen rendering) by modifying rendercanvas to schedule draws based also on the presentation of the frame. This can go as far as waiting for a confirmation for frames presented in a remote client.
Another possibility is that we do block on `getCurrentTexture` or `present` for bitmap presenting with wgpu-native. We can use `sync_wait()` as an implementation detail inside `getCurrentTexture`. I think we should just do both.
I'm still wrapping my head around this. Will update as I learn more.