Intro
We're improving the support for interacting with the GPU in an asynchronous manner. This meta issue outlines the different steps.
Why
The main purpose is that it allows for increased performance; you can do CPU work while the GPU is doing its thing. But also the GPU can do multiple things at once (e.g. render a new frame while downloading the previous frame).
One very specific use-case is improved performance for offscreen rendering (including the 'bitmap' present mode in rendercanvas). But there are other use-cases, e.g. better interaction with compute, or preparing data while the scene is still rendering smoothly.
Effect on API
- Async methods return a promise-like object that supports `p.sync_wait()`, `p.then(..)`, and `await p`.
- We'll probably add a simple way to init (asynchronously) a wgpu application, probably using a decorator.
- Somehow wgpu must be made aware of the rendercanvas `loop` instance. But this may also be a custom (non-rendercanvas) loop. Still figuring out what the API will be.
- Some methods that use `sync_wait` internally will be deprecated, or be made async: `queue.read_buffer()` and `queue.read_texture()`.
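To make the promise-like object concrete, here is a minimal toy sketch of how an object could support all three resolution styles (`sync_wait()`, `then()`, and `await`). This is an assumption about the design, not wgpu's actual implementation; the `poll` callable stands in for polling wgpu-native.

```python
import asyncio


class GPUPromise:
    """Toy promise-like object (hypothetical sketch, not wgpu's real class)."""

    def __init__(self, poll):
        self._poll = poll  # stand-in for polling the GPU for the result
        self._result = None
        self._done = False

    def _resolve(self):
        # Poll (once) for the result and cache it.
        if not self._done:
            self._result = self._poll()
            self._done = True
        return self._result

    def sync_wait(self):
        # Block until the result is available.
        return self._resolve()

    def then(self, callback):
        # Invoke the callback with the result (a real impl would schedule
        # this with the loop instead of calling it immediately).
        callback(self._resolve())

    def __await__(self):
        # Allow ``await promise`` inside a coroutine: yield to the event
        # loop once, then produce the result.
        yield from asyncio.sleep(0).__await__()
        return self._resolve()


# All three styles produce the same result:
promise = GPUPromise(lambda: 42)
print(promise.sync_wait())  # 42


async def main():
    return await GPUPromise(lambda: 42)


print(asyncio.run(main()))  # 42
```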
Technical challenges
There are a lot of aspects that play a role in this work: async, event loops (with multiple different backends), threading, the upcoming Pyodide backend, inaccuracy in timers on Windows, etc.
Tasks
- The promise object. Support to schedule stuff with the loop, and also poll directly.
- Something like `@wgpu.request_device()`?
- Getting the loop. Explicit or automatic for rendercanvas?
- Polling wgpu-native. Via a thread, a task, or something else?
- Adjusting canvas context (and rendercanvas?) to present bitmaps in an async way.
- Deprecate or make async `queue.read_buffer()` and `queue.read_texture()`.
- More documentation on async and where wgpu applies backpressure.
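For the decorator idea, a rough sketch of how `@wgpu.request_device()` could work. Everything here is hypothetical (the API is still being designed); `fake_request_device` stands in for the real async adapter/device request.

```python
import asyncio
import functools


async def fake_request_device():
    # Placeholder for requesting an adapter and device asynchronously;
    # the real thing would call request_adapter_async / request_device_async.
    return "device"


def request_device():
    """Hypothetical decorator that injects a device into an async entry point."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            device = await fake_request_device()
            return await func(device, *args, **kwargs)

        return wrapper

    return decorator


@request_device()
async def main(device):
    return f"got {device}"


print(asyncio.run(main()))  # got device
```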
Details
Async methods
Methods that are async:
- `gpu.request_adapter_async()` -> To init an application
- `gpu.enumerate_adapters_async()`
- `adapter.request_device_async()` -> To init an application
- `device.get_lost_async()`
- `shadermodule.get_compilation_info_async()`
- `buffer.map_async()` -> Needed for offscreen rendering
- `queue.on_submitted_work_done_async()` -> Maybe convenient for scheduling
Maybe we'll add:
- `queue.read_buffer_async()` -> uses `buffer.map_async()`
- `queue.read_texture_async()` -> uses `buffer.map_async()`
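A rough sketch of how `queue.read_buffer_async()` could be built on top of `buffer.map_async()`. The `FakeBuffer` class below is purely illustrative (it stands in for a real `GPUBuffer` so the example runs without a GPU); the method names are assumptions modeled on the WebGPU-style API.

```python
import asyncio


class FakeBuffer:
    """Stand-in for a GPUBuffer, for illustration only."""

    def __init__(self, data: bytes):
        self._data = data

    async def map_async(self, mode="READ"):
        pass  # the real call would wait for the GPU to finish with the buffer

    def read_mapped(self, offset, size):
        return self._data[offset : offset + size]

    def unmap(self):
        pass


async def read_buffer_async(buffer, size):
    # Sketch: map asynchronously, copy the mapped range, unmap.
    await buffer.map_async("READ")
    data = bytes(buffer.read_mapped(0, size))
    buffer.unmap()
    return data


print(asyncio.run(read_buffer_async(FakeBuffer(b"abcd"), 4)))  # b'abcd'
```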
Blocking API calls
There are also a few places where wgpu blocks.
WebGPU is a queue-based API; the cpu tells the GPU what work to do, literally by submitting it to a queue. Then the GPU does the work at its own pace. If you check the async methods above, none of these (have to) play a part in a normal render loop. So it looks like it should be possible for the CPU to schedule new work faster than the GPU can handle it. But there are some measures in place to prevent this.
- `wgpuSurfacePresent()` can briefly block due to the vsync, and (depending on the backend) may also wait until rendering is ready.
- `wgpuSurfaceGetCurrentTexture()` can block (depending on the backend) if the texture is still being used as an attachment.
- Usually a surface has 2 or 3 internal textures, so it can have a few in-flight frames.
- `queue.submit()` does not currently block. It just means you create yourself a memory problem if you never wait-poll or await `onSubmittedWorkDone`.
So when rendering to screen, depending on the backend, either wgpuSurfacePresent() or wgpuSurfaceGetCurrentTexture() blocks to prevent the CPU from getting too far ahead of the GPU.
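The backpressure mechanism above can be modeled with a semaphore: with 2 or 3 internal surface textures, acquiring a texture blocks once that many frames are in flight, and each completed frame frees a slot. The sketch below is a toy asyncio model of that behavior, not wgpu code.

```python
import asyncio

MAX_IN_FLIGHT = 3  # mimics a surface with 3 internal textures


async def gpu_work(sem):
    # Pretend the GPU renders the frame, then hand the "texture" back.
    await asyncio.sleep(0)
    sem.release()


async def render_loop(n_frames):
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    tasks = []
    for _ in range(n_frames):
        # Like wgpuSurfaceGetCurrentTexture(): blocks when all internal
        # textures are still in flight, so the CPU cannot run ahead.
        await sem.acquire()
        tasks.append(asyncio.ensure_future(gpu_work(sem)))
    await asyncio.gather(*tasks)
    return n_frames


print(asyncio.run(render_loop(10)))  # 10
```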
In the browser, the mechanism is somewhat different. There is no real swapchain; instead the browser hands out a texture (upon `getCurrentTexture()`), which is then used by the compositor to render the result with the rest of the browser content. However, for the compositor to use the texture, it has to wait for the rendering to be done. So if the rendering takes a long time, it affects the scheduling of animation frames (i.e. FPS goes down). This is (my theory of) how the throttling works in the browser/WebGPU.
So in WebGPU, `getCurrentTexture()` is non-blocking; it simply gives the texture that matches the current animation frame. And there is no present call (that happens under the hood).
We can mimic that (in offscreen rendering) by modifying rendercanvas to schedule draws based also on the presentation of the frame. This can go as far as waiting for a confirmation for frames presented in a remote client.
Another possibility is that we do block on `getCurrentTexture` or `present` for bitmap presenting with wgpu-native. We can use `sync_wait()` as an implementation detail inside `getCurrentTexture`. I think we should just do both.
I'm still wrapping my head around this. Will update as I learn more.