Replies: 1 comment 5 replies
-
Thankfully, this is very simple! They are defined to run serially in WebGPU. Each dispatch is its own usage scope, so dispatches run as if serial. We communicate this to Metal by creating a "serial" compute pass, which enforces this behavior. See #2659 for discussion of parallel compute passes - something we want, but which hasn't been implemented yet.
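To illustrate, here is a minimal wgpu sketch of the guarantee described above (device/queue initialization and pipeline creation are omitted; `pipeline_a`, `pipeline_b`, and the bind groups are hypothetical names): two dispatches recorded into the same compute pass are ordered, so the second always observes the first's writes.

```rust
// Sketch only - assumes an already-initialized wgpu `device` and `queue`,
// plus two hypothetical compute pipelines and their bind groups.
let mut encoder =
    device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
{
    let mut pass =
        encoder.begin_compute_pass(&wgpu::ComputePassDescriptor::default());

    pass.set_pipeline(&pipeline_a);
    pass.set_bind_group(0, &bind_group_a, &[]);
    pass.dispatch_workgroups(64, 1, 1); // runs first

    pass.set_pipeline(&pipeline_b);
    pass.set_bind_group(0, &bind_group_b, &[]);
    pass.dispatch_workgroups(64, 1, 1); // ordered after the first dispatch
} // pass ends when dropped
queue.submit(Some(encoder.finish()));
```

On the Metal backend this pass maps to a compute command encoder created with a serial dispatch type, so the driver will not overlap the two dispatches even when their buffer usage would permit it.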
-
Hi,
I've got a couple of compute shaders that work on two different buffers, and later another shader merges those two buffers together. I have a strong suspicion that most of those shaders (being mixed ALU/memory-bound) should be scheduled in parallel, but it looks like Mac only "parallelizes" the first ones:
For completeness, this is how it looks in Xcode's dependencies tab:
Inspecting those shaders tells me that they spend a lot of time waiting for memory (and don't actually use that many registers), making them perfect candidates for parallelization:
... but it's not happening - is there any way for me to debug why the scheduler decides one way or the other?
(Maybe there's some low-hanging fruit to check, such as an accidentally created synchronization edge that I could get rid of.)
Thanks!
Edit: of course I'm aware that passes being executed in parallel doesn't imply any performance gain (and that it's some magical code in the driver which ultimately decides what happens) - I'd just like to make sure there's no low-hanging optimization fruit here.