WebGPU Support#713

Draft
gkjohnson wants to merge 72 commits into main from webgpu-pathtracer

Conversation

@gkjohnson (Owner) commented Feb 4, 2026

cc @TheBlek

I've branched from #705 and changed things around quite a bit: I addressed the storage buffer limitations by using storage textures, and organized the kernels into dedicated classes, so canvas resizing etc. all works. I have also separated the "MegaKernel" from the "PathTracerCore" so it will be easier to follow the differences and dependencies between the implementations.

Next I'm going to look into some of the ideas around a ray queue we'd discussed previously. Then we can try some timing to see how things pan out.

[image attachment]

Relatedly, this write up will be interesting for a wave front path tracer:

https://developer.blender.org/docs/features/cycles/kernel_scheduling/

TODO

  • Fix creating new kernels every loop, causing GC issues
  • Improve flashing (ensure at least one full pass is complete)
  • Adjust queue sizes based on needs
  • Add "PathTracerBackend"
Plans
  • Add variance detection
  • Add "completion" detection
  • Add scene bvh + geometry utility
  • Design WebGPUPathtracer API
  • Add exports to package.json
  • Add "debug" views (sample count, completion visualization, etc)

TheBlek and others added 30 commits October 16, 2025 20:07
@TheBlek mentioned this pull request Feb 5, 2026
@gkjohnson (Owner, Author) commented:

@TheBlek - I'm going to call this "done" as a first pass for now. There are some workarounds for three.js issues, marked with TODOs, but it's working fairly well. One of the features I like most is how scalable it is - we can reduce the number of rays processed per frame based on framerate, and the page can remain responsive since the whole 7+ bounce path doesn't need to finish in a single pass. Curious to hear your thoughts.
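As an illustration of that scalability, the per-frame ray budget could be adjusted from the measured frame time. This is a hypothetical sketch, not code from the branch; the function name, target, and clamp constants are all assumptions:

```javascript
// Scale the number of rays dispatched next frame toward a target
// frame time, clamped so the budget never collapses or explodes.
// All names and constants here are illustrative.
function updateRayBudget( budget, frameTimeMs, targetMs = 16, min = 1024, max = 1 << 20 ) {

	const scaled = Math.round( budget * ( targetMs / frameTimeMs ) );
	return Math.min( max, Math.max( min, scaled ) );

}
```

A slow frame (32 ms) would halve a 10,000-ray budget to 5,000, while a fast frame (8 ms) would double it, keeping the page responsive without stalling convergence.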

The overall approach works like so:

  1. Iterate over all pixels in a tiled format and push rays to trace onto a ring-buffer work queue. We only iterate over a tile if there's enough space in the queue to add rays for all pixels in the tile (even though in practice we may skip some). Pixels whose rays have been added to the queue are marked as "active" to avoid adding multiple rays for the same pixel. We also issue a compute call for every tile but use indirect dispatch buffers to "cancel" unneeded generation once the queue has become full.

  2. Trace rays in the work queue against the BVH. If there is no hit then accumulate the color in the final target buffer, increment the sample count, and mark the pixel as "inactive". If it does hit, add it to the "hitQueue". Then increment the ray queue ring buffer's head pointer.

  3. Process the hits. If we have reached the maximum bounce count then terminate the ray, mark the pixel as inactive, and increment the sample count. Otherwise, add a scatter ray back to the ray queue. Then go back to step 1 to "top up" the queue with rays for any inactive pixels and start again.
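The ring-buffer bookkeeping in steps 1-3 can be modeled on the CPU like so. This is a simplified sketch with hypothetical names (`RayQueue`, `tryPushTile`, etc.) to show the invariants - it is not the actual WGSL kernel logic:

```javascript
// CPU-side model of the ring-buffer ray queue. A tile is only
// enqueued if every one of its pixels could fit, even though
// already-active pixels are skipped, matching step 1 above.
class RayQueue {

	constructor( capacity ) {

		this.capacity = capacity;
		this.head = 0;	// next ray to trace (step 2 advances this)
		this.tail = 0;	// next free slot for generated rays
		this.count = 0;

	}

	freeSpace() {

		return this.capacity - this.count;

	}

	tryPushTile( pixels, active ) {

		// require room for the whole tile before touching it
		if ( this.freeSpace() < pixels.length ) return false;

		for ( const p of pixels ) {

			// one in-flight ray per pixel: skip already-active pixels
			if ( active.has( p ) ) continue;
			active.add( p );
			this.tail = ( this.tail + 1 ) % this.capacity;
			this.count ++;

		}

		return true;

	}

	pop() {

		if ( this.count === 0 ) return false;
		this.head = ( this.head + 1 ) % this.capacity;
		this.count --;
		return true;

	}

}
```

On the GPU the same "whole tile must fit" check is what drives the indirect dispatch cancellation: generation work for a tile that cannot fit is zeroed out rather than branched around.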

--

A few things that need to be considered or added to aid with performance at some point:

  • Add support for a maximum sample count to prevent adding and processing rays for pixels that have already "finished".

  • We'll want some method for detecting that, at a minimum, X samples across the image have finished so that we can determine when the result is ready to show and avoid displaying a partially-finished render. Probably a simple compute pass that checks all pixels and writes a flag to a storage buffer we can read back, indicating whether any pixel has not yet passed the threshold.

  • Adding some kind of "convergence detection" using a minimum sample count and tracking variance of the samples. This will let pixels be marked as "completed" early on if it converges early (diffuse surfaces, unlit surfaces, background, etc) so we can skip rays for these cases and focus on pixels that need more rays and samples to converge.

  • Related to the above point: we'll eventually reach a state where only a few hundred pixels or fewer are left to process, at which point it would be best to dispatch multiple rays per pixel. That means handling the race condition of multiple rays writing to the same pixel, probably with a dedicated kernel that resolves the competing writes.
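The "convergence detection" idea above could be built on an online variance estimate such as Welford's algorithm. A minimal sketch, assuming per-pixel luminance samples; `minSamples` and `varianceThreshold` are made-up illustrative values, not tuned numbers:

```javascript
// Welford's online mean/variance per pixel. A pixel is considered
// "converged" once it has a minimum number of samples and its
// sample variance falls below a threshold, so further rays for it
// can be skipped. Names and thresholds are assumptions.
function makePixelStats() {

	return { n: 0, mean: 0, m2: 0 };

}

function addSample( s, luminance ) {

	s.n ++;
	const delta = luminance - s.mean;
	s.mean += delta / s.n;
	s.m2 += delta * ( luminance - s.mean );

}

function isConverged( s, minSamples = 16, varianceThreshold = 1e-4 ) {

	if ( s.n < minSamples ) return false;
	const variance = s.m2 / ( s.n - 1 );
	return variance < varianceThreshold;

}
```

A flat background pixel would converge almost immediately under this test, while a noisy caustic would keep accumulating samples - exactly the "focus rays where they're needed" behavior described above.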

--

I'll wait to see where you're going before putting too much more work into this path tracing logic specifically. I may look at some of the other points I mentioned in #705 (comment) when I have time.
