
Shared memory optimizations for Gaussian rasterization #554

Open
matthewdcong wants to merge 2 commits into openvdb:main from matthewdcong:smem_features_forward_pass

Conversation

@matthewdcong
Contributor

  1. The forward rasterization pass does not currently store Gaussian features in shared memory. As the problem size grows (more tile intersections per Gaussian), the reuse gained by caching features in shared memory outweighs the cost of the unconditional global loads.
  2. In addition, we cull loads for Gaussians whose opacity is below the threshold required for a Gaussian to contribute in the volume rendering pass. This optimization applies to both the forward and backward passes.
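The two optimizations above can be sketched as the following forward-pass tile loop. This is a hypothetical illustration, assuming a per-tile batched rasterizer; the kernel signature, parameter names, and data layout are assumptions, not the actual fvdb code:

```cuda
// Sketch only: each thread block rasterizes one image tile and iterates
// over the tile's Gaussians in batches of BLOCK_SIZE. Threads cooperatively
// stage each batch, including its feature channels, into shared memory.
template <typename ScalarType, uint32_t NUM_CHANNELS, uint32_t BLOCK_SIZE>
__global__ void rasterizeForward(const ScalarType* __restrict__ opacities,
                                 const ScalarType* __restrict__ features,
                                 const int32_t* __restrict__ tileGaussianIds,
                                 uint32_t numGaussiansInTile,
                                 ScalarType minOpacity) {
    __shared__ ScalarType smemOpacity[BLOCK_SIZE];
    // Note: for large NUM_CHANNELS this allocation can exceed the
    // shared-memory limit, which is what the review discussion is about.
    __shared__ ScalarType smemFeatures[BLOCK_SIZE][NUM_CHANNELS];

    for (uint32_t start = 0; start < numGaussiansInTile; start += BLOCK_SIZE) {
        const uint32_t idx = start + threadIdx.x;
        if (idx < numGaussiansInTile) {
            const int32_t g = tileGaussianIds[idx];
            const ScalarType alpha = opacities[g];
            smemOpacity[threadIdx.x] = alpha;
            // Optimization 2: skip the feature loads entirely when the
            // Gaussian is too transparent to contribute to the render.
            if (alpha >= minOpacity) {
                for (uint32_t c = 0; c < NUM_CHANNELS; ++c) {
                    // Optimization 1: one global load per feature scalar;
                    // every thread in the block then reuses it from
                    // shared memory when blending its own pixel.
                    smemFeatures[threadIdx.x][c] =
                        features[g * NUM_CHANNELS + c];
                }
            }
        }
        __syncthreads();
        // ... alpha-blend the cached batch into this thread's pixel ...
        __syncthreads();
    }
}
```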

In profiling, this reduces a 17m 20s single-GPU reconstruction to 16m 48s, an approximately 3% speedup.

Signed-off-by: Matthew Cong <mcong@nvidia.com>
@matthewdcong matthewdcong requested a review from a team as a code owner March 18, 2026 06:08
Contributor

@harrism left a comment


One concern.


  // Thread blocks cooperatively cache a tile of Gaussians in shared memory
- const uint32_t sharedMem = getSharedMemRequirements<ScalarType>(tileSize);
+ const uint32_t sharedMem = getSharedMemRequirements<ScalarType>(NUM_CHANNELS, tileSize);
Contributor


🚩 issue: Shouldn't this be NUM_SHARED_CHANNELS? Also, what happens if the number of channels is too large to fit all the features in shared memory?

Contributor Author


There's no NUM_SHARED_CHANNELS in the forward pass (it's just NUM_CHANNELS) because chunking isn't implemented there. So, assuming unlimited shared memory, this is correct as written.

That being said, the lack of chunking is likely why the tests are failing for large feature depths, so I'll have to add that.
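A minimal host-side sketch of what such channel chunking could compute. The per-Gaussian layout, the `getSharedMemRequirements` signature, and the 48 KB per-block limit are all assumptions for illustration, not the actual fvdb implementation:

```cpp
#include <cstdint>

// Assumed default static shared-memory limit per block on many GPUs.
constexpr uint32_t kSharedMemLimit = 48 * 1024;

// Bytes of shared memory needed to cache `tileSize` Gaussians, each with a
// hypothetical layout of a 2D mean (2 scalars), a conic (3 scalars), an
// opacity (1 scalar), and `numChannels` feature scalars.
template <typename ScalarType>
uint32_t getSharedMemRequirements(uint32_t numChannels, uint32_t tileSize) {
    const uint32_t scalarsPerGaussian = 2 + 3 + 1 + numChannels;
    return tileSize * scalarsPerGaussian * sizeof(ScalarType);
}

// Smallest number of channel chunks such that caching each chunk's worth of
// features fits within the shared-memory limit.
template <typename ScalarType>
uint32_t numChannelChunks(uint32_t numChannels, uint32_t tileSize) {
    uint32_t chunks = 1;
    while (getSharedMemRequirements<ScalarType>(
               (numChannels + chunks - 1) / chunks, tileSize) > kSharedMemLimit) {
        ++chunks;
    }
    return chunks;
}
```

For a 16x16 tile (256 Gaussians per batch) with 3 float channels this fits easily in one chunk, while a 64-channel feature set would need to be split across multiple passes over the channel dimension.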

