Shared memory optimizations for Gaussian rasterization #554
Open
matthewdcong wants to merge 2 commits into openvdb:main from
Signed-off-by: Matthew Cong <mcong@nvidia.com>
harrism requested changes on Mar 18, 2026
  // Thread blocks cooperatively cache a tile of Gaussians in shared memory
- const uint32_t sharedMem = getSharedMemRequirements<ScalarType>(tileSize);
+ const uint32_t sharedMem = getSharedMemRequirements<ScalarType>(NUM_CHANNELS, tileSize);
harrism (Contributor):
🚩 issue: Shouldn't this be NUM_SHARED_CHANNELS? Also, what happens if the number of channels is too large for all the features to fit in shared memory?
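To make the overflow question concrete: if the forward pass caches a fixed set of per-Gaussian scalars plus NUM_CHANNELS features for every Gaussian in a batch, the shared-memory footprint grows linearly with the channel count. The C++ sketch below assumes a hypothetical layout (2D mean, conic and opacity, i.e. six scalars, plus one scalar per channel) and one cached Gaussian per thread in a tileSize x tileSize block. `sharedMemBytes` and `fitsInSharedMem` are illustrative names, not the PR's getSharedMemRequirements.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout (an assumption, not taken from the PR): each cached
// Gaussian stores a 2D mean (2 scalars), a conic (3) and an opacity (1),
// followed by one scalar per feature channel.
constexpr std::size_t kFixedScalarsPerGaussian = 2 + 3 + 1;

// Bytes of shared memory needed to cache one Gaussian with numChannels features.
template <typename ScalarType>
constexpr std::size_t sharedBytesPerGaussian(std::uint32_t numChannels) {
    return (kFixedScalarsPerGaussian + numChannels) * sizeof(ScalarType);
}

// Hypothetical channel-aware sizing: a tileSize x tileSize thread block
// caches one Gaussian per thread per batch.
template <typename ScalarType>
constexpr std::size_t sharedMemBytes(std::uint32_t numChannels, std::uint32_t tileSize) {
    return static_cast<std::size_t>(tileSize) * tileSize *
           sharedBytesPerGaussian<ScalarType>(numChannels);
}

// 48 KiB: the most shared memory a block can use without opting in to a
// larger dynamic allocation via cudaFuncSetAttribute.
constexpr std::size_t kDefaultSharedMemLimit = 48 * 1024;

// True if one batch of cached Gaussians fits within limitBytes.
template <typename ScalarType>
constexpr bool fitsInSharedMem(std::uint32_t numChannels, std::uint32_t tileSize,
                               std::size_t limitBytes = kDefaultSharedMemLimit) {
    return sharedMemBytes<ScalarType>(numChannels, tileSize) <= limitBytes;
}
```

Under these assumptions, with float and tileSize = 16, 3 channels need 9216 bytes, while 64 channels need 71680 bytes. That exceeds the 48 KiB default and would require either opting in to more dynamic shared memory (up to a per-architecture maximum) or splitting the channels into chunks.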
matthewdcong (Contributor, Author):
There's no NUM_SHARED_CHANNELS in the forward pass (it's just NUM_CHANNELS) because chunking isn't implemented there. Assuming unlimited shared memory, this is correct as written.
That being said, the lack of chunking is likely why the tests are failing for large feature depths, so I'll have to add it.
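The chunking mentioned above could be planned on the host roughly as in this C++ sketch. It assumes a hypothetical layout of six geometry scalars (2D mean, conic, opacity) per Gaussian that every chunk re-caches, computes how many channels fit alongside them in a shared-memory budget, and splits NUM_CHANNELS into contiguous [begin, begin + count) ranges, one per rasterization pass. `ChannelChunk`, `maxChannelsPerChunk` and `planChannelChunks` are illustrative names; the thread does not say how the actual implementation will split channels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: scalars every chunk must re-cache per Gaussian (2D mean,
// conic, opacity) in addition to its slice of feature channels.
constexpr std::size_t kFixedScalarsPerGaussian = 6;

// A contiguous range of feature channels rasterized in one pass.
struct ChannelChunk {
    std::uint32_t begin;  // first channel in the chunk
    std::uint32_t count;  // number of channels in the chunk
};

// Largest channel count per chunk such that a tileSize x tileSize batch of
// cached Gaussians fits in budgetBytes; 0 if the geometry alone does not fit.
constexpr std::uint32_t maxChannelsPerChunk(std::size_t budgetBytes, std::uint32_t tileSize,
                                            std::size_t scalarBytes) {
    const std::size_t bytesPerScalarSlot =
        static_cast<std::size_t>(tileSize) * tileSize * scalarBytes;
    const std::size_t scalarsPerGaussian = budgetBytes / bytesPerScalarSlot;
    return scalarsPerGaussian > kFixedScalarsPerGaussian
               ? static_cast<std::uint32_t>(scalarsPerGaussian - kFixedScalarsPerGaussian)
               : 0;
}

// Splits [0, numChannels) into chunks of at most maxPerChunk channels.
inline std::vector<ChannelChunk> planChannelChunks(std::uint32_t numChannels,
                                                   std::uint32_t maxPerChunk) {
    assert(maxPerChunk > 0 && "geometry alone exceeds the shared-memory budget");
    std::vector<ChannelChunk> chunks;
    if (maxPerChunk == 0) {
        return chunks;  // guards NDEBUG builds against an infinite loop
    }
    for (std::uint32_t begin = 0; begin < numChannels; begin += maxPerChunk) {
        chunks.push_back({begin, std::min(maxPerChunk, numChannels - begin)});
    }
    return chunks;
}
```

With float, tileSize = 16 and a 48 KiB budget, each chunk holds up to 42 channels, so 64 channels become chunks of 42 and 22. Because every chunk re-caches the geometry, this approach trades extra global-memory loads for staying within the budget.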
In profiling, this reduces a single-GPU reconstruction from 17m 20s to 16m 48s (1040 s to 1008 s), a speedup of roughly 3% (1040 / 1008 ≈ 1.032).