Commit a758181

Refactor shared memory types in IntegrateTSDF kernel
Host code computes the dynamic shared-memory size with `nanovdb::math::Mat3<scalar_t>` and `Mat4<scalar_t>`, but inside the kernel those matrices are instantiated with `ScalarType = OpType<scalar_t>::type`. For the `c10::Half` dispatch, this means the kernel stores `Mat3<float>`/`Mat4<float>` (36/64 bytes each) while the launch only reserves enough space for `Mat3<half>`/`Mat4<half>` (18/32 bytes). On Blackwell, that mismatch shows up as an out-of-bounds shared-memory write.

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
1 parent 785bb29 commit a758181

1 file changed: +6, -4 lines

src/fvdb/detail/ops/IntegrateTSDF.cu

Lines changed: 6 additions & 4 deletions
@@ -447,14 +447,16 @@ doIntegrate(const float truncationMargin,
         tsdf.scalar_type(),
         "integrateTSDFKernel",
         AT_WRAP([&]() {
-            using Mat3T = nanovdb::math::Mat3<scalar_t>;
-            using Mat4T = nanovdb::math::Mat4<scalar_t>;
+            using shared_scalar_t = typename OpType<scalar_t>::type;
+            using SharedMat3T = nanovdb::math::Mat3<shared_scalar_t>;
+            using SharedMat4T = nanovdb::math::Mat4<shared_scalar_t>;
             constexpr uint64_t VOXELS_PER_LEAF = nanovdb::OnIndexTree::LeafNodeType::NUM_VALUES;
             const auto numUnionLeaves = unionGrid.totalLeaves();
-            const auto numSharedScalars = 2 * batchSize * 3 * 3 + batchSize * 4 * 4;
+            const auto numSharedScalars = 2 * batchSize * 3 * 3 + 2 * batchSize * 4 * 4;
             const auto problemSize =
                 std::max(numUnionLeaves * VOXELS_PER_LEAF, uint64_t(numSharedScalars));
-            const auto sharedMemSize = 2 * batchSize * sizeof(Mat3T) + batchSize * sizeof(Mat4T);
+            const auto sharedMemSize =
+                2 * batchSize * sizeof(SharedMat3T) + 2 * batchSize * sizeof(SharedMat4T);
             const auto numBlocks = GET_BLOCKS(problemSize, DEFAULT_BLOCK_DIM);
 
             const auto dtype = tsdf.scalar_type();
