Commit a758181
Refactor shared memory types in IntegrateTSDF kernel
Host code computes the dynamic shared-memory size with `nanovdb::math::Mat3<scalar_t>` and `Mat4<scalar_t>`, but inside the kernel those matrices are instantiated with `ScalarType = OpType<scalar_t>::type`. For the `c10::Half` dispatch this means the kernel stores `Mat3<float>`/`Mat4<float>` (36/64 bytes each) while the launch reserves only enough space for `Mat3<half>`/`Mat4<half>` (18/32 bytes each). On Blackwell that mismatch shows up as the out-of-bounds shared-memory write.
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
1 parent 785bb29 commit a758181
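The size mismatch described in the commit message can be reproduced on the host with a minimal sketch. Everything below is a hypothetical stand-in, not the actual fVDB, NanoVDB, or c10 code: `Half`, `Mat3`, `Mat4`, `OpType`, and the two `sharedBytes...` helpers are illustrative names, and the sketch assumes the kernel keeps exactly one `Mat3` and one `Mat4` in shared memory and that `OpType` promotes half to float while leaving other types unchanged.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins (assumptions), not the real c10 / nanovdb types.
struct Half { std::uint16_t bits; };              // 2-byte storage, like a half
template <typename T> struct Mat3 { T m[9]; };    // 3x3 matrix, 9 elements
template <typename T> struct Mat4 { T m[16]; };   // 4x4 matrix, 16 elements

// Assumed promotion trait: half is computed in float, other types are unchanged.
template <typename T> struct OpType { using type = T; };
template <> struct OpType<Half> { using type = float; };

// Bytes needed for one Mat3 plus one Mat4 with element type T.
template <typename T>
constexpr std::size_t matrixBytes() {
    return sizeof(Mat3<T>) + sizeof(Mat4<T>);
}

// Sizing from the dispatch type: too small whenever OpType promotes the type.
template <typename scalar_t>
constexpr std::size_t sharedBytesFromScalar() {
    return matrixBytes<scalar_t>();
}

// Sizing from the type the kernel actually stores: always matches the kernel.
template <typename scalar_t>
constexpr std::size_t sharedBytesFromOpType() {
    return matrixBytes<typename OpType<scalar_t>::type>();
}

// Half dispatch: 18 + 32 bytes reserved versus 36 + 64 bytes written.
static_assert(sharedBytesFromScalar<Half>() == 50, "undersized reservation");
static_assert(sharedBytesFromOpType<Half>() == 100, "matches kernel storage");
// Float dispatch: no promotion, so both computations agree.
static_assert(sharedBytesFromScalar<float>() == sharedBytesFromOpType<float>(),
              "no mismatch without promotion");
```

Deriving the launch size from the same type alias the kernel uses, rather than repeating the type by hand on the host, keeps the two sides from drifting apart if the promotion rule changes later.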
1 file changed: +6 -4 lines changed (original lines 447-460, new lines 447-462)