
Conversation

@arijit-dasgupta
Contributor

(Completes MET-29)

My apologies for this being a relatively big PR (not as bite-sized as I would have liked). It is my first attempt at implementing one of these components, and it took me several hours to understand how everything could be pieced together. It includes:

  1. Rearranged the bindings into a separate folder, genmetaballs/src/cuda/bindings, where each file holds the bindings for a specific submodule and main.cu is the main binding file (previously everything lived in genmetaballs/src/cuda/bindings.cu).
  2. Implementation of the different Confidence structs in genmetaballs/src/cuda/core/confidence.cuh (0, 3, and 5 parameters), along with a templated CUDA kernel launch that computes this confidence score for an array of sumexpd_vec, implemented as gpu_get_confidence. I don't expect that we will need something like gpu_get_confidence in the forward render; this was just to test that it runs in CUDA.
  3. Implemented a confidence submodule binding in genmetaballs/src/cuda/bindings/confidence.cu which has:
    3.1. An init_confidence_submodule binding, using the NB_MODULE macro, that registers the different Confidence structs and also registers gpu_get_confidence.
    3.2. A type dispatcher called dispatch_confidence which performs the type dispatch for C++ calls (as in gpu_get_confidence).
  4. Testing for the confidence implementation in tests/python_tests/test_confidence.py which also checks that the confidence values are always between 0 and 1.
  5. Added a core submodule to the Python genmetaballs package which simply forwards to the submodule bindings.
  6. Sigmoid C++ code in genmetaballs/src/cuda/core/math_utils.cuh, a new file for math utils.

@linear

linear bot commented Nov 18, 2025

@arijit-dasgupta arijit-dasgupta marked this pull request as ready for review November 18, 2025 06:24
Contributor

@horizon-blue horizon-blue left a comment


Thank you for taking the lead here! :D The math looks good to me, and I left some inline comments on some minor issues. I also feel like we might want to move more things to the .cu files (not the ones in the bindings/ directory), and keep the headers themselves minimal. Though it's been a while since the last time I worked on a large cpp project, so I'd let @mugamma make the final call. We can also chat a bit about this tmr. Thanks again for the great (and speedy) work!

using ZeroParameterConfidence = ThreeParameterConfidence;

// Generic CUDA kernel for computing confidence values
template <typename Confidence>
Contributor

@mugamma might know more about this, but my instinct is that we probably shouldn't have kernel definitions like these in the header files. Actually, whenever possible, it'd be a good idea to move the implementation of the structs to the corresponding .cu file as well. We can chat about file organization together in our meeting tmr as well :)

Contributor Author

Sounds good to me, I know very little about the right way to organize here. Will implement whatever suggestions you both have :)

Contributor Author

I moved the kernel over to a .cu file in core (from the header) in 167be1e

For now I am keeping the full struct definitions in the .cuh file; I'm concerned that nanobind can't pick them up without the full definition in the header. But happy to take a suggestion here!
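If we later want the struct implementations out of the header too, one common pattern is explicit template instantiation in the .cu file. A rough sketch using the types discussed in this PR (names approximate, not the actual code):

```cuda
// confidence.cuh — full struct definitions stay here so nanobind can see them.
struct ZeroParameterConfidence {
    CUDA_CALLABLE float get_confidence(float sumexpd) const {
        return 1.0f - expf(-sumexpd);
    }
};

// confidence.cu — the kernel definition lives here, out of the header...
template <typename Confidence>
__global__ void confidence_kernel(const Confidence conf, const float* sumexpd,
                                  float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = conf.get_confidence(sumexpd[i]);
}

// ...with one explicit instantiation per Confidence type, so other translation
// units can link against the kernel without seeing its body.
template __global__ void confidence_kernel<ZeroParameterConfidence>(
    const ZeroParameterConfidence, const float*, float*, int);
```

The trade-off is that every new Confidence type needs its own instantiation line, which is why keeping the templated kernel in the header is also a defensible choice.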

@arijit-dasgupta
Contributor Author

Implemented all the changes that fix MET-44 @horizon-blue @mugamma. They are the following:

  • Single bindings file
  • Single utils file
  • Implement GPU tests for sigmoid (vectorized)
  • Implement GPU tests for confidence
  • Use the CUDA_CALLABLE macro
  • Chill down on the sigmoid tests
  • Rename confidence classes (to reflect the actual number of params) + remove the alias as a result
  • Remove dispatch for C++ confidence python
  • remove binding for get_gpu_confidence
  • remove binding for sigmoid vector

Contributor

@mugamma mugamma left a comment

I don't want to keep you from merging your PR, so I've approved, but I think there is still too much in here.

I actually don't think the kernel launch tests for the confidence functions should test anything other than the fact that we can call these functions inside a kernel. Anything else those tests are measuring can be removed.

The correctness tests in C++ and python are also redundant. We can just keep the python ones against a ground-truth that we didn't implement, preferably the methods in the original FMB code.

Overall, a clean set of tests would be:

  1. C++ smoke tests for calling the functions inside kernels.
  2. Python tests for correctness of utilities (e.g. sigmoid) against reference scipy implementations.
  3. Python tests for correctness of confidence methods against reference implementations in the FMB library.
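Category 2 above might look like the following sketch. The stand-in implementations are hypothetical so it runs standalone; the real test would import sigmoid from the genmetaballs bindings and compare against scipy.special.expit.

```python
import math

# Stand-in for the C++/CUDA sigmoid exposed through the bindings; in the
# real test this would come from the genmetaballs package.
def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Reference equivalent of scipy.special.expit, written out with the tanh
# identity so this sketch has no scipy dependency.
def reference_expit(x: float) -> float:
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def test_sigmoid_matches_reference():
    for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
        assert abs(sigmoid(x) - reference_expit(x)) < 1e-12
```

The same structure applies to category 3, with the FMB reference implementation playing the role of reference_expit.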


#include <cmath>
#include <cuda_runtime.h>
#include <vector>
Contributor

Why do we need vector?

void cuda_check(cudaError_t code, const char* file, int line);

CUDA_CALLABLE __forceinline__ float sigmoid(float x) {
if (isnan(x)) {
Contributor

Is this check necessary? wouldn't the other branch already be a nan if the input is nan?

Contributor

All the "python" references in the comments here just mean "ground truth" right? No python.

Contributor

I don't know if this is necessary. This is essentially a smoke test?

Contributor

I also don't know how I feel about the testing pattern in this file. Essentially we are replicating the implementation and testing the implementation in the codebase against the re-implementation here. In a sense, the re-implementation here is the gold standard?

Contributor Author

I was actually gonna ask you about this too lol, but had to go for my meeting earlier

@arijit-dasgupta
Contributor Author

arijit-dasgupta commented Nov 19, 2025

@mugamma agreed with everything. It felt weird to do a correctness test in C++ by literally reimplementing the same thing for CPU. The smoke test thing is a good idea.

But for FMB, the codebase is a mess: none of these pieces are broken up into functions. What do we do then? Should I write a GT function (like I already did in Python) and add a permalink to where I think the FMB author had this?

I'll make all these changes before merging, don't worry. Good catches.

@arijit-dasgupta
Contributor Author

We should totally only merge if we all like the structure. If not, it'll set a bad precedent for later. Doing all the corrections in this branch so that it is easier for later.

Contributor

@horizon-blue horizon-blue left a comment

Looks great to me overall, minus the stuff that Matin has pointed out. Thanks for the update!

I personally don't mind if we are having more tests than necessary as long as they won't slow down the overall test suite by a lot, and it seems like they aren't, so it's fine to me if you want to keep them there.

Feel free to merge the PR when you feel ready to keep us moving forward :). We can always address the minor concerns in follow-up PRs.

CUDA_CALLABLE __forceinline__ float get_confidence(float sumexpd) const {
return 1.0f - expf(-sumexpd);
}
};
Contributor

It's generally a good practice to end text files with a newline. Otherwise, future diffs will show not just the new code but also a "change" to the previous last line (which is why GitHub shows a warning sign here). How do you feel about adding an InsertNewlineAtEOF: true to our current .clang-format to get this covered?
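For reference, this would be a one-line addition to the config (the option exists in clang-format 16 and later):

```yaml
# .clang-format (excerpt) — make clang-format ensure a trailing newline
InsertNewlineAtEOF: true
```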

Contributor Author

I like this EOF idea for clang-format. Should we make a PR for that?

Contributor Author

Oh wait, you already added it in 1424f94

nice!

#include <cstdint>
#include <nanobind/nanobind.h>
#include <nanobind/stl/vector.h>
#include <stdexcept>
Contributor

Is this being used by anything? 👀

@@ -1,12 +1,13 @@
#include <cstdint>
#include <cuda_runtime.h>
#include <vector>
Contributor

Nice catch

@mugamma mugamma merged commit de06966 into master Nov 20, 2025
1 check passed
@mugamma mugamma deleted the arijit/implement-confidence branch November 20, 2025 16:42