Confidence implemented, with tests and confidence submodule #11
Conversation
**horizon-blue** left a comment
Thank you for taking the lead here! :D The math looks good to me, and I left some inline comments on some minor issues. I also feel like we might want to move more things to the .cu files (not the ones in the bindings/ directory), and keep the headers themselves minimal. Though it's been a while since the last time I worked on a large cpp project, so I'd let @mugamma make the final call. We can also chat a bit about this tmr. Thanks again for the great (and speedy) work!
```cpp
using ZeroParameterConfidence = ThreeParameterConfidence;

// Generic CUDA kernel for computing confidence values
template <typename Confidence>
```
@mugamma might know more about this, but my instinct is that we probably shouldn't have kernel definitions like these in the header files. Actually, whenever possible, it'd be a good idea to move the implementation of the structs to the corresponding .cu file as well. We can chat about file organization together in our meeting tmr as well :)
Sounds good to me. I know very little about the right way to organize things here, and will implement whatever suggestions you both have :)
I moved the kernel over to a .cu file in core (from the header) in 167be1e
For now I am keeping the full struct definitions in the .cuh file; I'm concerned that nanobind can't pick them up without the full definition in the header. But happy to take a suggestion here!
Co-authored-by: Xiaoyan Wang <[email protected]>
Implemented all the changes that fix MET-44 @horizon-blue @mugamma. They are the following:
**mugamma** left a comment
I don't want to keep you from merging your PR, so I've approved, but I think there is still too much in here.
I actually don't think the kernel launch tests for the confidence functions should test anything other than the fact that we can call these functions inside a kernel. Anything else those tests are measuring can be removed.
The correctness tests in C++ and python are also redundant. We can just keep the python ones against a ground-truth that we didn't implement, preferably the methods in the original FMB code.
Overall, a clean set of tests would be:
- C++ smoke tests for calling the functions inside kernels.
- Python tests for correctness of utilities (e.g. sigmoid) against reference scipy implementations.
- Python tests for correctness of confidence methods against reference implementations in the FMB library.
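A sketch of the second bullet, in pytest style. The `genmetaballs` binding isn't importable here, so a stand-in `sigmoid_under_test` plays the role of the CUDA-backed function; the reference is `scipy.special.expit`, as suggested above (scipy availability is assumed):

```python
import math

import numpy as np
from scipy.special import expit  # reference sigmoid implementation


def sigmoid_under_test(x: float) -> float:
    # Stand-in for the bound CUDA sigmoid; in the real test this would
    # come from the genmetaballs bindings.
    return 1.0 / (1.0 + math.exp(-x))


def test_sigmoid_matches_scipy():
    xs = np.linspace(-10.0, 10.0, 101)
    ours = np.array([sigmoid_under_test(float(x)) for x in xs])
    np.testing.assert_allclose(ours, expit(xs), rtol=1e-6, atol=1e-7)
```

The same shape works for the third bullet, with the FMB reference values in place of `expit`.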
```cpp
#include <cmath>
#include <cuda_runtime.h>
#include <vector>
```
Why do we need `<vector>`?
```cpp
void cuda_check(cudaError_t code, const char* file, int line);

CUDA_CALLABLE __forceinline__ float sigmoid(float x) {
  if (isnan(x)) {
```
Is this check necessary? Wouldn't the other branch already be NaN if the input is NaN?
All the "python" references in the comments here just mean "ground truth" right? No python.
I don't know if this is necessary. This is essentially a smoke test?
I also don't know how I feel about the testing pattern in this file. Essentially we are replicating the implementation and testing the implementation in the codebase against the re-implementation here. In a sense, the implementation here is the gold standard?
I was actually gonna ask you about this too lol, but had to go for my meeting earlier
@mugamma agreed with everything. It felt weird to do a correctness test in C++ by literally reimplementing the same thing for CPU. The smoke test idea is a good one. But for FMB, the codebase is shit: none of these pieces are broken up into functions. What do we do then? Should I write a GT function (like I already did in Python) and add a permalink to where I think the FMB author had this? I'll make all these changes before merging, don't worry. Good catches.
We should totally only merge if we all like the structure. If not, it'll set a bad precedent for later. Doing all the corrections in this branch so that it is easier later.
**horizon-blue** left a comment
Looks great to me overall, minus the stuff that Matin has pointed out. Thanks for the update!
I personally don't mind if we are having more tests than necessary as long as they won't slow down the overall test suite by a lot, and it seems like they aren't, so it's fine to me if you want to keep them there.
Feel free to merge the PR when you feel ready to keep us moving forward :). We can always address the minor concerns in follow-up PRs.
```cpp
CUDA_CALLABLE __forceinline__ float get_confidence(float sumexpd) const {
  return 1.0f - expf(-sumexpd);
}
};
```
It's generally a good practice to end text files with a newline. Otherwise, future diffs will show not just the new code but also a "change" to the previous last line (which is why GitHub shows a warning sign here). How do you feel about adding an `InsertNewlineAtEOF: true` entry to our current .clang-format to get this covered?
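For reference, a minimal sketch of what that entry looks like in a `.clang-format` file (the `BasedOnStyle` value here is only illustrative; `InsertNewlineAtEOF` requires clang-format 16 or newer):

```yaml
# .clang-format (sketch)
BasedOnStyle: LLVM       # assumed base style, substitute the project's own
InsertNewlineAtEOF: true # ensure every formatted file ends with a newline
```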
I like this EOF idea for clang-format. Should we make a PR for that?
Oh wait, you already added it in 1424f94
nice!
```cpp
#include <cstdint>
#include <nanobind/nanobind.h>
#include <nanobind/stl/vector.h>
#include <stdexcept>
```
Is this being used by anything? 👀
```diff
@@ -1,12 +1,13 @@
 #include <cstdint>
 #include <cuda_runtime.h>
 #include <vector>
```
Nice catch
(Completes MET-29)
My apologies for this being a relatively big PR (not as bite-sized as I would have liked). It is my first attempt at implementing one of these components, and it took me several hours to understand how everything could be pieced together. It has:
1. A new `genmetaballs/src/cuda/bindings` directory, where each file can be the bindings for a specific submodule and `main.cu` is the main binding file (was previously `genmetaballs/src/cuda/bindings.cu`).
2. Confidence structs in `genmetaballs/src/cuda/core/confidence.cuh` (0, 3, and 5 params), along with a templated CUDA kernel launch aimed at computing this confidence score for an array of `sumexpd_vec`, implemented as `gpu_get_confidence`. I don't expect that we will need to use something like `gpu_get_confidence` in the forward render, but this was just to test that it runs in CUDA.
3. A `confidence` submodule binding in `genmetaballs/src/cuda/bindings/confidence.cu`, which has:
   3.1. An `init_confidence_submodule` binding using the `NB_MODULE` macro, which registers the different Confidence structs and also registers `gpu_get_confidence`.
   3.2. A type dispatcher called `dispatch_confidence`, which does the type dispatch in C++ calls (like in `gpu_get_confidence`).
4. Tests in `tests/python_tests/test_confidence.py`, which also check that the confidence values are always between 0 and 1.
5. A `core` submodule in the Python `genmetaballs` package, which just calls onto the submodule bindings.
6. `genmetaballs/src/cuda/core/math_utils.cuh`, a new file for math utils.
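The range check in item 4 can be sketched roughly as below. The real test would call the bound `gpu_get_confidence`; here a pure-Python stand-in mirrors the quoted `get_confidence` formula (`1 - exp(-sumexpd)`), and the inputs assume `sumexpd` is non-negative, as a sum of exponentials is:

```python
import math


def get_confidence(sumexpd: float) -> float:
    # Stand-in for the CUDA zero-parameter confidence; mirrors the
    # quoted formula 1 - exp(-sumexpd).
    return 1.0 - math.exp(-sumexpd)


def test_confidence_in_unit_interval():
    # sumexpd is a sum of exponentials, hence >= 0.
    for s in [0.0, 1e-6, 0.1, 1.0, 5.0, 50.0]:
        c = get_confidence(s)
        assert 0.0 <= c <= 1.0
```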