Fix bruteforce search imprecision #1635
base: main
It looks like this fixes the issue for the example in #1632: recall is 0.99 and the distances are correct. However, I have concerns (like you mentioned in the meeting) about self-distances no longer being 0 with this fix. For example, running this script with this fix shows that the self-distance for vector 2 is 0.0625 😔
cuML does a second pass to recompute the exact distances and reorder the set of nearest neighbors. We could do the same in the Python layer of cuVS, but it won't fix the C++ calls. Ideally, the fix should be implemented in C++.
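A minimal sketch of what such a second pass could look like, assuming squared L2 distance and using NumPy on the CPU purely for illustration. The helper name `refine_neighbors` is hypothetical and is not part of the cuML or cuVS API; it only shows the idea of recomputing distances from direct differences and re-sorting the candidates.

```python
import numpy as np


def refine_neighbors(dataset, queries, candidate_indices):
    """Hypothetical helper: recompute exact squared L2 distances for the
    candidate neighbors of each query, then re-sort each row by distance."""
    # Gather candidate vectors: shape (n_queries, k, dim).
    candidates = dataset[candidate_indices].astype(np.float64)
    # Use the direct difference form, so identical vectors give exactly 0.
    diff = candidates - queries[:, None, :].astype(np.float64)
    exact = np.einsum("qkd,qkd->qk", diff, diff)
    # Reorder each row so the nearest candidate comes first.
    order = np.argsort(exact, axis=1, kind="stable")
    refined_distances = np.take_along_axis(exact, order, axis=1)
    refined_indices = np.take_along_axis(candidate_indices, order, axis=1)
    return refined_distances, refined_indices
```

The cost is one extra gather plus an O(n_queries × k × dim) distance computation per search, which is the additional pass under discussion.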
I updated the PR to move away from changing the mechanism that clamps the self-neighbor, and instead implemented a refinement step on the bruteforce search results. Is this something we would like to keep, or should we look for another solution? We could instead implement the refinement in the cuVS Python layer, and after the C++ calls in consumers of the cuVS bruteforce algorithm (like UMAP).
I would prefer that we not slow this algorithm down further by running multiple passes if it can at all be avoided. I'd much rather spend some additional time thinking through a more performant and robust method to do this. We have really been suffering on the perf side lately by making changes that, taken in isolation, don't affect perf by all that much, but taken all together have had a huge impact...
I removed the refinement from the C++ side and replaced it with an optional second pass available in the Python layer through a |
Closes #1632