Generalize SIMD distance implementation to n-length vectors #38
Conversation
Is the regression for building or for querying? We should ideally benchmark both. I think we'd be willing to take a small regression on building, but not regressing querying is more important. It would also be good to take a quick look at the impact on in-memory size.
I've just run some benchmarks for querying. There's a lot of variance in the results; I think it's coming from Python's overhead since I'm running the queries through the Python bindings. For the original, the shortest loop took 3.227s; after updating the metric it grew to 5.059s, and after using …
Right, those are definitely some substantial latency regressions, so I don't think we want to go this route. (Probably not necessary right now, but it might be interesting to test the low-level metric stuff with Rust benchmarks instead of the Python bindings? Would get rid of the Python noise at least.)
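As a side note, a pure-Rust timing loop along these lines would cut the Python bindings out of the measurement entirely. This is only a sketch: the `euclidean` function, the 300-dimension inputs, and the iteration counts are illustrative assumptions rather than the crate's actual code, and a criterion-based benchmark would give more rigorous statistics (warm-up, outlier handling, confidence intervals), which matters when the differences are in the single-digit percent range.

```rust
// Hypothetical micro-benchmark for a low-level distance routine, using only std.
// The function and the inputs are illustrative assumptions, not this crate's API.
use std::hint::black_box;
use std::time::Instant;

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    // 300-dimension vectors, roughly matching fastText embeddings.
    let a: Vec<f32> = (0..300).map(|i| i as f32 * 0.01).collect();
    let b: Vec<f32> = (0..300).map(|i| i as f32 * 0.02).collect();

    // Warm up so the timed loop is not paying one-time costs.
    for _ in 0..10_000 {
        black_box(euclidean(black_box(&a), black_box(&b)));
    }

    let iters = 1_000_000u32;
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from removing the calls.
        black_box(euclidean(black_box(&a), black_box(&b)));
    }
    let elapsed = start.elapsed();
    println!("{:.1} ns/call", elapsed.as_nanos() as f64 / iters as f64);
}
```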
Should keep this issue in mind:
I think I'd prefer to continue with this one and close/rework #35 once this PR is merged.
Why is that? Do you think you'll still have time to work on this? Hope the new job is working out well for you!
The new job is great. I'm still hoping to finish this, but the last few months are proving me wrong.
Ambient looks cool, BTW. How about we just say that we're going to pick it up, and if you get to it before us that's nice, but we'll both assume that you have enough other things on your plate as it is? 🙂
Addresses #20
Changes
- `instant-distance-py` crate
- `Metric` trait
- `PointStorage` to prevent an extra dereference when getting to data (see the sketch below)
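As a rough illustration of what a `Metric` / `PointStorage` split could look like: the trait names come from the list above, but the signatures and the example implementation below are assumptions made for illustration, not the definitions from this PR.

```rust
// Illustrative only: assumed shapes for a Metric / PointStorage split,
// not the actual definitions in this PR.

/// A distance function between two points of type `P`.
pub trait Metric<P> {
    fn distance(a: &P, b: &P) -> f32;
}

/// Storage for points; `get` hands back a reference straight into the
/// backing buffer, avoiding an extra dereference on the hot path.
pub trait PointStorage {
    type Point;

    fn len(&self) -> usize;
    fn get(&self, index: usize) -> &Self::Point;
}

/// Plain Euclidean metric over vectors of arbitrary length.
pub struct Euclidean;

impl Metric<Vec<f32>> for Euclidean {
    fn distance(a: &Vec<f32>, b: &Vec<f32>) -> f32 {
        a.iter()
            .zip(b.iter())
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt()
    }
}
```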
Benchmark
I used building an HNSW index for the first 100k fastText words as the benchmark (benchmarking script for reference).
The original implementation takes around 18.7s (min of 20 runs); after the changes it takes 19.5s (again min of 20 runs). That is roughly 4.2% slower; I tried profiling it, but I'm not sure where the difference comes from.
As for querying (10k queries per benchmark): the original takes 4.8s and the new version 4.5s, so it actually looks slightly faster.
Benchmarking directly in Rust (added in the 1st commit of this PR)
Original (almost unchanged, just with the benchmarking added):
After changes:
This time building seems to be 16.8% slower (less visible when going through Python). Querying is comparable.
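As background on the technique the title refers to (not code from this PR): a distance over n-length vectors can walk the slices in fixed-width chunks, which the compiler can auto-vectorize or which map directly onto SIMD intrinsics, with a scalar tail for lengths that are not a multiple of the chunk width. The sketch below is a hedged illustration; the function name, the chunk width of 8, and all details are assumptions. The per-call chunk/remainder bookkeeping is one plausible, unconfirmed source of a small build-time overhead compared to an implementation hard-coded for a single dimension count.

```rust
/// Squared Euclidean distance over slices of arbitrary (equal) length,
/// processed in fixed-width chunks so the hot loop is SIMD-friendly.
/// A width of 8 is an arbitrary illustrative choice (one AVX f32 register).
fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());

    let mut chunks_a = a.chunks_exact(8);
    let mut chunks_b = b.chunks_exact(8);
    let mut acc = [0.0f32; 8];

    // Main loop over full-width chunks; the fixed inner loop of 8 lanes
    // is easy for the compiler to vectorize.
    for (ca, cb) in chunks_a.by_ref().zip(chunks_b.by_ref()) {
        for i in 0..8 {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }

    // Scalar tail for lengths that are not a multiple of the chunk width.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in chunks_a.remainder().iter().zip(chunks_b.remainder()) {
        let d = x - y;
        sum += d * d;
    }
    sum
}
```

The square root is omitted here on purpose: nearest-neighbor search only needs the relative ordering of distances, so squared distance is commonly used to save a sqrt per comparison.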