-
Notifications
You must be signed in to change notification settings - Fork 751
Description
Hello hnswlib maintainers,
First, thank you for creating and maintaining this excellent library.
I'm opening this issue to report a potential bug I noticed while reviewing the source code related to graph construction. It appears that M_ (the maximum number of connections for upper layers) might be used where Mcurmax (the maximum number of connections for the current layer, especially layer 0) should be used.
Location:
In the file hnswlib/hnswalg.h (Please confirm the exact file and line number), inside one of the main graph construction functions like mutuallyConnectNewElement or a similar heuristic function, I found the following call:
getNeighborsByHeuristic2(top_candidates, M_);
Problem Analysis:
As I understand it:
M_ is the parameter for the maximum number of neighbors on layers > 0.
For the base layer (layer 0), a different, typically larger, maximum maxM0_ (which is assigned to Mcurmax when level == 0) is used to create a denser graph.
The function getNeighborsByHeuristic2 is a heuristic for selecting the best neighbors to connect to. If this function is called when processing connections for layer 0, passing M_ as the parameter would incorrectly limit the number of selected neighbors to the upper-layer maximum. This would result in fewer connections than intended at the most critical base layer.
Suggested Fix:
I believe the call, especially when handling the base layer, should use the maximum connection limit specific to that layer. For example:
getNeighborsByHeuristic2(top_candidates, Mcurmax);
Potential Impact:
If this observation is correct, the graph construction at layer 0 might be suboptimal, creating a less connected graph than configured by the user. This could potentially impact the final recall and performance of the index.
I am still getting familiar with the intricacies of the algorithm, so I could be mistaken. However, it seemed like a significant detail worth pointing out for your review.
Could you please take a look and confirm if this is the intended behavior?
Thank you for your time and consideration.