Skip to content

Potential Bug: M_ might be used incorrectly instead of Mcurmax in neighbor selection heuristic #635

@MarsMan13

Description

@MarsMan13

Hello hnswlib maintainers,

First, thank you for creating and maintaining this excellent library.

I'm opening this issue to report a potential bug I noticed while reviewing the source code related to graph construction. It appears that M_ (the maximum number of connections for upper layers) might be used where Mcurmax (the maximum number of connections for the current layer, especially layer 0) should be used.

Location:

In the file hnswlib/hnswalg.h (Please confirm the exact file and line number), inside one of the main graph construction functions like mutuallyConnectNewElement or a similar heuristic function, I found the following call:

getNeighborsByHeuristic2(top_candidates, M_);

Problem Analysis:

As I understand it:

M_ is the parameter for the maximum number of neighbors on layers > 0.

For the base layer (layer 0), a different, typically larger, maximum maxM0_ (which is assigned to Mcurmax when level == 0) is used to create a denser graph.

The function getNeighborsByHeuristic2 is a heuristic for selecting the best neighbors to connect to. If this function is called when processing connections for layer 0, passing M_ as the parameter would incorrectly limit the number of selected neighbors to the upper-layer maximum. This would result in fewer connections than intended at the most critical base layer.

Suggested Fix:

I believe the call, especially when handling the base layer, should use the maximum connection limit specific to that layer. For example:

getNeighborsByHeuristic2(top_candidates, Mcurmax);

Potential Impact:

If this observation is correct, the graph construction at layer 0 might be suboptimal, creating a less connected graph than configured by the user. This could potentially impact the final recall and performance of the index.

I am still getting familiar with the intricacies of the algorithm, so I could be mistaken. However, it seemed like a significant detail worth pointing out for your review.

Could you please take a look and confirm if this is the intended behavior?

Thank you for your time and consideration.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions