Redundant computation for exact nearest neighbors #391

@huynmg

When running knnPerfTest several times with different quantization levels, I noticed that the true (exact) nearest neighbors are recomputed and written to a different cache file on each run, even though they do not change. This happens because `indexPath` is one of the inputs to the hash that names the cache file for the true nearest neighbors.

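I think including it is redundant, because the true nearest neighbors should be index-agnostic: they are determined by the document vectors, the query vectors, and the search parameters, not by the index built from them. Should we remove `indexPath` from the hash inputs?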

The relevant code in `getExactNN`:

```java
private int[][] getExactNN(Path docPath, Path indexPath, Path queryPath, int queryStartIndex)
    throws IOException, InterruptedException {
  // look in working directory for cached nn file
  String hash = Integer.toString(Objects.hash(docPath, indexPath, queryPath, numDocs, numQueryVectors, topK,
      similarityFunction.ordinal(), parentJoin, queryStartIndex,
      prefilter ? selectivity : 1f, prefilter ? randomSeed : 0f), 36);
  String nnFileName = "nn-" + hash + ".bin";
  Path nnPath = Paths.get(nnFileName);
  // ... (rest of method)
}
```
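A minimal sketch of the proposed change, assuming the fix is simply to drop `indexPath` from the `Objects.hash` call and keep every other input. The class `ExactNNCacheKey`, the method `nnFileName`, and the parameter types (an `int` for the similarity ordinal, a `long` for `randomSeed`) are hypothetical stand-ins for the fields used in `getExactNN`, not the actual luceneutil code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

public class ExactNNCacheKey {

  // Hypothetical: same hash inputs as getExactNN, minus indexPath.
  // Each remaining input affects which vectors are compared or how many
  // neighbors are kept, so changing any of them still yields a new cache file.
  static String nnFileName(Path docPath, Path queryPath, int numDocs, int numQueryVectors,
                           int topK, int similarityOrdinal, boolean parentJoin,
                           int queryStartIndex, boolean prefilter, float selectivity,
                           long randomSeed) {
    int h = Objects.hash(docPath, queryPath, numDocs, numQueryVectors, topK,
        similarityOrdinal, parentJoin, queryStartIndex,
        prefilter ? selectivity : 1f, prefilter ? randomSeed : 0f);
    return "nn-" + Integer.toString(h, 36) + ".bin";
  }

  public static void main(String[] args) {
    Path docs = Paths.get("docs.vec");       // hypothetical input files
    Path queries = Paths.get("queries.vec");
    // Two runs that build differently quantized indexes but search the same
    // documents with the same queries and parameters pass identical arguments,
    // because the index path is no longer an input to the key.
    String runA = nnFileName(docs, queries, 100_000, 1_000, 100, 0, false, 0, false, 1f, 0L);
    String runB = nnFileName(docs, queries, 100_000, 1_000, 100, 0, false, 0, false, 1f, 0L);
    System.out.println(runA.equals(runB)); // true: both runs reuse one cache file
  }
}
```

With this key, rerunning knnPerfTest at a different quantization level would find the existing `nn-*.bin` file instead of recomputing the exact neighbors.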
