3 changes: 3 additions & 0 deletions lucene/CHANGES.txt
@@ -330,6 +330,9 @@ Optimizations

* GITHUB#15585: Added shared prefetch counter to maintain count across clones and slices of MemorySegmentIndexInput. (Shubham Sharma)

+* GITHUB#15607: Utilize bulk scoring for diversity checking when building HNSW vector indices. This results
+in some performance improvements during indexing and segment merges. (Ben Trent)

Bug Fixes
---------------------
* GITHUB#14161: PointInSetQuery's constructor now throws IllegalArgumentException
@@ -64,9 +64,13 @@ public class HnswGraphBuilder implements HnswBuilder {
@SuppressWarnings("NonFinalStaticField")
public static long randSeed = DEFAULT_RAND_SEED;

+private static final int MAX_BULK_SCORE_NODES = 8;

protected final int M; // max number of connections on upper layers
private final double ml;

+private final int[] bulkScoreNodes; // for bulk scoring
+private final float[] bulkScores; // for bulk scoring
Comment on lines +72 to +73
Contributor

I was worried about thread-safety of these arrays (given we have concurrent merging), but from this comment it looks like instances of this class are not shared across threads, but rather multiple instances of this class (across different threads) can operate on a single HnswGraph?

Member Author

@kaivalnp the scorer itself isn't threadsafe. I assumed that since we were using a scorer, we were OK.

I had the same threading concerns, so I looked into it. It seems that each thread has its own builder instance (the thread's worker), and all of those instances operate on the same graph.
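
The arrangement described in this thread, one builder instance per worker thread with all workers updating the same graph, can be sketched roughly as follows. This is a minimal illustration and not Lucene code: `SharedGraph`, `Worker`, and `build` are hypothetical names, the scores are placeholders, and a `ConcurrentHashMap` is swapped in for the real graph and its locking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Sketch only: per-worker scratch arrays plus one shared, thread-safe structure. */
final class PerWorkerScratchSketch {

  /** Hypothetical shared structure standing in for the graph that all workers update. */
  static final class SharedGraph {
    final ConcurrentHashMap<Integer, Float> bestScore = new ConcurrentHashMap<>();
  }

  /** Hypothetical worker: it owns its scratch arrays, so they need no synchronization. */
  static final class Worker {
    private final int[] scratchNodes = new int[8];
    private final float[] scratchScores = new float[8];
    private final SharedGraph graph;

    Worker(SharedGraph graph) {
      this.graph = graph;
    }

    void process(int firstNode, int count) {
      for (int start = 0; start < count; start += scratchNodes.length) {
        int n = Math.min(scratchNodes.length, count - start);
        for (int i = 0; i < n; i++) {
          scratchNodes[i] = firstNode + start + i;
          scratchScores[i] = 1f / (1 + scratchNodes[i]); // placeholder score
        }
        // Only the shared structure is touched by more than one thread.
        for (int i = 0; i < n; i++) {
          graph.bestScore.put(scratchNodes[i], scratchScores[i]);
        }
      }
    }
  }

  static SharedGraph build(int threads, int nodesPerThread) {
    SharedGraph graph = new SharedGraph();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int firstNode = t * nodesPerThread;
      Worker worker = new Worker(graph); // one instance per task, never shared
      pool.execute(() -> worker.process(firstNode, nodesPerThread));
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return graph;
  }

  public static void main(String[] args) {
    System.out.println(build(4, 100).bestScore.size()); // prints 400
  }
}
```

Because each `Worker` keeps its own `scratchNodes` and `scratchScores`, reusing those buffers across calls is safe as long as a worker instance is never handed to a second thread.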

private final SplittableRandom random;
protected final UpdateableRandomVectorScorer scorer;
protected final HnswGraphSearcher graphSearcher;
@@ -156,6 +160,10 @@ protected HnswGraphBuilder(
this.hnsw = hnsw;
this.hnswLock = hnswLock;
this.graphSearcher = graphSearcher;
+// pick a number that keeps us from scoring TOO much for diversity checking
+// but enough to take advantage of bulk scoring
+this.bulkScoreNodes = new int[MAX_BULK_SCORE_NODES];
+this.bulkScores = new float[MAX_BULK_SCORE_NODES];
entryCandidates = new GraphBuilderKnnCollector(1);
beamCandidates = new GraphBuilderKnnCollector(beamWidth);
beamCandidates0 = new GraphBuilderKnnCollector(Math.min(beamWidth / 2, M * 3));
@@ -470,9 +478,11 @@ static void popToScratch(GraphBuilderKnnCollector candidates, NeighborArray scra
*/
private boolean diversityCheck(float score, NeighborArray neighbors, RandomVectorScorer scorer)
throws IOException {
-for (int i = 0; i < neighbors.size(); i++) {
-float neighborSimilarity = scorer.score(neighbors.nodes()[i]);
-if (neighborSimilarity >= score) {
+final int bulkScoreChunk = Math.min((neighbors.size() + 1) / 2, bulkScoreNodes.length);
+for (int scored = 0; scored < neighbors.size(); scored += bulkScoreChunk) {
+int chunkSize = Math.min(bulkScoreChunk, neighbors.size() - scored);
+System.arraycopy(neighbors.nodes(), scored, bulkScoreNodes, 0, chunkSize);
+if (scorer.bulkScore(bulkScoreNodes, bulkScores, chunkSize) >= score) {
return false;
}
}
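
To make the chunking in this hunk easier to follow, here is a self-contained sketch of the same loop outside Lucene. It rests on a few assumptions: `BulkScorer` and `maxScorer` are hypothetical stand-ins for the real scorer, `bulkScore` is assumed to return the maximum score in the chunk (which is how the comparison against `score` reads in the diff), the trailing `return true` is assumed because the hunk is cut off, and `chunkSizes` exists only to make the chunk boundaries visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of chunked bulk scoring for a diversity check; names are hypothetical. */
final class BulkDiversitySketch {

  /** Hypothetical scorer: fills scores[0..numNodes) and returns the maximum. */
  interface BulkScorer {
    float bulkScore(int[] nodes, float[] scores, int numNodes);
  }

  static final int MAX_BULK_SCORE_NODES = 8;

  private final int[] bulkScoreNodes = new int[MAX_BULK_SCORE_NODES];
  private final float[] bulkScores = new float[MAX_BULK_SCORE_NODES];
  final List<Integer> chunkSizes = new ArrayList<>(); // recorded for illustration only

  /** Returns false as soon as some chunk holds a neighbor at least as similar as score. */
  boolean diversityCheck(float score, int[] neighborNodes, int neighborCount, BulkScorer scorer) {
    chunkSizes.clear();
    // Half the neighbors (rounded up), capped at the scratch buffer length.
    int bulkScoreChunk = Math.min((neighborCount + 1) / 2, bulkScoreNodes.length);
    for (int scored = 0; scored < neighborCount; scored += bulkScoreChunk) {
      int chunkSize = Math.min(bulkScoreChunk, neighborCount - scored);
      System.arraycopy(neighborNodes, scored, bulkScoreNodes, 0, chunkSize);
      chunkSizes.add(chunkSize);
      if (scorer.bulkScore(bulkScoreNodes, bulkScores, chunkSize) >= score) {
        return false;
      }
    }
    return true; // assumed: no neighbor was too similar
  }

  /** Hypothetical scorer backed by a fixed similarity table indexed by node id. */
  static BulkScorer maxScorer(float[] similarity) {
    return (nodes, scores, n) -> {
      float max = Float.NEGATIVE_INFINITY;
      for (int i = 0; i < n; i++) {
        scores[i] = similarity[nodes[i]];
        max = Math.max(max, scores[i]);
      }
      return max;
    };
  }

  public static void main(String[] args) {
    BulkScorer scorer = maxScorer(new float[] {0.1f, 0.2f, 0.3f, 0.9f, 0.4f});
    BulkDiversitySketch sketch = new BulkDiversitySketch();
    int[] neighbors = {0, 1, 2, 3, 4};
    // Five neighbors give a chunk size of 3, so the chunks are 3 and 2.
    System.out.println(sketch.diversityCheck(0.95f, neighbors, 5, scorer) + " " + sketch.chunkSizes);
    // The first chunk already contains a score >= 0.25, so the check stops early.
    System.out.println(sketch.diversityCheck(0.25f, neighbors, 5, scorer) + " " + sketch.chunkSizes);
  }
}
```

Capping the chunk at roughly half the neighbors preserves some of the early exit that the per-node loop had: a too-similar neighbor in the first half ends the check without scoring the second half.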