Bulk score hnsw diversity check#15607
Conversation
| float neighborSimilarity = scorer.score(neighbors.nodes()[i]); | ||
| if (neighborSimilarity >= score) { | ||
| return false; | ||
| bulkScoreNodes[bulkCount++] = neighbors.nodes()[i]; |
There was a problem hiding this comment.
Thinking about this more, doing 8 always, even if there are only 8 connections seems foolish. I will benchmark with Math.min((neighbors.nodes().length + 1)/2, bulkScoreNodes.length)).
kaivalnp
left a comment
There was a problem hiding this comment.
Looks like a nice optimization!
| private boolean diversityCheck(float score, NeighborArray neighbors, RandomVectorScorer scorer) | ||
| throws IOException { | ||
| int bulkCount = 0; | ||
| final int bulkScoreChunk = Math.min((neighbors.nodes().length + 1) / 2, bulkScoreNodes.length); |
There was a problem hiding this comment.
IIUC neighbors.nodes() returns the internal array used to store neighbor nodes, which can be larger than actual number of neighbors and is exponentially growing -- can we have edge cases where the length is about twice the actual number of neighbors, causing us to bulk score everything?
I wonder if we should use neighbors.size() instead?
There was a problem hiding this comment.
good call! And good call on the array copy!, Just needed to handle the tail. New benchmarks show even nicer perf ;)
| private final int[] bulkScoreNodes; // for bulk scoring | ||
| private final float[] bulkScores; // for bulk scoring |
There was a problem hiding this comment.
I was worried about thread-safety of these arrays (given we have concurrent merging), but from this comment it looks like instances of this class are not shared across threads, but rather multiple instances of this class (across different threads) can operate on a single HnswGraph?
There was a problem hiding this comment.
@kaivalnp the scorer itself isn't threadsafe. I assumed that since we were using a scorer, we were OK.
I had the same threading concerns and looked up and it seems that each thread as a unique builder object instance (The worker of the thread) and they all work on the same graph.
| throws IOException { | ||
| int bulkCount = 0; | ||
| final int bulkScoreChunk = Math.min((neighbors.nodes().length + 1) / 2, bulkScoreNodes.length); | ||
| for (int i = 0; i < neighbors.size(); i++) { |
There was a problem hiding this comment.
Looks like we don't need to perform any operation per-node of this loop, can we reduce some iterations using something like:
private boolean diversityCheck(float score, NeighborArray neighbors, RandomVectorScorer scorer)
throws IOException {
final int chunk = Math.min(Math.ceilDiv(neighbors.size(), 2), bulkScoreNodes.length);
for (int start = 0; start < neighbors.size(); start += chunk) {
int length = Math.min(neighbors.size() - start, chunk);
System.arraycopy(neighbors.nodes(), start, bulkScoreNodes, 0, length);
if (scorer.bulkScore(bulkScoreNodes, bulkScores, length) >= score) {
return false;
}
}
return true;
}
kaivalnp
left a comment
There was a problem hiding this comment.
Thanks @benwtrent! Curious if you were able to run knnPerfTest?
| // handle a tail | ||
| if (scored < neighbors.size()) { | ||
| int chunkSize = neighbors.size() - scored; | ||
| System.arraycopy(neighbors.nodes(), scored, bulkScoreNodes, 0, chunkSize); | ||
| if (scorer.bulkScore(bulkScoreNodes, bulkScores, chunkSize) >= score) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
I don't think we need this tail -- we're doing Math.min(bulkScoreChunk, neighbors.size() - scored) in the above loop, which automatically bulk-scores the tail (using the second value)
There was a problem hiding this comment.
Yep, good call 🤦 its reflex.
| this.bulkScoreNodes = new int[8]; | ||
| this.bulkScores = new float[8]; |
There was a problem hiding this comment.
nit: maybe host 8 up as a private static final variable?
|
@kaivalnp I did run with 768 dim vectors (see PR description). I ran it again and I get perf around 77.04s for force merge to 90s force merge. Force merge performance is pretty variable, but with all my runs, this provides a measurable speed improvement. |
* Utilize bulk scoring interface during HNSW graph builder diversity check * iter * iter * iter * adding changes, adjusting bulk chunk size * iter * fixing chunk size * no tail needed, 🤦 * iter * iter
* Utilize bulk scoring interface during HNSW graph builder diversity check * iter * iter * iter * adding changes, adjusting bulk chunk size * iter * fixing chunk size * no tail needed, 🤦 * iter * iter
This adds bulk scoring to diversity check. While this means that diversity check cannot exit super early (e.g. if it only needs to check 2 docs), I continually see diversity check as being the most expensive part of HNSW graph merging.
This tells me that typically, it isn't just one doc that is checked.
I ran 1M 768 cohere, force-merging with 4 threads.
baseline: 128.01
candidate: 92.15