Commit bfefe03

Remove soar duplicate checking (#132617)
Through our various benchmarking runs, I have noticed we do a silly amount of work just handling duplicate vectors for overspill. When it comes to block scoring, it is likely much better to score the duplicates and deduplicate later. This is indeed the case, and the performance improvement grows as the number of vector operations increases.

## Multi-segment Cohere-wiki-768 8M

I ran every nprobe 5 times and picked the fastest.

### CANDIDATE

```
index_name                     index_type n_probe latency(ms) net_cpu_time(ms) avg_cpu_count QPS    recall visited    filter_selectivity
------------------------------ ---------- ------- ----------- ---------------- ------------- ------ ------ ---------- ------------------
cohere-wikipedia-docs-768d.vec ivf        10      7.12        0.00             0.00          140.45 0.80   83108.96   1.00
cohere-wikipedia-docs-768d.vec ivf        20      10.47       0.00             0.00          95.51  0.86   169324.80  1.00
cohere-wikipedia-docs-768d.vec ivf        50      19.86       0.00             0.00          50.35  0.91   461667.04  1.00
cohere-wikipedia-docs-768d.vec ivf        100     33.65       0.00             0.00          29.72  0.94   950007.20  1.00
cohere-wikipedia-docs-768d.vec ivf        200     57.04       0.00             0.00          17.53  0.95   1797631.04 1.00
cohere-wikipedia-docs-768d.vec ivf        500     124.30      0.00             0.00          8.05   0.96   4334902.24 1.00
cohere-wikipedia-docs-768d.vec ivf        1000    236.78      0.00             0.00          4.22   0.96   8521820.48 1.00
```

### BASELINE

```
index_name                     index_type n_probe latency(ms) net_cpu_time(ms) avg_cpu_count QPS    recall visited    filter_selectivity
------------------------------ ---------- ------- ----------- ---------------- ------------- ------ ------ ---------- ------------------
cohere-wikipedia-docs-768d.vec ivf        10      7.21        0.00             0.00          138.70 0.81   74077.53   1.00
cohere-wikipedia-docs-768d.vec ivf        20      10.83       0.00             0.00          92.34  0.86   144966.33  1.00
cohere-wikipedia-docs-768d.vec ivf        50      21.75       0.00             0.00          45.98  0.91   365150.68  1.00
cohere-wikipedia-docs-768d.vec ivf        100     38.25       0.00             0.00          26.14  0.93   698105.96  1.00
cohere-wikipedia-docs-768d.vec ivf        200     65.61       0.00             0.00          15.24  0.95   1278157.01 1.00
cohere-wikipedia-docs-768d.vec ivf        500     148.98      0.00             0.00          6.71   0.95   2890457.27 1.00
cohere-wikipedia-docs-768d.vec ivf        1000    281.02      0.00             0.00          3.56   0.95   4939370.44 1.00
```

## Single segment Cohere-wiki-1024 1M

My thought was that larger vectors might make block scoring more expensive, so picking individual vectors would be better. Same methodology as above.

### Candidate

```
index_name       index_type n_probe latency(ms) net_cpu_time(ms) avg_cpu_count QPS     recall visited   filter_selectivity
---------------- ---------- ------- ----------- ---------------- ------------- ------- ------ --------- ------------------
wiki1024en.train ivf        10      0.63        0.00             0.00          1587.30 0.81   6389.60   1.00
wiki1024en.train ivf        20      0.86        0.00             0.00          1162.79 0.88   12528.48  1.00
wiki1024en.train ivf        50      1.43        0.00             0.00          699.30  0.93   30627.04  1.00
wiki1024en.train ivf        100     2.30        0.00             0.00          434.78  0.95   61259.84  1.00
wiki1024en.train ivf        200     4.12        0.00             0.00          242.72  0.97   122569.44 1.00
wiki1024en.train ivf        500     9.64        0.00             0.00          103.73  0.98   307816.80 1.00
wiki1024en.train ivf        1000    18.79       0.00             0.00          53.22   0.98   618772.32 1.00
```

### Baseline

```
index_name       index_type n_probe latency(ms) net_cpu_time(ms) avg_cpu_count QPS     recall visited   filter_selectivity
---------------- ---------- ------- ----------- ---------------- ------------- ------- ------ --------- ------------------
wiki1024en.train ivf        10      0.65        0.00             0.00          1538.46 0.82   5680.72   1.00
wiki1024en.train ivf        20      0.84        0.00             0.00          1190.48 0.88   10677.40  1.00
wiki1024en.train ivf        50      1.49        0.00             0.00          671.14  0.94   24431.26  1.00
wiki1024en.train ivf        100     2.41        0.00             0.00          414.94  0.96   47000.85  1.00
wiki1024en.train ivf        200     4.56        0.00             0.00          219.30  0.97   91284.42  1.00
wiki1024en.train ivf        500     10.56       0.00             0.00          94.70   0.98   218185.33 1.00
wiki1024en.train ivf        1000    20.81       0.00             0.00          48.05   0.98   412137.05 1.00
```
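The core idea, sketched below in plain Java, is "score the duplicates, deduplicate later": SOAR overspill can write a vector into more than one posting list, so the same document may surface several times in a leaf's score-ordered hits, and keeping only the first (best-scoring) occurrence after collection is cheaper than consulting a visited-docs bitset for every vector during bulk scoring. The `Hit` record, `dedupByDoc` helper, and class name are illustrative stand-ins, not the production code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch (not the actual Elasticsearch code) of deduplicating
// score-ordered hits after collection instead of filtering during scoring.
public class DedupSketch {
    record Hit(int doc, float score) {}

    static List<Hit> dedupByDoc(List<Hit> hitsSortedByScoreDesc) {
        Set<Integer> seen = new HashSet<>();
        List<Hit> out = new ArrayList<>(hitsSortedByScoreDesc.size());
        for (Hit hit : hitsSortedByScoreDesc) {
            if (seen.add(hit.doc())) { // first occurrence wins (highest score)
                out.add(hit);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(7, 0.98f), new Hit(3, 0.95f), new Hit(7, 0.91f), new Hit(5, 0.90f));
        System.out.println(dedupByDoc(hits)); // doc 7 kept once, at its best score
    }
}
```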
1 parent f8b2ed9 commit bfefe03

3 files changed: +33 additions, -26 deletions

server/src/main/java/org/elasticsearch/index/codec/vectors/DefaultIVFVectorsReader.java

Lines changed: 11 additions & 11 deletions

```diff
@@ -16,6 +16,7 @@
 import org.apache.lucene.search.KnnCollector;
 import org.apache.lucene.store.IndexInput;
 import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.Bits;
 import org.apache.lucene.util.VectorUtil;
 import org.apache.lucene.util.hnsw.NeighborQueue;
 import org.elasticsearch.index.codec.vectors.reflect.OffHeapStats;
@@ -25,7 +26,6 @@

 import java.io.IOException;
 import java.util.Map;
-import java.util.function.IntPredicate;

 import static org.apache.lucene.codecs.lucene102.Lucene102BinaryQuantizedVectorsFormat.QUERY_BITS;
 import static org.apache.lucene.index.VectorSimilarityFunction.COSINE;
@@ -294,11 +294,10 @@ private static void score(
     }

     @Override
-    PostingVisitor getPostingVisitor(FieldInfo fieldInfo, IndexInput indexInput, float[] target, IntPredicate needsScoring)
-        throws IOException {
+    PostingVisitor getPostingVisitor(FieldInfo fieldInfo, IndexInput indexInput, float[] target, Bits acceptDocs) throws IOException {
         FieldEntry entry = fields.get(fieldInfo.number);
         final int maxPostingListSize = indexInput.readVInt();
-        return new MemorySegmentPostingsVisitor(target, indexInput, entry, fieldInfo, maxPostingListSize, needsScoring);
+        return new MemorySegmentPostingsVisitor(target, indexInput, entry, fieldInfo, maxPostingListSize, acceptDocs);
     }

     @Override
@@ -312,7 +311,7 @@ private static class MemorySegmentPostingsVisitor implements PostingVisitor {
         final float[] target;
         final FieldEntry entry;
         final FieldInfo fieldInfo;
-        final IntPredicate needsScoring;
+        final Bits acceptDocs;
         private final ES91OSQVectorsScorer osqVectorsScorer;
         final float[] scores = new float[BULK_SIZE];
         final float[] correctionsLower = new float[BULK_SIZE];
@@ -342,13 +341,13 @@ private static class MemorySegmentPostingsVisitor implements PostingVisitor {
            FieldEntry entry,
            FieldInfo fieldInfo,
            int maxPostingListSize,
-           IntPredicate needsScoring
+           Bits acceptDocs
        ) throws IOException {
            this.target = target;
            this.indexInput = indexInput;
            this.entry = entry;
            this.fieldInfo = fieldInfo;
-           this.needsScoring = needsScoring;
+           this.acceptDocs = acceptDocs;
            centroid = new float[fieldInfo.getVectorDimension()];
            scratch = new float[target.length];
            quantizationScratch = new int[target.length];
@@ -419,11 +418,12 @@ private float scoreIndividually(int offset) throws IOException {
            return maxScore;
        }

-       private static int docToBulkScore(int[] docIds, int offset, IntPredicate needsScoring) {
+       private static int docToBulkScore(int[] docIds, int offset, Bits acceptDocs) {
+           assert acceptDocs != null : "acceptDocs must not be null";
            int docToScore = ES91OSQVectorsScorer.BULK_SIZE;
            for (int i = 0; i < ES91OSQVectorsScorer.BULK_SIZE; i++) {
                final int idx = offset + i;
-               if (needsScoring.test(docIds[idx]) == false) {
+               if (acceptDocs.get(docIds[idx]) == false) {
                    docIds[idx] = -1;
                    docToScore--;
                }
@@ -447,7 +447,7 @@ public int visit(KnnCollector knnCollector) throws IOException {
            int limit = vectors - BULK_SIZE + 1;
            int i = 0;
            for (; i < limit; i += BULK_SIZE) {
-               final int docsToBulkScore = docToBulkScore(docIdsScratch, i, needsScoring);
+               final int docsToBulkScore = acceptDocs == null ? BULK_SIZE : docToBulkScore(docIdsScratch, i, acceptDocs);
                if (docsToBulkScore == 0) {
                    continue;
                }
@@ -476,7 +476,7 @@ public int visit(KnnCollector knnCollector) throws IOException {
            // process tail
            for (; i < vectors; i++) {
                int doc = docIdsScratch[i];
-               if (needsScoring.test(doc)) {
+               if (acceptDocs == null || acceptDocs.get(doc)) {
                    quantizeQueryIfNecessary();
                    indexInput.seek(slicePos + i * quantizedByteLength);
                    float qcDist = osqVectorsScorer.quantizeScore(quantizedQueryScratch);
```
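For reference, here is a minimal standalone sketch of the bulk pre-filtering pattern the new `docToBulkScore`/`visit` code relies on. The class, the `Bits` stand-in, and the `docsToBulkScore` name below are illustrative, not the Elasticsearch implementation: docs rejected by the live-docs/filter bitset are marked with `-1` so the bulk scorer can skip those slots, and a `null` bitset short-circuits to scoring the whole block with no per-doc check.

```java
// Illustrative sketch only; mirrors the filtering shape of the diff above.
public class BulkFilterSketch {
    static final int BULK_SIZE = 16; // stand-in for ES91OSQVectorsScorer.BULK_SIZE

    // Minimal stand-in for org.apache.lucene.util.Bits
    interface Bits {
        boolean get(int index);
    }

    static int docsToBulkScore(int[] docIds, int offset, Bits acceptDocs) {
        if (acceptDocs == null) {
            return BULK_SIZE; // no filter: score the whole block
        }
        int docsToScore = BULK_SIZE;
        for (int i = 0; i < BULK_SIZE; i++) {
            int idx = offset + i;
            if (acceptDocs.get(docIds[idx]) == false) {
                docIds[idx] = -1; // sentinel: the scorer ignores this slot
                docsToScore--;
            }
        }
        return docsToScore;
    }
}
```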

server/src/main/java/org/elasticsearch/index/codec/vectors/IVFVectorsReader.java

Lines changed: 2 additions & 11 deletions

```diff
@@ -29,12 +29,10 @@
 import org.apache.lucene.store.IndexInput;
 import org.apache.lucene.util.BitSet;
 import org.apache.lucene.util.Bits;
-import org.apache.lucene.util.FixedBitSet;
 import org.elasticsearch.core.IOUtils;
 import org.elasticsearch.search.vectors.IVFKnnSearchStrategy;

 import java.io.IOException;
-import java.util.function.IntPredicate;

 import static org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsReader.SIMILARITY_FUNCTIONS;
 import static org.elasticsearch.index.codec.vectors.IVFVectorsFormat.DYNAMIC_NPROBE;
@@ -224,13 +222,6 @@ public final void search(String field, float[] target, KnnCollector knnCollector
            percentFiltered = Math.max(0f, Math.min(1f, (float) bitSet.approximateCardinality() / bitSet.length()));
        }
        int numVectors = rawVectorsReader.getFloatVectorValues(field).size();
-       BitSet visitedDocs = new FixedBitSet(state.segmentInfo.maxDoc() + 1);
-       IntPredicate needsScoring = docId -> {
-           if (acceptDocs != null && acceptDocs.get(docId) == false) {
-               return false;
-           }
-           return visitedDocs.getAndSet(docId) == false;
-       };
        int nProbe = DYNAMIC_NPROBE;
        // Search strategy may be null if this is being called from checkIndex (e.g. from a test)
        if (knnCollector.getSearchStrategy() instanceof IVFKnnSearchStrategy ivfSearchStrategy) {
@@ -248,7 +239,7 @@ public final void search(String field, float[] target, KnnCollector knnCollector
            nProbe = Math.max(Math.min(nProbe, entry.numCentroids), 1);
        }
        CentroidIterator centroidIterator = getCentroidIterator(fieldInfo, entry.numCentroids, entry.centroidSlice(ivfCentroids), target);
-       PostingVisitor scorer = getPostingVisitor(fieldInfo, entry.postingListSlice(ivfClusters), target, needsScoring);
+       PostingVisitor scorer = getPostingVisitor(fieldInfo, entry.postingListSlice(ivfClusters), target, acceptDocs);
        int centroidsVisited = 0;
        long expectedDocs = 0;
        long actualDocs = 0;
@@ -316,7 +307,7 @@ IndexInput postingListSlice(IndexInput postingListFile) throws IOException {
        }
    }

-   abstract PostingVisitor getPostingVisitor(FieldInfo fieldInfo, IndexInput postingsLists, float[] target, IntPredicate needsScoring)
+   abstract PostingVisitor getPostingVisitor(FieldInfo fieldInfo, IndexInput postingsLists, float[] target, Bits needsScoring)
        throws IOException;

    interface CentroidIterator {
```
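For contrast, below is a hedged standalone reconstruction of the gate this change removes, using plain `java.util.BitSet` and `java.util.function.IntPredicate` rather than Lucene's `FixedBitSet` so the snippet is self-contained. It shows why the old path was expensive: every posting-list entry paid a branch plus a read-modify-write on a maxDoc-sized bitset just to suppress overspill duplicates up front.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Illustrative reconstruction of the removed per-vector "needs scoring" gate.
public class RemovedVisitedGateSketch {
    static IntPredicate needsScoring(IntPredicate acceptDocs, int maxDoc) {
        BitSet visitedDocs = new BitSet(maxDoc + 1);
        return docId -> {
            if (acceptDocs != null && acceptDocs.test(docId) == false) {
                return false;
            }
            boolean seen = visitedDocs.get(docId); // getAndSet equivalent, in two steps
            visitedDocs.set(docId);
            return seen == false;
        };
    }

    public static void main(String[] args) {
        IntPredicate gate = needsScoring(null, 100);
        System.out.println(gate.test(7)); // true: first visit
        System.out.println(gate.test(7)); // false: duplicate suppressed up front
    }
}
```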

server/src/main/java/org/elasticsearch/search/vectors/AbstractIVFKnnVectorQuery.java

Lines changed: 20 additions & 4 deletions

```diff
@@ -9,6 +9,8 @@

 package org.elasticsearch.search.vectors;

+import com.carrotsearch.hppc.IntHashSet;
+
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.LeafReader;
 import org.apache.lucene.index.LeafReaderContext;
@@ -115,7 +117,10 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
            filterWeight = null;
        }
        // we request numCands as we are using it as an approximation measure
-       KnnCollectorManager knnCollectorManager = getKnnCollectorManager(numCands, indexSearcher);
+       // we need to ensure we are getting at least 2*k results to ensure we cover overspill duplicates
+       // TODO move the logic for automatically adjusting percentages/nprobe to the query, so we can only pass
+       // 2k to the collector.
+       KnnCollectorManager knnCollectorManager = getKnnCollectorManager(Math.max(Math.round(2f * k), numCands), indexSearcher);
        TaskExecutor taskExecutor = indexSearcher.getTaskExecutor();
        List<LeafReaderContext> leafReaderContexts = reader.leaves();
        List<Callable<TopDocs>> tasks = new ArrayList<>(leafReaderContexts.size());
@@ -135,12 +140,23 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {

    private TopDocs searchLeaf(LeafReaderContext ctx, Weight filterWeight, KnnCollectorManager knnCollectorManager) throws IOException {
        TopDocs results = getLeafResults(ctx, filterWeight, knnCollectorManager);
-       if (ctx.docBase > 0) {
-           for (ScoreDoc scoreDoc : results.scoreDocs) {
+       IntHashSet dedup = new IntHashSet(results.scoreDocs.length * 4 / 3);
+       int deduplicateCount = 0;
+       for (ScoreDoc scoreDoc : results.scoreDocs) {
+           if (dedup.add(scoreDoc.doc)) {
+               deduplicateCount++;
+           }
+       }
+       ScoreDoc[] deduplicatedScoreDocs = new ScoreDoc[deduplicateCount];
+       dedup.clear();
+       int index = 0;
+       for (ScoreDoc scoreDoc : results.scoreDocs) {
+           if (dedup.add(scoreDoc.doc)) {
                scoreDoc.doc += ctx.docBase;
+               deduplicatedScoreDocs[index++] = scoreDoc;
            }
        }
-       return results;
+       return new TopDocs(results.totalHits, deduplicatedScoreDocs);
    }

    TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight, KnnCollectorManager knnCollectorManager) throws IOException {
```
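A hedged aside on the `Math.max(Math.round(2f * k), numCands)` sizing above (the helper class and numbers below are my own illustration, not from the commit): if, as the 2*k comment suggests, overspill writes each vector into at most one extra posting list, a document can appear at most twice among a leaf's hits, so collecting at least 2*k candidates still leaves at least k distinct documents after the per-leaf deduplication shown above.

```java
// Illustrative sketch only: mirrors the candidate-budget arithmetic from the diff above.
public class OverspillBudgetSketch {

    // same expression the query now uses to size the collector
    static int candidateBudget(int k, int numCands) {
        return Math.max(Math.round(2f * k), numCands);
    }

    public static void main(String[] args) {
        int k = 10;
        int budget = candidateBudget(k, 5); // even a tiny numCands is bumped up to 2 * k
        // worst case: every collected doc is part of a duplicate pair, so budget / 2 distinct docs remain
        System.out.println("budget=" + budget + ", distinct docs after dedup >= " + budget / 2);
    }
}
```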
