Better vectorize score computations. (#15039)

jpountz · jpountz · commit be832536f146 · 2025-08-30T14:19:13.000+02:00
Existing auto-vectorization of scores is a bit fragile since it relies on `SimScorer#score` being inlined in the for loops where it is called. This is currently the case in nightly benchmarks, but may not be the case in the real world, where more implementations of `SimScorer` may be in use, in particular those from `FeatureField`. Furthermore, existing auto-vectorization has some room for improvement, as @gf2121 highlighted in a comment on #14679.
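
To illustrate the idea, here is a minimal, self-contained sketch of the pattern this change introduces for BM25: the norm table lookup gets its own loop, so the arithmetic loop only touches `float` arrays and calls a small private method. The class name `BulkScoreSketch`, the `WEIGHT` constant and the contents of `CACHE` are hypothetical stand-ins, not Lucene code; only the formula and the two-loop shape are taken from the diff below. Whether the second loop actually vectorizes depends on the JVM and hardware.

```java
import java.util.Arrays;

public class BulkScoreSketch {

  // Hypothetical term weight (boost * idf in the real scorer).
  static final float WEIGHT = 2.0f;

  // Hypothetical 256-entry table mapping an encoded norm byte to 1/norm.
  static final float[] CACHE = new float[256];

  static {
    for (int i = 0; i < CACHE.length; ++i) {
      CACHE[i] = 1f / (1f + i);
    }
  }

  // Pure float arithmetic, shared by the per-document and bulk paths.
  static float doScore(float freq, float normInverse) {
    return WEIGHT - WEIGHT / (1f + freq * normInverse);
  }

  // Per-document path: table lookup, then arithmetic.
  public static float score(float freq, long encodedNorm) {
    return doScore(freq, CACHE[((byte) encodedNorm) & 0xFF]);
  }

  // Bulk path: the lookups are hoisted into their own loop.
  public static void bulkScore(int size, float[] freqs, long[] norms, float[] scores) {
    float[] normInverses = new float[size];
    for (int i = 0; i < size; ++i) {
      normInverses[i] = CACHE[((byte) norms[i]) & 0xFF];
    }
    // Only float arrays and simple arithmetic remain here, which is the
    // shape a JIT compiler can more easily turn into SIMD instructions.
    for (int i = 0; i < size; ++i) {
      scores[i] = doScore(freqs[i], normInverses[i]);
    }
  }

  public static void main(String[] args) {
    float[] freqs = {1f, 2f, 3f, 7f};
    long[] norms = {1L, 10L, 100L, 255L};
    float[] expected = new float[freqs.length];
    for (int i = 0; i < freqs.length; ++i) {
      expected[i] = score(freqs[i], norms[i]);
    }
    // Passing the same array as input and output is allowed, as in TermScorer.
    bulkScore(freqs.length, freqs, norms, freqs);
    System.out.println(Arrays.equals(expected, freqs));
  }
}
```

Unlike this sketch, the actual change keeps a reusable `normInverses` buffer on the `BulkSimScorer` instance instead of allocating per call, which is why the returned instance is documented as not thread-safe.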
diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
@@ -157,6 +157,9 @@ Optimizations
 
 * GITHUB#15004: Wraps all iterator with likelyImpactsEnum under BlockMaxConjunctionBulkScorer. (Ge Song)
 
+* GITHUB#15039: Score computations are now more reliably vectorized.
+   (Adrien Grand, Guo Feng)
+
 * GITHUB#15080: Use DocValuesSkippers in SortedNumericDocValuesRangeQuery#count(). (Alan Woodward)
 
 * GITHUB#15039: Score computations are now more reliably vectorized.
@@ -1031,7 +1034,7 @@ Improvements
 
 * GITHUB#13285: Early terminate graph searches of AbstractVectorSimilarityQuery to follow timeout set from
   IndexSearcher#setTimeout(QueryTimeout). (Kaival Parikh)
-  
+
 * GITHUB#13633: Add ability to read/write knn vector values to a MemoryIndex. (Ben Trent)
 
 * GITHUB#12627: patch HNSW graphs to improve reachability of all nodes from entry points
@@ -1948,7 +1951,7 @@ New Features
   closed while queries are running can no longer crash the JVM. To disable this feature,
   pass the following sysprop on Java command line:
   "-Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false" (Uwe Schindler)
-  
+
 * GITHUB#12252 Add function queries for computing similarity scores between knn vectors. (Elia Porciani, Alessandro Benedetti)
 
 Improvements
@@ -2627,7 +2630,7 @@ New Features
 * LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery
   to speed up computing the number of hits when possible. (Lu Xugang, Luca Cavanna, Adrien Grand)
 
-* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory` 
+* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory`
   implementation. `Monitor` can be created with a readonly `QueryIndex` in order to
   have readonly `Monitor` instances. (Niko Usai)
 
@@ -2686,7 +2689,7 @@ Optimizations
   term of each block as a dictionary when compressing suffixes of the other 63
   terms of the block. (Adrien Grand)
 
-* LUCENE-10411: Add nearest neighbors vectors support to ExitableDirectoryReader. 
+* LUCENE-10411: Add nearest neighbors vectors support to ExitableDirectoryReader.
   (Zach Chen, Adrien Grand, Julie Tibshirani, Tomoko Uchida)
 
 * LUCENE-10542: FieldSource exists implementations can avoid value retrieval (Kevin Risden)
@@ -2851,7 +2854,7 @@ New Features
   points are indexed.
   (Quentin Pradet, Adrien Grand)
 
-* LUCENE-10263: Added Weight#count to NormsFieldExistsQuery to speed up the query if all 
+* LUCENE-10263: Added Weight#count to NormsFieldExistsQuery to speed up the query if all
   documents have the field.. (Alan Woodward)
 
 * LUCENE-10248: Add SpanishPluralStemFilter, for precise stemming of Spanish plurals.
@@ -2877,14 +2880,14 @@ New Features
 
 * LUCENE-10403: Add ArrayUtil#grow(T[]). (Greg Miller)
 
-* LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (Dawid Weiss, 
+* LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (Dawid Weiss,
   Alan Woodward)
-  
+
 * LUCENE-10378: Implement Weight#count for PointRangeQuery to provide a faster way to calculate
   the number of matching range docs when each doc has at-most one point and the points are 1-dimensional.
   (Gautam Worah, Ignacio Vera, Adrien Grand)
 
-* LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (Ignacio Vera)     
+* LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (Ignacio Vera)
 
 * LUCENE-10382: Add support for filtering in KnnVectorQuery. This allows for finding the
   nearest k documents that also match a query. (Julie Tibshirani, Joel Bernstein)
@@ -2901,10 +2904,10 @@ Improvements
 
 * LUCENE-10238: Upgrade icu4j dependency to 70.1. (Dawid Weiss)
 
-* LUCENE-9820: Extract BKD tree interface and move intersecting logic to the 
+* LUCENE-9820: Extract BKD tree interface and move intersecting logic to the
   PointValues abstract class. (Ignacio Vera, Adrien Grand)
-  
-* LUCENE-10262: Lift up restrictions for navigating PointValues#PointTree 
+
+* LUCENE-10262: Lift up restrictions for navigating PointValues#PointTree
   added in LUCENE-9820 (Ignacio Vera)
 
 * LUCENE-9538: Detect polygon self-intersections in the Tessellator. (Ignacio Vera)
@@ -3019,8 +3022,8 @@ Bug Fixes
 
 * LUCENE-10407: Containing intervals could sometimes yield incorrect matches when wrapped
   in a disjunction. (Alan Woodward, Dawid Weiss)
-  
-* LUCENE-10405: When using the MemoryIndex, binary and Sorted doc values are stored 
+
+* LUCENE-10405: When using the MemoryIndex, binary and Sorted doc values are stored
    as BytesRef instead of BytesRefHash so they don't have a limit on size. (Ignacio Vera)
 
 * LUCENE-10428: Queries with a misbehaving score function may no longer cause
@@ -3052,7 +3055,7 @@ Other
 
 * LUCENE-10413: Make Ukrainian default stop words list available as a public getter. (Alan Woodward)
 
-* LUCENE-10437: Polygon tessellator throws a more informative error message when the provided polygon 
+* LUCENE-10437: Polygon tessellator throws a more informative error message when the provided polygon
   does not contain enough no-collinear points. (Ignacio Vera)
 
 ======================= Lucene 9.0.0 =======================
@@ -3171,7 +3174,7 @@ API Changes
   only applicable for fields that are indexed with doc values only. (Mayya Sharipova,
   Adrien Grand, Simon Willnauer)
 
-* LUCENE-9047: Directory API is now little endian. (Ignacio Vera, Adrien Grand)  
+* LUCENE-9047: Directory API is now little endian. (Ignacio Vera, Adrien Grand)
 
 * LUCENE-9948: No longer require the user to specify whether-or-not a field is multi-valued in
   LongValueFacetCounts (detect automatically based on what is indexed). (Greg Miller)
@@ -3384,7 +3387,7 @@ Improvements
   (David Smiley)
 
 * LUCENE-10062: Switch taxonomy faceting to use numeric doc values for storing ordinals instead of binary doc values
-  with its own custom encoding. (Greg Miller) 
+  with its own custom encoding. (Greg Miller)
 
 Bug fixes
 ---------------------
@@ -3507,10 +3510,10 @@ Other
 * LUCENE-9822: Add assertion to PFOR exception encoding, documenting the BLOCK_SIZE assumption. (Greg Miller)
 
 * LUCENE-9883: Turn on ecj missingEnumCaseDespiteDefault setting. (Zach Chen)
- 
-* LUCENE-9705: Make new versions of all index formats for the Lucene90 codec and move 
-  the existing ones to the backwards codecs. (Julie Tibshirani, Ignacio Vera)  
-  
+
+* LUCENE-9705: Make new versions of all index formats for the Lucene90 codec and move
+  the existing ones to the backwards codecs. (Julie Tibshirani, Ignacio Vera)
+
 * LUCENE-9907: Remove dependency on PackedInts#getReader() from the current codecs and move the
   method to backwards codec. (Ignacio Vera)
 
diff --git a/lucene/core/src/java/org/apache/lucene/search/TermScorer.java b/lucene/core/src/java/org/apache/lucene/search/TermScorer.java
@@ -22,6 +22,7 @@
 import org.apache.lucene.index.NumericDocValues;
 import org.apache.lucene.index.PostingsEnum;
 import org.apache.lucene.index.SlowImpactsEnum;
+import org.apache.lucene.search.similarities.Similarity.BulkSimScorer;
 import org.apache.lucene.search.similarities.Similarity.SimScorer;
 import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.Bits;
@@ -36,6 +37,7 @@ public final class TermScorer extends Scorer {
   private final PostingsEnum postingsEnum;
   private final DocIdSetIterator iterator;
   private final SimScorer scorer;
+  private final BulkSimScorer bulkScorer;
   private final NumericDocValues norms;
   private final ImpactsDISI impactsDisi;
   private final MaxScoreCache maxScoreCache;
@@ -49,6 +51,7 @@ public TermScorer(PostingsEnum postingsEnum, SimScorer scorer, NumericDocValues
     impactsDisi = null;
     this.scorer = scorer;
     this.norms = norms;
+    this.bulkScorer = scorer.asBulkSimScorer();
   }
 
   /**
@@ -71,6 +74,7 @@ public TermScorer(
     }
     this.scorer = scorer;
     this.norms = norms;
+    this.bulkScorer = scorer.asBulkSimScorer();
   }
 
   @Override
@@ -165,10 +169,6 @@ public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndFloatFeatureBuffer
       }
     }
 
-    for (int i = 0; i < size; ++i) {
-      // Unless SimScorer#score is megamorphic, SimScorer#score should inline and (part of) score
-      // computations should auto-vectorize.
-      buffer.features[i] = scorer.score(buffer.features[i], normValues[i]);
-    }
+    bulkScorer.score(buffer.size, buffer.features, normValues, buffer.features);
   }
 }
diff --git a/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java b/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java
@@ -21,6 +21,7 @@
 import org.apache.lucene.search.CollectionStatistics;
 import org.apache.lucene.search.Explanation;
 import org.apache.lucene.search.TermStatistics;
+import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.SmallFloat;
 
 /**
@@ -217,8 +218,7 @@ private static class BM25Scorer extends SimScorer {
       this.weight = boost * idf.getValue().floatValue();
     }
 
-    @Override
-    public float score(float freq, long encodedNorm) {
+    private float doScore(float freq, float normInverse) {
       // In order to guarantee monotonicity with both freq and norm without
       // promoting to doubles, we rewrite freq / (freq + norm) to
       // 1 - 1 / (1 + freq * 1/norm).
@@ -228,10 +228,38 @@ public float score(float freq, long encodedNorm) {
       // x -> 1 + x and x -> 1 - 1/x.
       // Finally we expand weight * (1 - 1 / (1 + freq * 1/norm)) to
       // weight - weight / (1 + freq * 1/norm), which runs slightly faster.
-      float normInverse = cache[((byte) encodedNorm) & 0xFF];
       return weight - weight / (1f + freq * normInverse);
     }
 
+    @Override
+    public float score(float freq, long encodedNorm) {
+      float normInverse = cache[((byte) encodedNorm) & 0xFF];
+      return doScore(freq, normInverse);
+    }
+
+    @Override
+    public BulkSimScorer asBulkSimScorer() {
+      return new BulkSimScorer() {
+
+        private float[] normInverses = new float[0];
+
+        @Override
+        public void score(int size, float[] freqs, long[] norms, float[] scores) {
+          if (normInverses.length < size) {
+            normInverses = new float[ArrayUtil.oversize(size, Float.BYTES)];
+          }
+          for (int i = 0; i < size; ++i) {
+            normInverses[i] = cache[((byte) norms[i]) & 0xFF];
+          }
+
+          // This loop auto-vectorizes.
+          for (int i = 0; i < size; ++i) {
+            scores[i] = doScore(freqs[i], normInverses[i]);
+          }
+        }
+      };
+    }
+
     @Override
     public Explanation explain(Explanation freq, long encodedNorm) {
       List<Explanation> subs = new ArrayList<>(explainConstantFactors());
diff --git a/lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java b/lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java
@@ -17,6 +17,7 @@
 package org.apache.lucene.search.similarities;
 
 import java.util.Collections;
+import java.util.Objects;
 import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
 import org.apache.lucene.document.NumericDocValuesField;
 import org.apache.lucene.index.FieldInvertState;
@@ -208,6 +209,16 @@ protected SimScorer() {}
      */
     public abstract float score(float freq, long norm);
 
+    /**
+     * Return a {@link BulkSimScorer} that produces the exact same scores as this {@link SimScorer}
+     * but is more efficient at bulk-computing scores.
+     *
+     * <p><b>NOTE</b>: The returned instance is not thread-safe.
+     */
+    public BulkSimScorer asBulkSimScorer() {
+      return new DefaultBulkSimScorer(this);
+    }
+
     /**
      * Explain the score for a single document
      *
@@ -223,4 +234,38 @@ public Explanation explain(Explanation freq, long norm) {
           Collections.singleton(freq));
     }
   }
+
+  /** Specialization of {@link SimScorer} for bulk-computation of scores. */
+  public interface BulkSimScorer {
+
+    /**
+     * Bulk computation of scores. For each index {@code i} in [0, size), scores[i] is computed as
+     * score(freqs[i], norms[i]). The default implementation does the following:
+     *
+     * <pre class="prettyprint">
+     * for (int i = 0; i &lt; size; ++i) {
+     *   scores[i] = score(freqs[i], norms[i]);
+     * }
+     * </pre>
+     *
+     * <p><b>NOTE</b>: It is legal to pass the same {@code freqs} and {@code scores} arrays.
+     */
+    void score(int size, float[] freqs, long[] norms, float[] scores);
+  }
+
+  private static class DefaultBulkSimScorer implements BulkSimScorer {
+
+    private final SimScorer scorer;
+
+    DefaultBulkSimScorer(SimScorer scorer) {
+      this.scorer = Objects.requireNonNull(scorer);
+    }
+
+    @Override
+    public void score(int size, float[] freqs, long[] norms, float[] scores) {
+      for (int i = 0; i < size; ++i) {
+        scores[i] = scorer.score(freqs[i], norms[i]);
+      }
+    }
+  }
 }
diff --git a/lucene/test-framework/src/java/org/apache/lucene/tests/search/similarities/AssertingSimilarity.java b/lucene/test-framework/src/java/org/apache/lucene/tests/search/similarities/AssertingSimilarity.java
@@ -99,6 +99,26 @@ public Explanation explain(Explanation freq, long norm) {
           == delegate.score(freq.getValue().floatValue(), norm);
       return explanation;
     }
+
+    @Override
+    public BulkSimScorer asBulkSimScorer() {
+      BulkSimScorer bulkScorer = delegate.asBulkSimScorer();
+      return new BulkSimScorer() {
+        @Override
+        public void score(int size, float[] freqs, long[] norms, float[] scores) {
+          for (int i = 0; i < size; ++i) {
+            assert freqs[i] > 0;
+            assert norms[i] != 0;
+          }
+          bulkScorer.score(size, freqs, norms, scores);
+          for (int i = 0; i < size; ++i) {
+            float score = scores[i];
+            assert Float.isFinite(score);
+            assert score >= 0;
+          }
+        }
+      };
+    }
   }
 
   @Override
diff --git a/lucene/test-framework/src/java/org/apache/lucene/tests/search/similarities/BaseSimilarityTestCase.java b/lucene/test-framework/src/java/org/apache/lucene/tests/search/similarities/BaseSimilarityTestCase.java
@@ -27,12 +27,14 @@
 import org.apache.lucene.search.TermStatistics;
 import org.apache.lucene.search.similarities.IndriDirichletSimilarity;
 import org.apache.lucene.search.similarities.Similarity;
+import org.apache.lucene.search.similarities.Similarity.BulkSimScorer;
 import org.apache.lucene.search.similarities.Similarity.SimScorer;
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.tests.index.RandomIndexWriter;
 import org.apache.lucene.tests.search.CheckHits;
 import org.apache.lucene.tests.util.LuceneTestCase;
 import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.SmallFloat;
@@ -521,4 +523,41 @@ private static void doTestScoring(
       }
     }
   }
+
+  public void testBulkScore() throws IOException {
+    Random random = random();
+    Similarity similarity = getSimilarity(random);
+    CollectionStatistics corpus = newCorpus(random, 1);
+    TermStatistics term = newTerm(random, corpus);
+    SimScorer scorer = similarity.scorer(random().nextFloat(5f), corpus, term);
+    BulkSimScorer bulkScorer = scorer.asBulkSimScorer();
+    int freqUpperBound =
+        Math.toIntExact(Math.min(term.totalTermFreq() - term.docFreq() + 1, Integer.MAX_VALUE));
+
+    float[] freqs = new float[0];
+    long[] norms = new long[0];
+    float[] scores = new float[0];
+
+    int iters = atLeast(3);
+    for (int iter = 0; iter < iters; ++iter) {
+      int size = TestUtil.nextInt(random, 0, 200);
+      if (size > freqs.length) {
+        freqs = new float[ArrayUtil.oversize(size, Float.BYTES)];
+        norms = new long[freqs.length];
+        scores = new float[freqs.length];
+      }
+      for (int i = 0; i < size; ++i) {
+        freqs[i] = TestUtil.nextInt(random, 1, freqUpperBound);
+        norms[i] = TestUtil.nextLong(random, 1, 255);
+      }
+
+      float[] expectedScores = new float[size];
+      for (int i = 0; i < size; ++i) {
+        expectedScores[i] = scorer.score(freqs[i], norms[i]);
+      }
+      bulkScorer.score(size, freqs, norms, scores);
+
+      assertArrayEquals(expectedScores, ArrayUtil.copyOfSubArray(scores, 0, size), 0f);
+    }
+  }
 }
