Commit be83253

Better vectorize score computations. (#15039)
The existing auto-vectorization of score computations is fragile because it relies on `SimScorer#score` being inlined into the for loops that call it. That is currently the case in nightly benchmarks, but may not hold in the real world, where more implementations of `SimScorer` may be in use, in particular those from `FeatureField`. Furthermore, the existing auto-vectorization leaves room for improvement, as @gf2121 highlighted in a comment on #14679.
1 parent 56aa606 commit be83253
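
To make the concern in the commit message concrete, here is a minimal, self-contained sketch of the two call-site shapes. `ToyScorer`, `ToyBulkScorer`, `CallSiteSketch` and the toy formula are hypothetical stand-ins, not Lucene APIs. Whether a particular JVM actually vectorizes either loop depends on the JIT and the hardware; the sketch only shows where the loop sits relative to the interface call.

```java
// Hypothetical stand-ins for illustration only; these are not Lucene classes.
interface ToyScorer {
  float score(float freq, long norm);
}

interface ToyBulkScorer {
  void score(int size, float[] freqs, long[] norms, float[] scores);
}

final class CallSiteSketch {

  // Shape before the change: one interface call per element. The loop body is only
  // a candidate for auto-vectorization if the JIT manages to inline scorer.score().
  static void scoreOneByOne(
      ToyScorer scorer, int size, float[] freqs, long[] norms, float[] scores) {
    for (int i = 0; i < size; ++i) {
      scores[i] = scorer.score(freqs[i], norms[i]);
    }
  }

  // Shape after the change: one interface call per batch. The loop lives inside the
  // implementation, next to the arithmetic, so it no longer depends on inlining
  // across the caller's call site.
  static ToyBulkScorer weightedFreq(float weight) {
    return (size, freqs, norms, scores) -> {
      for (int i = 0; i < size; ++i) {
        scores[i] = weight * freqs[i] / (freqs[i] + norms[i]);
      }
    };
  }

  public static void main(String[] args) {
    float[] freqs = {1f, 2f, 3f};
    long[] norms = {4L, 4L, 4L};
    float[] perDoc = new float[3];
    float[] bulk = new float[3];
    ToyScorer scorer = (freq, norm) -> 2f * freq / (freq + norm);
    scoreOneByOne(scorer, 3, freqs, norms, perDoc);
    weightedFreq(2f).score(3, freqs, norms, bulk);
    // Both paths run the same float operations, so the arrays should match.
    System.out.println(java.util.Arrays.equals(perDoc, bulk));
  }
}
```

The design point is that the caller stops depending on the JIT's inlining decision at its own call site, which can become megamorphic when several scorer implementations are in use.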

6 files changed, +163 -28 lines changed

lucene/CHANGES.txt

Lines changed: 23 additions & 20 deletions
@@ -157,6 +157,9 @@ Optimizations

 * GITHUB#15004: Wraps all iterator with likelyImpactsEnum under BlockMaxConjunctionBulkScorer. (Ge Song)

+* GITHUB#15039: Score computations are now more reliably vectorized.
+(Adrien Grand, Guo Feng)
+
 * GITHUB#15080: Use DocValuesSkippers in SortedNumericDocValuesRangeQuery#count(). (Alan Woodward)

 * GITHUB#15039: Score computations are now more reliably vectorized.
@@ -1031,7 +1034,7 @@ Improvements

 * GITHUB#13285: Early terminate graph searches of AbstractVectorSimilarityQuery to follow timeout set from
 IndexSearcher#setTimeout(QueryTimeout). (Kaival Parikh)
-
+
 * GITHUB#13633: Add ability to read/write knn vector values to a MemoryIndex. (Ben Trent)

 * GITHUB#12627: patch HNSW graphs to improve reachability of all nodes from entry points
@@ -1948,7 +1951,7 @@ New Features
 closed while queries are running can no longer crash the JVM. To disable this feature,
 pass the following sysprop on Java command line:
 "-Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false" (Uwe Schindler)
-
+
 * GITHUB#12252 Add function queries for computing similarity scores between knn vectors. (Elia Porciani, Alessandro Benedetti)

 Improvements
@@ -2627,7 +2630,7 @@ New Features
 * LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery
 to speed up computing the number of hits when possible. (Lu Xugang, Luca Cavanna, Adrien Grand)

-* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory`
+* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory`
 implementation. `Monitor` can be created with a readonly `QueryIndex` in order to
 have readonly `Monitor` instances. (Niko Usai)

@@ -2686,7 +2689,7 @@ Optimizations
 term of each block as a dictionary when compressing suffixes of the other 63
 terms of the block. (Adrien Grand)

-* LUCENE-10411: Add nearest neighbors vectors support to ExitableDirectoryReader.
+* LUCENE-10411: Add nearest neighbors vectors support to ExitableDirectoryReader.
 (Zach Chen, Adrien Grand, Julie Tibshirani, Tomoko Uchida)

 * LUCENE-10542: FieldSource exists implementations can avoid value retrieval (Kevin Risden)
@@ -2851,7 +2854,7 @@ New Features
 points are indexed.
 (Quentin Pradet, Adrien Grand)

-* LUCENE-10263: Added Weight#count to NormsFieldExistsQuery to speed up the query if all
+* LUCENE-10263: Added Weight#count to NormsFieldExistsQuery to speed up the query if all
 documents have the field.. (Alan Woodward)

 * LUCENE-10248: Add SpanishPluralStemFilter, for precise stemming of Spanish plurals.
@@ -2877,14 +2880,14 @@ New Features

 * LUCENE-10403: Add ArrayUtil#grow(T[]). (Greg Miller)

-* LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (Dawid Weiss,
+* LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (Dawid Weiss,
 Alan Woodward)
-
+
 * LUCENE-10378: Implement Weight#count for PointRangeQuery to provide a faster way to calculate
 the number of matching range docs when each doc has at-most one point and the points are 1-dimensional.
 (Gautam Worah, Ignacio Vera, Adrien Grand)

-* LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (Ignacio Vera)
+* LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (Ignacio Vera)

 * LUCENE-10382: Add support for filtering in KnnVectorQuery. This allows for finding the
 nearest k documents that also match a query. (Julie Tibshirani, Joel Bernstein)
@@ -2901,10 +2904,10 @@ Improvements

 * LUCENE-10238: Upgrade icu4j dependency to 70.1. (Dawid Weiss)

-* LUCENE-9820: Extract BKD tree interface and move intersecting logic to the
+* LUCENE-9820: Extract BKD tree interface and move intersecting logic to the
 PointValues abstract class. (Ignacio Vera, Adrien Grand)
-
-* LUCENE-10262: Lift up restrictions for navigating PointValues#PointTree
+
+* LUCENE-10262: Lift up restrictions for navigating PointValues#PointTree
 added in LUCENE-9820 (Ignacio Vera)

 * LUCENE-9538: Detect polygon self-intersections in the Tessellator. (Ignacio Vera)
@@ -3019,8 +3022,8 @@ Bug Fixes

 * LUCENE-10407: Containing intervals could sometimes yield incorrect matches when wrapped
 in a disjunction. (Alan Woodward, Dawid Weiss)
-
-* LUCENE-10405: When using the MemoryIndex, binary and Sorted doc values are stored
+
+* LUCENE-10405: When using the MemoryIndex, binary and Sorted doc values are stored
 as BytesRef instead of BytesRefHash so they don't have a limit on size. (Ignacio Vera)

 * LUCENE-10428: Queries with a misbehaving score function may no longer cause
@@ -3052,7 +3055,7 @@ Other

 * LUCENE-10413: Make Ukrainian default stop words list available as a public getter. (Alan Woodward)

-* LUCENE-10437: Polygon tessellator throws a more informative error message when the provided polygon
+* LUCENE-10437: Polygon tessellator throws a more informative error message when the provided polygon
 does not contain enough no-collinear points. (Ignacio Vera)

 ======================= Lucene 9.0.0 =======================
======================= Lucene 9.0.0 =======================
@@ -3171,7 +3174,7 @@ API Changes
 only applicable for fields that are indexed with doc values only. (Mayya Sharipova,
 Adrien Grand, Simon Willnauer)

-* LUCENE-9047: Directory API is now little endian. (Ignacio Vera, Adrien Grand)
+* LUCENE-9047: Directory API is now little endian. (Ignacio Vera, Adrien Grand)

 * LUCENE-9948: No longer require the user to specify whether-or-not a field is multi-valued in
 LongValueFacetCounts (detect automatically based on what is indexed). (Greg Miller)
@@ -3384,7 +3387,7 @@ Improvements
 (David Smiley)

 * LUCENE-10062: Switch taxonomy faceting to use numeric doc values for storing ordinals instead of binary doc values
-with its own custom encoding. (Greg Miller)
+with its own custom encoding. (Greg Miller)

 Bug fixes
 ---------------------
@@ -3507,10 +3510,10 @@ Other
 * LUCENE-9822: Add assertion to PFOR exception encoding, documenting the BLOCK_SIZE assumption. (Greg Miller)

 * LUCENE-9883: Turn on ecj missingEnumCaseDespiteDefault setting. (Zach Chen)
-
-* LUCENE-9705: Make new versions of all index formats for the Lucene90 codec and move
-the existing ones to the backwards codecs. (Julie Tibshirani, Ignacio Vera)
-
+
+* LUCENE-9705: Make new versions of all index formats for the Lucene90 codec and move
+the existing ones to the backwards codecs. (Julie Tibshirani, Ignacio Vera)
+
 * LUCENE-9907: Remove dependency on PackedInts#getReader() from the current codecs and move the
 method to backwards codec. (Ignacio Vera)

lucene/core/src/java/org/apache/lucene/search/TermScorer.java

Lines changed: 5 additions & 5 deletions
@@ -22,6 +22,7 @@
 import org.apache.lucene.index.NumericDocValues;
 import org.apache.lucene.index.PostingsEnum;
 import org.apache.lucene.index.SlowImpactsEnum;
+import org.apache.lucene.search.similarities.Similarity.BulkSimScorer;
 import org.apache.lucene.search.similarities.Similarity.SimScorer;
 import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.Bits;
@@ -36,6 +37,7 @@ public final class TermScorer extends Scorer {
   private final PostingsEnum postingsEnum;
   private final DocIdSetIterator iterator;
   private final SimScorer scorer;
+  private final BulkSimScorer bulkScorer;
   private final NumericDocValues norms;
   private final ImpactsDISI impactsDisi;
   private final MaxScoreCache maxScoreCache;
@@ -49,6 +51,7 @@ public TermScorer(PostingsEnum postingsEnum, SimScorer scorer, NumericDocValues
     impactsDisi = null;
     this.scorer = scorer;
     this.norms = norms;
+    this.bulkScorer = scorer.asBulkSimScorer();
   }

   /**
@@ -71,6 +74,7 @@ public TermScorer(
     }
     this.scorer = scorer;
     this.norms = norms;
+    this.bulkScorer = scorer.asBulkSimScorer();
   }

   @Override
@@ -165,10 +169,6 @@ public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndFloatFeatureBuffer
       }
     }

-    for (int i = 0; i < size; ++i) {
-      // Unless SimScorer#score is megamorphic, SimScorer#score should inline and (part of) score
-      // computations should auto-vectorize.
-      buffer.features[i] = scorer.score(buffer.features[i], normValues[i]);
-    }
+    bulkScorer.score(buffer.size, buffer.features, normValues, buffer.features);
   }
 }
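
The new call in `nextDocsAndScores` passes `buffer.features` as both the `freqs` input and the `scores` output, which the `BulkSimScorer` contract in this commit explicitly allows. As a minimal sketch of why an element-wise implementation tolerates that aliasing (the class `InPlaceScoringSketch` and its toy formula are hypothetical, not Lucene code):

```java
final class InPlaceScoringSketch {

  // Toy element-wise scorer: each output slot depends only on the input slots at the
  // same index, so reading freqs[i] and then writing scores[i] is safe even when
  // freqs and scores are the same array.
  static void score(int size, float[] freqs, long[] norms, float[] scores) {
    for (int i = 0; i < size; ++i) {
      scores[i] = freqs[i] / (freqs[i] + norms[i]);
    }
  }

  public static void main(String[] args) {
    float[] features = {1f, 3f, 6f}; // term frequencies on input
    long[] norms = {1L, 1L, 2L};
    float[] separate = new float[features.length];
    score(features.length, features, norms, separate); // distinct output array
    score(features.length, features, norms, features); // in place: freqs become scores
    System.out.println(java.util.Arrays.equals(separate, features));
  }
}
```

An implementation that read `freqs[j]` for some `j != i` after writing `scores[i]` would not be safe under this aliasing, so custom bulk scorers presumably need to stay element-wise.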

lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java

Lines changed: 31 additions & 3 deletions
@@ -21,6 +21,7 @@
 import org.apache.lucene.search.CollectionStatistics;
 import org.apache.lucene.search.Explanation;
 import org.apache.lucene.search.TermStatistics;
+import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.SmallFloat;

 /**
@@ -217,8 +218,7 @@ private static class BM25Scorer extends SimScorer {
       this.weight = boost * idf.getValue().floatValue();
     }

-    @Override
-    public float score(float freq, long encodedNorm) {
+    private float doScore(float freq, float normInverse) {
       // In order to guarantee monotonicity with both freq and norm without
       // promoting to doubles, we rewrite freq / (freq + norm) to
       // 1 - 1 / (1 + freq * 1/norm).
@@ -228,10 +228,38 @@ public float score(float freq, long encodedNorm) {
       // x -> 1 + x and x -> 1 - 1/x.
       // Finally we expand weight * (1 - 1 / (1 + freq * 1/norm)) to
       // weight - weight / (1 + freq * 1/norm), which runs slightly faster.
-      float normInverse = cache[((byte) encodedNorm) & 0xFF];
       return weight - weight / (1f + freq * normInverse);
     }

+    @Override
+    public float score(float freq, long encodedNorm) {
+      float normInverse = cache[((byte) encodedNorm) & 0xFF];
+      return doScore(freq, normInverse);
+    }
+
+    @Override
+    public BulkSimScorer asBulkSimScorer() {
+      return new BulkSimScorer() {
+
+        private float[] normInverses = new float[0];
+
+        @Override
+        public void score(int size, float[] freqs, long[] norms, float[] scores) {
+          if (normInverses.length < size) {
+            normInverses = new float[ArrayUtil.oversize(size, Float.BYTES)];
+          }
+          for (int i = 0; i < size; ++i) {
+            normInverses[i] = cache[((byte) norms[i]) & 0xFF];
+          }
+
+          // This loop auto-vectorizes.
+          for (int i = 0; i < size; ++i) {
+            scores[i] = doScore(freqs[i], normInverses[i]);
+          }
+        }
+      };
+    }
+
     @Override
     public Explanation explain(Explanation freq, long encodedNorm) {
       List<Explanation> subs = new ArrayList<>(explainConstantFactors());
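
The comments in this diff describe rewriting `weight * freq / (freq + norm)` as `weight - weight / (1 + freq * 1/norm)`, and the new bulk scorer splits the work into a table-lookup pass followed by an arithmetic-only pass. The following self-contained sketch mirrors that structure under stated assumptions: `ToyBm25` is a hypothetical class, and its 256-entry `cache` is filled with made-up values, since the real table contents are not part of this diff.

```java
final class ToyBm25 {
  private final float weight;
  // Assumption: placeholder values standing in for the precomputed 1/norm table.
  private final float[] cache = new float[256];
  private float[] normInverses = new float[0]; // scratch buffer, grown on demand

  ToyBm25(float weight) {
    this.weight = weight;
    for (int i = 0; i < cache.length; ++i) {
      cache[i] = 1f / (1f + i);
    }
  }

  // weight * freq / (freq + norm), rewritten as in the diff's comment.
  private float doScore(float freq, float normInverse) {
    return weight - weight / (1f + freq * normInverse);
  }

  float score(float freq, long encodedNorm) {
    return doScore(freq, cache[((byte) encodedNorm) & 0xFF]);
  }

  void bulkScore(int size, float[] freqs, long[] norms, float[] scores) {
    if (normInverses.length < size) {
      // The commit over-allocates via ArrayUtil.oversize; an exact size keeps this sketch simple.
      normInverses = new float[size];
    }
    // Pass 1: gather 1/norm values from the lookup table (indexed loads).
    for (int i = 0; i < size; ++i) {
      normInverses[i] = cache[((byte) norms[i]) & 0xFF];
    }
    // Pass 2: plain float arithmetic over arrays, the loop the diff marks as auto-vectorizing.
    for (int i = 0; i < size; ++i) {
      scores[i] = doScore(freqs[i], normInverses[i]);
    }
  }

  public static void main(String[] args) {
    ToyBm25 scorer = new ToyBm25(2f);
    float[] freqs = {1f, 3f, 10f};
    long[] norms = {1L, 3L, 200L};
    float[] scores = new float[3];
    scorer.bulkScore(3, freqs, norms, scores);
    for (int i = 0; i < 3; ++i) {
      // Bulk and per-document paths run the same float operations.
      System.out.println(scores[i] == scorer.score(freqs[i], norms[i]));
    }
  }
}
```

Separating the gather from the arithmetic presumably keeps the second loop free of table lookups, which is the part the commit's own comment says auto-vectorizes.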

lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java

Lines changed: 45 additions & 0 deletions
@@ -17,6 +17,7 @@
 package org.apache.lucene.search.similarities;

 import java.util.Collections;
+import java.util.Objects;
 import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
 import org.apache.lucene.document.NumericDocValuesField;
 import org.apache.lucene.index.FieldInvertState;
@@ -208,6 +209,16 @@ protected SimScorer() {}
      */
     public abstract float score(float freq, long norm);

+    /**
+     * Return a {@link BulkSimScorer} that produces the exact same scores as this {@link SimScorer}
+     * but is more efficient at bulk-computing scores.
+     *
+     * <p><b>NOTE</b>: The returned instance is not thread-safe.
+     */
+    public BulkSimScorer asBulkSimScorer() {
+      return new DefaultBulkSimScorer(this);
+    }
+
     /**
      * Explain the score for a single document
      *
@@ -223,4 +234,38 @@ public Explanation explain(Explanation freq, long norm) {
           Collections.singleton(freq));
     }
   }
+
+  /** Specialization of {@link SimScorer} for bulk-computation of scores. */
+  public interface BulkSimScorer {
+
+    /**
+     * Bulk computation of scores. For each index {@code i} in [0, size), scores[i] is computed as
+     * score(freqs[i], norms[i]). The default implementation does the following:
+     *
+     * <pre class="prettyprint">
+     * for (int i = 0; i &lt; size; ++i) {
+     *   scores[i] = score(freqs[i], norms[i]);
+     * }
+     * </pre>
+     *
+     * <p><b>NOTE</b>: It is legal to pass the same {@code freqs} and {@code scores} arrays.
+     */
+    void score(int size, float[] freqs, long[] norms, float[] scores);
+  }
+
+  private static class DefaultBulkSimScorer implements BulkSimScorer {
+
+    private final SimScorer scorer;
+
+    DefaultBulkSimScorer(SimScorer scorer) {
+      this.scorer = Objects.requireNonNull(scorer);
+    }
+
+    @Override
+    public void score(int size, float[] freqs, long[] norms, float[] scores) {
+      for (int i = 0; i < size; ++i) {
+        scores[i] = scorer.score(freqs[i], norms[i]);
+      }
+    }
+  }
 }
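
The pattern introduced here is a default adapter that implementations may override: `asBulkSimScorer()` falls back to a per-element loop, and a scorer such as BM25 can return a specialized version as long as the scores stay identical. A minimal sketch of that pattern, using hypothetical names (`ToySimScorer`, `ToyBulk`, `FreqOnlyScorer`) and a toy formula rather than the Lucene classes:

```java
abstract class ToySimScorer {

  interface ToyBulk {
    void score(int size, float[] freqs, long[] norms, float[] scores);
  }

  abstract float score(float freq, long norm);

  // Default adapter: wraps the per-document method in a loop. Subclasses may override
  // this, provided the specialized version produces exactly the same scores.
  ToyBulk asBulk() {
    return (size, freqs, norms, scores) -> {
      for (int i = 0; i < size; ++i) {
        scores[i] = score(freqs[i], norms[i]);
      }
    };
  }
}

final class FreqOnlyScorer extends ToySimScorer {

  @Override
  float score(float freq, long norm) {
    return 1f - 1f / (1f + freq);
  }

  @Override
  ToyBulk asBulk() {
    // Specialized path: same arithmetic, but no per-element call back into score().
    return (size, freqs, norms, scores) -> {
      for (int i = 0; i < size; ++i) {
        scores[i] = 1f - 1f / (1f + freqs[i]);
      }
    };
  }

  public static void main(String[] args) {
    float[] freqs = {1f, 3f};
    long[] norms = {1L, 1L};
    float[] scores = new float[2];
    new FreqOnlyScorer().asBulk().score(2, freqs, norms, scores);
    System.out.println(java.util.Arrays.toString(scores));
  }
}
```

Keeping the per-document method as the source of truth means existing scorers keep working unchanged, while the test added in this commit (`testBulkScore`) checks that any override agrees with it exactly.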

lucene/test-framework/src/java/org/apache/lucene/tests/search/similarities/AssertingSimilarity.java

Lines changed: 20 additions & 0 deletions
@@ -99,6 +99,26 @@ public Explanation explain(Explanation freq, long norm) {
           == delegate.score(freq.getValue().floatValue(), norm);
       return explanation;
     }
+
+    @Override
+    public BulkSimScorer asBulkSimScorer() {
+      BulkSimScorer bulkScorer = delegate.asBulkSimScorer();
+      return new BulkSimScorer() {
+        @Override
+        public void score(int size, float[] freqs, long[] norms, float[] scores) {
+          for (int i = 0; i < size; ++i) {
+            assert freqs[i] > 0;
+            assert norms[i] != 0;
+          }
+          bulkScorer.score(size, freqs, norms, scores);
+          for (int i = 0; i < size; ++i) {
+            float score = scores[i];
+            assert Float.isFinite(score);
+            assert score >= 0;
+          }
+        }
+      };
+    }
   }

   @Override

lucene/test-framework/src/java/org/apache/lucene/tests/search/similarities/BaseSimilarityTestCase.java

Lines changed: 39 additions & 0 deletions
@@ -27,12 +27,14 @@
 import org.apache.lucene.search.TermStatistics;
 import org.apache.lucene.search.similarities.IndriDirichletSimilarity;
 import org.apache.lucene.search.similarities.Similarity;
+import org.apache.lucene.search.similarities.Similarity.BulkSimScorer;
 import org.apache.lucene.search.similarities.Similarity.SimScorer;
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.tests.index.RandomIndexWriter;
 import org.apache.lucene.tests.search.CheckHits;
 import org.apache.lucene.tests.util.LuceneTestCase;
 import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.SmallFloat;
@@ -521,4 +523,41 @@ private static void doTestScoring(
       }
     }
   }
+
+  public void testBulkScore() throws IOException {
+    Random random = random();
+    Similarity similarity = getSimilarity(random);
+    CollectionStatistics corpus = newCorpus(random, 1);
+    TermStatistics term = newTerm(random, corpus);
+    SimScorer scorer = similarity.scorer(random().nextFloat(5f), corpus, term);
+    BulkSimScorer bulkScorer = scorer.asBulkSimScorer();
+    int freqUpperBound =
+        Math.toIntExact(Math.min(term.totalTermFreq() - term.docFreq() + 1, Integer.MAX_VALUE));
+
+    float[] freqs = new float[0];
+    long[] norms = new long[0];
+    float[] scores = new float[0];
+
+    int iters = atLeast(3);
+    for (int iter = 0; iter < iters; ++iter) {
+      int size = TestUtil.nextInt(random, 0, 200);
+      if (size > freqs.length) {
+        freqs = new float[ArrayUtil.oversize(size, Float.BYTES)];
+        norms = new long[freqs.length];
+        scores = new float[freqs.length];
+      }
+      for (int i = 0; i < size; ++i) {
+        freqs[i] = TestUtil.nextInt(random, 1, freqUpperBound);
+        norms[i] = TestUtil.nextLong(random, 1, 255);
+      }
+
+      float[] expectedScores = new float[size];
+      for (int i = 0; i < size; ++i) {
+        expectedScores[i] = scorer.score(freqs[i], norms[i]);
+      }
+      bulkScorer.score(size, freqs, norms, scores);
+
+      assertArrayEquals(expectedScores, ArrayUtil.copyOfSubArray(scores, 0, size), 0f);
+    }
+  }
 }
