Skip to content

Commit 50b4363

Browse files
authored
Fix TestBooleanMinShouldMatch#testRandomQueries failure. (#14715)
This test generates random boolean queries and ensures that setting a minimum number of matching SHOULD clauses returns a subset of the hits with the same scores. It already tries to work around accuracy loss due to arithmetic operations by allowing a delta of up to one ulp between these two queries. However, sometimes the delta can be higher. For instance consider the following query that triggered the most recent test failure: `(data:5 data:5 data:5 data:6 +data:6 data:Z data:X -data:1)~2`. Without a minimum number of matching SHOULD clauses, it gets rewritten to `(data:5^3 +data:6^2 data:Z data:X -data:1)`. So the score contribution of `data:5` is computed as `(double) score(data:5) + (double) score(data:5) + (double) score(data:5)` in one case, and `(double) (score(data:5: * 3f)` (multiply first, then cast to a double) in the other case. The use of `ReqOptSumScorer` also contributes accuracy losses as per existing comment, for instance `data:6` is part of both the required and the optional clauses in the first case, while it's only a required clauses (with a 2x boost) in the other case. So accuracy loss accrues differently. I don't think we should try too hard to avoid these accuracy losses, so I'm instead increasing the leniency of the test.
1 parent 7759fde commit 50b4363

File tree

1 file changed

+12
-9
lines changed

1 file changed

+12
-9
lines changed

lucene/core/src/test/org/apache/lucene/search/TestBooleanMinShouldMatch.java

Lines changed: 12 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@
2525
import org.apache.lucene.index.IndexReader;
2626
import org.apache.lucene.index.StoredFields;
2727
import org.apache.lucene.index.Term;
28+
import org.apache.lucene.search.BooleanClause.Occur;
2829
import org.apache.lucene.store.Directory;
2930
import org.apache.lucene.tests.index.RandomIndexWriter;
3031
import org.apache.lucene.tests.search.CheckHits;
@@ -359,7 +360,7 @@ public void postCreate(BooleanQuery.Builder q) {
359360
// System.out.println("Total hits:"+tot);
360361
}
361362

362-
private void assertSubsetOfSameScores(Query q, TopDocs top1, TopDocs top2) {
363+
private void assertSubsetOfSameScores(BooleanQuery q, TopDocs top1, TopDocs top2) {
363364
// The constrained query
364365
// should be a subset to the unconstrained query.
365366
if (top2.totalHits.value() > top1.totalHits.value()) {
@@ -371,6 +372,8 @@ private void assertSubsetOfSameScores(Query q, TopDocs top1, TopDocs top2) {
371372
+ q.toString());
372373
}
373374

375+
int numScoringClauses = q.getClauses(Occur.SHOULD).size() + q.getClauses(Occur.MUST).size();
376+
374377
for (int hit = 0; hit < top2.totalHits.value(); hit++) {
375378
int id = top2.scoreDocs[hit].doc;
376379
float score = top2.scoreDocs[hit].score;
@@ -391,14 +394,14 @@ private void assertSubsetOfSameScores(Query q, TopDocs top1, TopDocs top2) {
391394
+ q.toString(),
392395
score,
393396
otherScore,
394-
// If there is at least one MUST/FILTER clause and if
395-
// minShouldMatch is equal to the number of SHOULD clauses,
396-
// then a query that was previously executed with
397-
// ReqOptSumScorer is now executed with ConjunctionScorer.
398-
// We need to introduce some leniency because ReqOptSumScorer
399-
// casts intermediate values to floats before summing up again
400-
// which hurts accuracy.
401-
Math.ulp(score));
397+
// While BooleanQuery tries to contain the accuracy loss of scores by summing up into
398+
// doubles, there are a few places that do not do this, e.g. when (field:v field:v
399+
// field:v) gets rewritten into field:v^3, the multiplication by 3 is performed on the
400+
// float score, not on a double. Likewise, ReqOptSumScorer gets used in some cases,
401+
// which casts the score of optional and required clauses into floats before summing
402+
// them up together. So to avoid false positives, we allow losing one ulp of accuracy
403+
// per scoring clause.
404+
Math.ulp(score) * numScoringClauses);
402405
}
403406
}
404407

0 commit comments

Comments
 (0)