
Commit 7b0ca79

gf2121 and expani authored
Provide better impacts for fields indexed with IndexOptions.DOCS (apache#14511)
Postings always return impacts with freq=Integer.MAX_VALUE and norm=1 when frequencies are not indexed (IndexOptions.DOCS). This significantly overestimates the score upper bound of term queries, since the similarity scorer is effectively called with freq=1 all the time in this case (and either norm=1 if norms are not indexed, or the number of terms in the field otherwise).

This updates postings to always return impacts with freq=1 and norm=1 when frequencies are not indexed, which helps compute better score upper bounds and in turn makes dynamic pruning perform better.

Closes apache#14445

Co-Authored-by: expani <[email protected]>
1 parent 6ad8a96 commit 7b0ca79
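
The overestimate described in the commit message can be illustrated with a small standalone sketch. It uses a simplified textbook BM25 term-frequency saturation, not Lucene's actual similarity implementation; the class name, the method name, and the value k1=1.2 are assumptions made for illustration, and length normalization is left out (the norm=1 case).

```java
// Standalone illustration; this is NOT Lucene code.
final class ImpactBoundSketch {
  // Assumed BM25 saturation parameter (hypothetical choice for this sketch).
  static final double K1 = 1.2;

  /** Simplified saturating tf component: freq * (k1 + 1) / (freq + k1). */
  static double tfPart(double freq) {
    return freq * (K1 + 1) / (freq + K1);
  }

  public static void main(String[] args) {
    // With IndexOptions.DOCS the scorer effectively always sees freq=1.
    double actual = tfPart(1);
    // Upper bound implied by the old placeholder impact (freq=Integer.MAX_VALUE).
    double oldBound = tfPart(Integer.MAX_VALUE);
    // Upper bound implied by the new placeholder impact (freq=1).
    double newBound = tfPart(1);
    System.out.printf(
        "actual=%.4f oldBound=%.4f newBound=%.4f%n", actual, oldBound, newBound);
  }
}
```

Under these assumptions the old bound is roughly 2.2 times the score a document can actually reach, while the new bound matches it exactly.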

2 files changed: 14 additions and 7 deletions

lucene/CHANGES.txt

Lines changed: 2 additions & 0 deletions
@@ -61,6 +61,8 @@ Bug Fixes
 * GITHUB#14523, GITHUB#14530: Correct TermOrdValComparator competitive iterator so that it forces sparse
   field iteration to be at least scoring window baseline when doing intoBitSet. (Ben Trent, Adrien Grand)

+* GITHUB#14445: Provide better impacts for fields indexed with IndexOptions.DOCS GITHUB#14511 (Aniketh Jain)
+
 * GITHUB#14543: Fixed lead cost computations for bulk scorers of conjunctive
   queries that mix MUST and FILTER clauses, and disjunctive queries that
   configure a minimum number of matching SHOULD clauses.

lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java

Lines changed: 12 additions & 7 deletions
@@ -73,6 +73,10 @@ public final class Lucene101PostingsReader extends PostingsReaderBase {
   private static final List<Impact> DUMMY_IMPACTS =
       Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L));

+  // We stopped storing a placeholder impact with freq=1 for fields with DOCS after 9.12.0
+  private static final List<Impact> DUMMY_IMPACTS_NO_FREQS =
+      Collections.singletonList(new Impact(1, 1L));
+
   private final IndexInput docIn;
   private final IndexInput posIn;
   private final IndexInput payIn;
@@ -1328,13 +1332,14 @@ public int getDocIdUpTo(int level) {

     @Override
     public List<Impact> getImpacts(int level) {
-      if (indexHasFreq) {
-        if (level == 0 && level0LastDocID != NO_MORE_DOCS) {
-          return readImpacts(level0SerializedImpacts, level0Impacts);
-        }
-        if (level == 1) {
-          return readImpacts(level1SerializedImpacts, level1Impacts);
-        }
+      if (indexHasFreq == false) {
+        return DUMMY_IMPACTS_NO_FREQS;
+      }
+      if (level == 0 && level0LastDocID != NO_MORE_DOCS) {
+        return readImpacts(level0SerializedImpacts, level0Impacts);
+      }
+      if (level == 1) {
+        return readImpacts(level1SerializedImpacts, level1Impacts);
       }
       return DUMMY_IMPACTS;
     }
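
To show why the tighter placeholder matters for dynamic pruning, here is a minimal standalone model of how a caller might turn an impact list into a skip decision. Everything in it is hypothetical: the nested `Impact` class only mimics a (freq, norm) pair, `maxScore` and `canSkip` are illustrative names, and the scoring formula is the same simplified saturation with an assumed k1 rather than anything taken from Lucene.

```java
import java.util.Collections;
import java.util.List;

// Standalone model of impact-based skipping; this is NOT Lucene code.
final class PruningSketch {
  /** Hypothetical stand-in for a (freq, norm) impact pair. */
  static final class Impact {
    final int freq;
    final long norm;

    Impact(int freq, long norm) {
      this.freq = freq;
      this.norm = norm;
    }
  }

  /** Upper bound for a block: the best score over all of its impacts. */
  static double maxScore(List<Impact> impacts, double k1) {
    double max = 0;
    for (Impact impact : impacts) {
      // Simplified saturation; the norm is ignored here (assumed to be 1).
      double score = impact.freq * (k1 + 1) / (impact.freq + k1);
      max = Math.max(max, score);
    }
    return max;
  }

  /** A block can be skipped when even its upper bound is not competitive. */
  static boolean canSkip(List<Impact> impacts, double minCompetitiveScore, double k1) {
    return maxScore(impacts, k1) < minCompetitiveScore;
  }

  public static void main(String[] args) {
    List<Impact> oldDummy = Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L));
    List<Impact> newDummy = Collections.singletonList(new Impact(1, 1L));
    double threshold = 1.5; // assumed minimum competitive score
    System.out.println("old placeholder skippable: " + canSkip(oldDummy, threshold, 1.2));
    System.out.println("new placeholder skippable: " + canSkip(newDummy, threshold, 1.2));
  }
}
```

In this model, with a threshold of 1.5, the freq=Integer.MAX_VALUE placeholder keeps the block alive because its bound is about 2.2, whereas the freq=1 placeholder yields a bound of 1.0 and lets the block be skipped.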
