Commit 1b2451b

Provide better impacts for fields indexed with IndexOptions.DOCS (#14558)
Postings always return impacts with freq=Integer.MAX_VALUE and norm=1 when frequencies are not indexed (IndexOptions.DOCS). This significantly overestimates the score upper bound of term queries, since the similarity scorer is effectively always called with freq=1 in this case (and with either norm=1 if norms are not indexed, or the number of terms in the field otherwise). This change updates postings to always return impacts with freq=1 and norm=1 when frequencies are not indexed, which helps compute better score upper bounds and in turn makes dynamic pruning perform better.

Closes #14445
1 parent 1a5a767 commit 1b2451b
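
To make the size of the overestimate concrete, here is a minimal sketch that compares the two placeholder impacts under a saturating, BM25-style term-frequency formula. It is not Lucene's Similarity API: the class name, the `tfScore` helper, and the `idf`, `k1`, and `lengthNorm` values are all assumptions chosen for illustration.

```java
public class ImpactUpperBoundSketch {

  // Hypothetical stand-in for a similarity scorer: a simplified BM25-style
  // term-frequency component that saturates toward idf as freq grows.
  // lengthNorm stands in for the document-length factor; it is an assumed value.
  static double tfScore(double idf, long freq, double k1, double lengthNorm) {
    return idf * freq / (freq + k1 * lengthNorm);
  }

  public static void main(String[] args) {
    double idf = 2.0; // assumed inverse document frequency
    double k1 = 1.2; // commonly used BM25 saturation parameter
    double lengthNorm = 1.0; // assumed length-normalization factor

    // Old placeholder impact: freq=Integer.MAX_VALUE, so the bound saturates near idf.
    double oldBound = tfScore(idf, Integer.MAX_VALUE, k1, lengthNorm);
    // New placeholder impact: freq=1, which is what a DOCS-only field is scored with.
    double newBound = tfScore(idf, 1, k1, lengthNorm);

    System.out.println("upper bound with freq=Integer.MAX_VALUE: " + oldBound);
    System.out.println("upper bound with freq=1: " + newBound);
    System.out.println("overestimation factor: " + (oldBound / newBound));
  }
}
```

Under these assumed parameters the old bound is roughly 2.2 times the score any matching document can actually reach, so a pruning threshold has to rise that much further before blocks of this term can be skipped.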

2 files changed: 14 additions and 7 deletions

lucene/CHANGES.txt

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ Bug Fixes
 * GITHUB#14523, GITHUB#14530: Correct TermOrdValComparator competitive iterator so that it forces sparse
   field iteration to be at least scoring window baseline when doing intoBitSet. (Ben Trent, Adrien Grand)
 
+* GITHUB#14445: Provide better impacts for fields indexed with IndexOptions.DOCS GITHUB#14511 (Aniketh Jain)
+
 * GITHUB#14543: Fixed lead cost computations for bulk scorers of conjunctive
   queries that mix MUST and FILTER clauses, and disjunctive queries that
   configure a minimum number of matching SHOULD clauses.

lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java

Lines changed: 12 additions & 7 deletions
@@ -73,6 +73,10 @@ public final class Lucene101PostingsReader extends PostingsReaderBase {
   private static final List<Impact> DUMMY_IMPACTS =
       Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L));
 
+  // We stopped storing a placeholder impact with freq=1 for fields with DOCS after 9.12.0
+  private static final List<Impact> DUMMY_IMPACTS_NO_FREQS =
+      Collections.singletonList(new Impact(1, 1L));
+
   private final IndexInput docIn;
   private final IndexInput posIn;
   private final IndexInput payIn;
@@ -1328,13 +1332,14 @@ public int getDocIdUpTo(int level) {
 
     @Override
     public List<Impact> getImpacts(int level) {
-      if (indexHasFreq) {
-        if (level == 0 && level0LastDocID != NO_MORE_DOCS) {
-          return readImpacts(level0SerializedImpacts, level0Impacts);
-        }
-        if (level == 1) {
-          return readImpacts(level1SerializedImpacts, level1Impacts);
-        }
+      if (indexHasFreq == false) {
+        return DUMMY_IMPACTS_NO_FREQS;
+      }
+      if (level == 0 && level0LastDocID != NO_MORE_DOCS) {
+        return readImpacts(level0SerializedImpacts, level0Impacts);
+      }
+      if (level == 1) {
+        return readImpacts(level1SerializedImpacts, level1Impacts);
       }
       return DUMMY_IMPACTS;
     }