Provide better impacts for fields indexed with IndexOptions.DOCS #14511
jpountz left a comment:
Sorry for only seeing this now. I left some suggestions on how to update your change.
jpountz left a comment:
LGTM. Can you add a CHANGES entry under 10.3 and undo the new line in SlowImpactsEnum?
Addressed comments.
This sounds safe enough for 10.2.1 to me. Can you move the CHANGES entry to 10.2.1 then? cc @ChrisHegarty
I hope you don't mind, I updated this PR title and description to better reflect the change.
Not at all. Thanks for taking the time to explain the different pieces of this code. It was really fun debugging this, and I would definitely love to revisit this part of the code.
@ChrisHegarty @jpountz Moved the CHANGES entry to 10.2.1.
Eh! I think you moved it to 10.2.0, rather than 10.2.1.
Oops, I hadn't rebased with main. Fixed it now.
What am I missing? This is not applicable to 10.2.1, since the only changed file is Lucene103PostingsReader.java, which is not present in 10.2! Did the rebase mess something up?
I had updated Lucene103PostingsReader since the initial plan was not to backport. I've now also updated Lucene101PostingsReader, which is used in 10.2.1. Should I also raise a PR against some other branch?
Yes, we have not backport
I made the same change in
…che#14511) Co-Authored-by: expani <[email protected]>
@expani could you resolve the conflicts so that I can merge?
Remove the use_default_lucene_postings_format feature flag and let the IndexMode decide whether to default to Lucene postings instead of checking for the standard index mode. The `Lucene101PostingsFormat` has been used for a while behind a feature flag. Regressions were found but were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade-off when the index mode is standard.
Postings always return impacts with freq=Integer.MAX_VALUE and norm=1 when frequencies are not indexed (IndexOptions.DOCS). This significantly overestimates the score upper bound of term queries, since the similarity scorer is effectively called with freq=1 all the time in this case (and either norm=1 if norms are not indexed, or the number of terms in the field otherwise).
This updates postings to always return impacts with freq=1 and norm=1 when frequencies are not indexed, which helps compute better score upper bounds and, in turn, makes dynamic pruning perform better.
Closes #14445
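To make the effect on score upper bounds concrete, here is a minimal, hypothetical sketch (not Lucene code) that evaluates a simplified BM25 for the old impact (freq=Integer.MAX_VALUE, norm=1) and the new one (freq=1, norm=1). The class name `DocsOnlyImpactBound`, the idf of 2.0, and the average field length of 10 are assumptions for illustration, and the sketch treats norm=1 as a raw field length of 1, whereas real Lucene norms are lossily encoded lengths.

```java
/**
 * Hypothetical sketch, not Lucene code: compares BM25 score upper bounds derived
 * from two candidate impacts for a field indexed with IndexOptions.DOCS.
 * Assumes a simplified BM25 and treats norm=1 as a raw field length of 1.
 */
public class DocsOnlyImpactBound {

    // Lucene's BM25Similarity defaults.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Simplified BM25: idf * freq / (freq + k1 * (1 - b + b * length / avgLength)). */
    public static double bm25(double idf, double freq, double length, double avgLength) {
        double lengthNorm = K1 * (1 - B + B * length / avgLength);
        return idf * freq / (freq + lengthNorm);
    }

    public static void main(String[] args) {
        double idf = 2.0;        // assumed idf of the query term
        double avgLength = 10.0; // assumed average field length

        // Old impact for DOCS-only fields: freq=Integer.MAX_VALUE, norm=1.
        // The score saturates towards idf, which is a loose bound.
        double oldBound = bm25(idf, Integer.MAX_VALUE, 1, avgLength);

        // New impact: freq=1, norm=1. Without indexed frequencies the scorer
        // always sees freq=1, and length 1 is the most favorable length,
        // so this pair still bounds every real score.
        double newBound = bm25(idf, 1, 1, avgLength);

        System.out.printf("old bound = %.4f%n", oldBound);
        System.out.printf("new bound = %.4f%n", newBound);

        // Actual scores of DOCS-only matches (freq effectively 1) at a few lengths.
        for (int length : new int[] {1, 5, 10, 50}) {
            double actual = bm25(idf, 1, length, avgLength);
            System.out.printf("length=%d actual=%.4f <= new bound: %b%n",
                length, actual, actual <= newBound);
        }
    }
}
```

With these assumed inputs, the old bound sits just under idf (2.0), the BM25 saturation limit, while the new bound is about 1.44, and every actual score stays at or below it. A tighter bound that is still a valid upper bound is what lets dynamic pruning skip more blocks.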