Skip to content

Conversation

@expani
Copy link
Contributor

@expani expani commented Apr 16, 2025

Postings always return impacts with freq=Integer.MAX_VALUE and norm=1 when frequencies are not indexed (IndexOptions.DOCS). This significantly overestimates the score upper bound of term queries, since the similarity scorer is effectively called with freq=1 all the time in this case (and either norm=1 if norms are not indexed, or the number of terms in the field otherwise).

This updates postings to always return impacts with freq=1 and norm=1 when frequencies are not indexed, which helps compute better score upper bounds, and in-turn makes dynamic pruning perform better.

Closes #14445

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for only seeing now, I left some suggestions as to how to update your change.

@expani expani marked this pull request as ready for review April 25, 2025 08:24
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you add a CHANGES entry under 10.3 and undo the new line in SlowImpactsEnum?

@expani
Copy link
Contributor Author

expani commented Apr 25, 2025

Addressed comments.
I want to backport these to 9.12.x and 10.2.x as well. Will open separate PRs for the same.

@jpountz
Copy link
Contributor

jpountz commented Apr 25, 2025

This sounds safe enough for 10.2.1 for me. Can you move the CHANGES entry to 10.2.1 then? cc @ChrisHegarty

@jpountz jpountz changed the title Ensuring skip list is read for fields indexed with only DOCS Provide better impacts for fields indexed with IndexOptions.DOCS Apr 25, 2025
@jpountz
Copy link
Contributor

jpountz commented Apr 25, 2025

I hope you don't mind, I updated this PR title and description to better reflect the change.

@expani
Copy link
Contributor Author

expani commented Apr 25, 2025

I hope you don't mind, I updated this PR title and description to better reflect the change.

Not at all. Thanks for taking the time to explain the different pieces of this code.

It was really fun debugging this and would definitely love to visit this part of the code again.

@expani
Copy link
Contributor Author

expani commented Apr 25, 2025

@ChrisHegarty @jpountz Moved the change log to 10.2.1

@ChrisHegarty
Copy link
Contributor

@ChrisHegarty @jpountz Moved the change log to 10.2.1

Eh! I think you moved it to 10.2.0, rather than 10.2.1.

@ChrisHegarty ChrisHegarty added this to the 10.2.1 milestone Apr 25, 2025
@expani
Copy link
Contributor Author

expani commented Apr 25, 2025

Oops hadn't rebased with main. Fixed it now.

@ChrisHegarty
Copy link
Contributor

ChrisHegarty commented Apr 25, 2025

This sounds safe enough for 10.2.1 for me. Can you move the CHANGES entry to 10.2.1 then? cc @ChrisHegarty

What am I missing? This is not applicable to 10.2.1, since the only changed file is Lucene103PostingsReader.java which is not present in 10.2 ! Did the rebase mess something up ?

@expani
Copy link
Contributor Author

expani commented Apr 25, 2025

What am I missing? This is not applicable to 10.2.1, since the only changed file is Lucene103PostingsReader.java which is not present in 10.2 ! Did the rebase mess something up ?

I had updated 103PostingsReader as the initial plan was not to backport.

Updated 101PostingsReader which is used in 10.2.1

Should I also raise against some other branch as well ?

Signed-off-by: expani <[email protected]>
@gf2121
Copy link
Contributor

gf2121 commented Apr 25, 2025

Lucene103PostingsReader.java which is not present in 10.2

Yes, we have not backport Lucene103PostingReader, see #14333 (comment). I think we will need to make the same change to Lucene101PostingReader if we want to include this in 10.2.1.

@expani
Copy link
Contributor Author

expani commented Apr 25, 2025

I made the same change in Lucene101PostingsReader
Should I be raising the PR against some other branch as well ?

@gf2121
Copy link
Contributor

gf2121 commented Apr 25, 2025

To be clear, i raised #14557 and #14558 for backporting. I plan to merge this now if no one objects.

@gf2121
Copy link
Contributor

gf2121 commented Apr 25, 2025

@expani could you resolve the conflicts so that i can merge?

@gf2121 gf2121 merged commit d05d6fe into apache:main Apr 25, 2025
2 checks passed
gf2121 added a commit that referenced this pull request Apr 25, 2025
)

Postings always return impacts with freq=Integer.MAX_VALUE and norm=1 when frequencies are not indexed (IndexOptions.DOCS). This significantly overestimates the score upper bound of term queries, since the similarity scorer is effectively called with freq=1 all the time in this case (and either norm=1 if norms are not indexed, or the number of terms in the field otherwise).

This updates postings to always return impacts with freq=1 and norm=1 when frequencies are not indexed, which helps compute better score upper bounds, and in-turn makes dynamic pruning perform better.

Closes #14445

Co-Authored-by: expani <[email protected]>
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 27, 2025
let the IndexMode decide whether to default lucene postings instead of checking for standard index mode.

The `Lucene101PostingsFormat` is now used for a while behind a feature flag. Regressions were found by were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade off when the index mode is standard.
martijnvg added a commit to elastic/elasticsearch that referenced this pull request Jun 2, 2025
Remove use_default_lucene_postings_format feature flag and
let the IndexMode decide whether to default lucene postings instead of checking for standard index mode.

The `Lucene101PostingsFormat` is now used for a while behind a feature flag. Regressions were found by were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade off when the index mode is standard.
mridula-s109 pushed a commit to elastic/elasticsearch that referenced this pull request Jun 2, 2025
Remove use_default_lucene_postings_format feature flag and
let the IndexMode decide whether to default lucene postings instead of checking for standard index mode.

The `Lucene101PostingsFormat` is now used for a while behind a feature flag. Regressions were found by were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade off when the index mode is standard.
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 3, 2025
Remove use_default_lucene_postings_format feature flag and
let the IndexMode decide whether to default lucene postings instead of checking for standard index mode.

The `Lucene101PostingsFormat` is now used for a while behind a feature flag. Regressions were found by were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade off when the index mode is standard.
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
Remove use_default_lucene_postings_format feature flag and
let the IndexMode decide whether to default lucene postings instead of checking for standard index mode.

The `Lucene101PostingsFormat` is now used for a while behind a feature flag. Regressions were found by were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade off when the index mode is standard.
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
Remove use_default_lucene_postings_format feature flag and
let the IndexMode decide whether to default lucene postings instead of checking for standard index mode.

The `Lucene101PostingsFormat` is now used for a while behind a feature flag. Regressions were found by were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade off when the index mode is standard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Term Query is slower post Lucene 9.12 for fields with IndexOptions.DOCS

5 participants