ES|QL: Pass fix size instead of maxPageSize to LuceneTopNOperator scorer #135767
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance)
```diff
 }
 var leafCollector = perShardCollector.getLeafCollector(scorer.leafReaderContext());
-scorer.scoreNextRange(leafCollector, scorer.leafReaderContext().reader().getLiveDocs(), maxPageSize);
+scorer.scoreNextRange(leafCollector, scorer.leafReaderContext().reader().getLiveDocs(), NUM_DOCS_INTERVAL);
```
Do we need to lock this to the max of the range?
It looks like CancellableBulkScorer makes this bigger and bigger with time. But I think this is good and we can get it in and iterate.
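For context, the growing-interval behavior the comment alludes to can be sketched roughly as follows. This is an illustrative model only, with hypothetical names, not the actual `CancellableBulkScorer` code: the idea is that the batch size doubles between cancellation checks, capped at some maximum.

```java
// Sketch of a geometrically growing scoring interval (illustrative
// names, not the real CancellableBulkScorer fields): each batch is
// twice as large as the previous one, up to a fixed cap.
public class GrowingInterval {
    static final int INITIAL_INTERVAL = 1 << 12; // 4096 docs
    static final int MAX_INTERVAL = 1 << 20;     // hypothetical cap

    /** Returns the next batch size, doubling until the cap is hit. */
    static int nextInterval(int current) {
        return Math.min(current * 2, MAX_INTERVAL);
    }

    public static void main(String[] args) {
        int interval = INITIAL_INTERVAL;
        for (int i = 0; i < 10; i++) {
            System.out.println(interval);
            interval = nextInterval(interval);
        }
    }
}
```

The trade-off this pattern makes: small early batches keep cancellation latency low, while growing batches amortize per-call overhead on large segments.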
We do something similar here, at least we try to avoid overflows:
Lines 241 to 246 in 7cbc236
```java
void scoreNextRange(LeafCollector collector, Bits acceptDocs, int numDocs) throws IOException {
    assert isDone() == false : "scorer is exhausted";
    // avoid overflow and limit the range
    numDocs = Math.min(maxPosition - position, numDocs);
    assert numDocs > 0 : "scorer was exhausted";
    position = bulkScorer.score(collector, acceptDocs, position, Math.min(maxPosition, position + numDocs));
}
```
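To illustrate the clamp in the snippet above: the requested batch size is reduced to the docs remaining in `[position, maxPosition)`, so a large request near the end of the reader cannot overshoot. A standalone sketch of just that arithmetic (simplified, no Lucene types):

```java
// Simplified model of the range clamp in scoreNextRange: the requested
// batch is limited to the docs remaining before maxPosition.
public class RangeClamp {
    static int clamp(int position, int maxPosition, int numDocs) {
        // avoid overflow and limit the range, as in the quoted snippet
        return Math.min(maxPosition - position, numDocs);
    }

    public static void main(String[] args) {
        // plenty of docs left: the full batch is scored
        System.out.println(clamp(0, 10_000, 4096));     // 4096
        // near the end of the reader: the batch is clamped
        System.out.println(clamp(9_000, 10_000, 4096)); // 1000
    }
}
```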
Ah. got it.
LGTM, but consider labeling with >bug and creating a release note entry, since this impacts serverless performance.
Awesome finding! 🚤
Hi @ioanatia, I've created a changelog YAML for you.
💚 Backport successful
Initially we passed `scorer.leafReaderContext().reader().maxDoc()` to the bulk scorer, which showed great improvements. However, this also keeps the current thread busy until it has scored all the docs from the current reader.

There are ways we can improve this, but they would require more changes. For now I wanted to push a quick fix to mitigate some of the regressions we see, with `maxPageSize` becoming smaller after #108412. We should still see a significant performance boost just by scoring docs in batches of 4096, instead of `maxPageSize`.
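The essence of the fix is driving the scorer in fixed-size batches rather than tying the batch size to the operator's `maxPageSize`. A simplified sketch of that loop, using a hypothetical `ScorerLike` interface rather than the actual `LuceneTopNOperator` code:

```java
// Sketch of scoring a reader in fixed 4096-doc batches, decoupled
// from maxPageSize. ScorerLike is an illustrative stand-in, not the
// real operator API.
public class BatchScoring {
    static final int NUM_DOCS_INTERVAL = 4096;

    interface ScorerLike {
        boolean isDone();
        void scoreNextRange(int numDocs);
    }

    static int scoreAll(ScorerLike scorer) {
        int batches = 0;
        while (!scorer.isDone()) {
            // fixed interval keeps each call short, and does not
            // shrink when maxPageSize shrinks
            scorer.scoreNextRange(NUM_DOCS_INTERVAL);
            batches++;
        }
        return batches;
    }

    /** Counts batches needed for a reader with the given doc count. */
    static int batchesFor(int totalDocs) {
        int[] position = {0};
        ScorerLike scorer = new ScorerLike() {
            public boolean isDone() { return position[0] >= totalDocs; }
            public void scoreNextRange(int numDocs) {
                position[0] = Math.min(totalDocs, position[0] + numDocs);
            }
        };
        return scoreAll(scorer);
    }

    public static void main(String[] args) {
        System.out.println(batchesFor(10_000)); // 3 batches of up to 4096
    }
}
```

With a small `maxPageSize`, the old code would have made many more, much smaller scoring calls; the fixed interval restores larger batches regardless of page size.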