-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Speed up sorts on secondary sort fields #137533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… with the recorded bottom. When construction a new CompetitiveDISIBuilder, then check whether global min/max points or global min/max doc values skipper are comparative with the bottom. If so, then update competitiveIterator with an empty iterator, because no documents will have a value that is competitive with the current recorded bottom in the current segment. Doing this at CompetitiveDISIBuilder construction is cheap and allows to immediately prune, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
…g/sort-still-slow
|
Pinging @elastic/es-search (Team:Search) |
|
Hi @romseygeek, I've created a changelog YAML for you. |
…g/sort-still-slow
…g/sort-still-slow
…g/sort-still-slow
|
Buildkite benchmark this with elastic/logs please |
|
Buildkite benchmark this with elastic-logs please |
💚 Build Succeeded
This build ran two elastic-logs benchmarks to evaluate performance impact of this PR. Historycc @romseygeek |
martijnvg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 👍 . I left a few comments, but otherwise LGTM.
| * range, using DocValueSkippers on the primary sort field to advance rapidly | ||
| * to the next block of values. | ||
| */ | ||
| class SecondarySortIterator extends DocIdSetIterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a unit test for this iterator? Maybe we can do a test duel against DocValuesRangeIterator wrapped in an interator?
| } | ||
| DocValuesSkipper skipper = context.reader().getDocValuesSkipper(field); | ||
| DocValuesSkipper primaryFieldSkipper = context.reader().getDocValuesSkipper(sortFields[0].getField()); | ||
| if (primaryFieldSkipper == null || skipper.docCount() != maxDoc || primaryFieldSkipper.docCount() != maxDoc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case primaryFieldSkipper is null, then the secondary sort field can be treated as primary sort and we can go faster (like in SortedNumericDocValuesRangeQuery#getDocIdSetIteratorOrNullForPrimarySort). But I don't think this happens now? Because super.buildCompetitiveDISIBuilder(context); will not detect this?
The same applies if primary sort field has just one value.
If this is true. Let's address this then in a followup?
| competitiveDISIBuilder = buildCompetitiveDISIBuilder(context); | ||
| } | ||
|
|
||
| protected CompetitiveDISIBuilder buildCompetitiveDISIBuilder(LeafReaderContext context) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bit we want to contribute to upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly
…g/sort-still-slow
This adds a competitive iterator implementation that will take advantage of doc value skippers in the case that: * the index is sorted by a low cardinality field like hostname, and then by a high cardinality field like timestamp * skippers are enabled on both of these fields * the query is sorted by the high cardinality field. To be able to plug this new implementation into the lucene sort architecture, we need to fork NumericComparator and some associated classes. LongValuesComparatorSource now returns the forked version with the new competitive iterator builder.
This adds a competitive iterator implementation that will take advantage of doc
value skippers in the case that:
field like timestamp
To be able to plug this new implementation into the lucene sort architecture, we need
to fork NumericComparator and some associated classes. LongValuesComparatorSource
now returns the forked version with the new competitive iterator builder.