-
Notifications
You must be signed in to change notification settings - Fork 25.7k
ES|QL: Make TopN aggregator use heap sort internally. #134140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
877b852 to
8371793
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
We also have the Same with elasticsearch/server/src/main/java/org/elasticsearch/search/sort/BucketedSort.java Line 187 in 1ba21c2
_search, which is the base of these classes. I'm not sure if it's worth it there though, and it may be a bit different. @nik9000? (In any case, this one doesn't have to be in this PR, I'm just commenting)
|
I think the testing is pretty good over there. I'd grab there first and, if you have time, grab the next ones. But these are more imporant I think. |
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In BucketedSortTestCase we only use up to three sorted values. It's probably worth adding a case where we use a big list of them here. Just to make sure you didn't off-by-one or round funny or something. Which I'd never catch with code review.
Done. |
IIUC such a test already exists: Do you think there is anything more I should do on this PR? |
Ah. That's good.
Yeah. That's a fine for later thing. Some of that duplication often comes from using the templates. Some amount of duplication is to monomorphize the tight loops. We tend to default to this kind of behavior in the hot loop, which this is, and we'll do microbenchmarks if we're trying to be more careful. |
Indeed, the code is a bit different, I would say more advanced (it has support for extra values which we may need to port to ES|QL's version at some point). I did local changes there and run a benchmark. whereas the local change with heap sort had: so it's very similar. The command used was: |
ivancea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the benchies! I suppose it will be difficult to see great improvements in the benchmarks we have, as ingestion will take most of the time and result collection is done only once. Maybe a larger limit and less input data? Anyway, performance didn't suffer apparently, so I think it's nice as it is!
![]()
FWIW: Apart from the time aspect, we save on some memory as we do not have to allocate this temporary list. |
Until now, the
TopNaggregator has been copying the bucket values to the separate array and sorting them usingArrays.sortmethod call.This PR makes use of the existing heap structure to sort in-place using standard heap sort algorithm.