-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Don't duplicate Limit on left InlineJoin #133651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Don't duplicate Limit on left InlineJoin #133651
Conversation
...as this reduces the data the (inline) aggs run on, which is wrong.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @bpintea !
| } | ||
| } | ||
| } else if (limit.child() instanceof Join join && join.config().type() == JoinTypes.LEFT) { | ||
| } else if (limit.child() instanceof Join join && join.config().type() == JoinTypes.LEFT && join instanceof InlineJoin == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++.
Maybe let's add a comment that we want to also apply this to InlineJoin, but that requires that the stub relation corresponds to a node somewhere inside the left branch, rather than the whole left branch. Or we need to get rid of the stub relation then and replace it by a copy of the left branch (with updated name ids, most likely, but that's a detail).
| FROM employees | ||
| | KEEP emp_no, languages, gender | ||
| | INLINESTATS max_lang = MAX(languages) BY gender | ||
| | LIMIT 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limit 1 is non-deterministic and this might fail on multi-node tests like for serverless, no?
Maybe LIMIT 3 | SORT gender (-> F, M, null) as this would still surface the bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first (emp_no-sorted) value has a language value of 2. If the LIMIT (of 1) is pushed down, that would yield a max_lang of 2. Ideally we'd have a | SORT emp_no before INLINESTATS, but that doesn't current work.
In any case, even if the data is shuffled, all records should have the same max_lang (5).
I've left a comment on the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| } | ||
| } | ||
| } else if (limit.child() instanceof Join join && join.config().type() == JoinTypes.LEFT) { | ||
| } else if (limit.child() instanceof Join join && join.config().type() == JoinTypes.LEFT && join instanceof InlineJoin == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A one line comment here would help, imho.
|
Thanks Alex, Andrei. |
...as this reduces the data the (inline) aggs run on, which is wrong. Closes elastic#133235
...as this reduces the data the (inline) aggs run on, which is wrong.
Closes #133235