-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: INLINESTATS implementation with multiple LogicalPlan updates #128917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
0741bbb
cd31f72
811f5be
567d600
f834ce7
d3f2838
1e9c355
fa22575
377328c
1c745c7
cffa1d7
0882cbb
e3b5c92
bc6f8cc
910c888
2d1a1ec
bb0450c
2f2cc30
a9e2d99
4661330
0500ab6
4ed8c9b
17d5933
a88b8cf
41d2119
0cbfef9
bbe1a62
f136b79
0ba7a4a
a2527ea
c04806b
5bb4e4f
8d05f3f
f7268ad
7ffaf8c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -41,7 +41,7 @@ public LogicalPlan apply(LogicalPlan plan) { | |
| // track used references | ||
| var used = plan.outputSet().asBuilder(); | ||
| // track inlinestats' own aggregation output (right-hand side of the join) so that any other plan on the left-hand side of the | ||
| // inline join won't have it's columns pruned due to the lack of "visibility" into the right hand side output/Attributes | ||
| // inline join won't have its columns pruned due to the lack of "visibility" into the right hand side output/Attributes | ||
| var inlineJoinRightOutput = new ArrayList<Attribute>(); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm confused how this works in general. Expressions in the inline join's left side generally cannot depend on the stats output, resp. the inline join's right hand side. The dependency is converse - the IJ's right hand side depends on expressions in the left, because it will normally contain a stub. Also, LOOKUP JOIN doesn't require handling the right hand side output attributes especially, which is suspicious. Maybe I'm just not seeing it and need an example, though :) The complication that I would expect: expressions on the left may get pruned because they don't appear in The first I didn't check if this actually happens, or if we maybe descend first into the right after all. Still, probably worth a test. |
||
| Holder<Boolean> forkPresent = new Holder<>(false); | ||
|
|
||
|
|
@@ -104,6 +104,17 @@ public LogicalPlan apply(LogicalPlan plan) { | |
| p = aggregate.with(aggregate.groupings(), remaining); | ||
| } | ||
| } | ||
| } else if (p instanceof InlineJoin ij) {// TODO: InlineStats - add unit tests for this IJ removal | ||
| var remaining = removeUnused(ij.right().output(), used, inlineJoinRightOutput); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Aren't all of the attributes in |
||
| if (remaining != null) { | ||
| if (remaining.isEmpty()) { | ||
| // remove the InlineJoin altogether | ||
| p = ij.left(); | ||
| recheck = true; | ||
| } | ||
| // TODO: InlineStats - prune ONLY the unused output columns from it? In other words, don't perform more aggs | ||
| // if they will not be used anyway | ||
|
Comment on lines
+117
to
+118
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ++ |
||
| } | ||
| } else if (p instanceof Eval eval) { | ||
| var remaining = removeUnused(eval.fields(), used, inlineJoinRightOutput); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Some aggs leave downstream E.g. in |
||
| // no fields, no eval | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting case, not sure if we already have this in tests:
Important about this:
Actually, properly pushed down - not duplicated like it is currently! Because
INLINESTATSimportantly keeps the row count invariant, we can do things that we can't for the more general left join or lookup join. We should follow up on this to avoid inheriting the same limitations that we have forLOOKUP JOIN.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The optimized logicalplan looks like this for that query you suggested (
from employees | sort emp_no desc | inlinestats avg = avg(salary) by languages | limit 5 | keep emp_no, avg, languages, gender):which I think it looks right:
I will add a TODO in the csv-spec for this exact query so that we also have a unit test for this.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, this doesn't look right.
The output of
INLINESTATSmust not depend on subsequent commands. The limit being pushed down and combined into a topn is wrong.I just checked out your code and ran your example query:
Now, if I change this to
LIMIT 1instead ofLIMIT 5, I getThe average salary of
femaleemployees with 4 languages must not change just because I only look at 1 such employee in subsequent commands!@astefan , I believe the limit duplication from
LOOKUP JOINs is messing things up here. Respectively, how stub relations work. The thing is, the stub relation reaaaally stands for a specific plan node upstream from the join. Optimizations that push stuff into the left hand side of an inline join are generally correct but they accidentally also alter the right hand side due to the stub mechanism. My gut feeling is that the correct modelling would have the stub relation explicitly point at a specific subplan upstream, rather then implicitly standing for the left hand side of the join.However, if we want to keep the current modeling of joins/inlinestats, I think we need to
JoinandInlineJoin, which unfortunately is a subclass that has super duper different semantics as long as we use stub relations the way we do.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw: We also push down filters into the left hand side of joins but specifically exclude inline joins because that would change the row set over which we compute the stats.