Ensure partial aggregation outputs match layout #135813
Pinging @elastic/es-analytical-engine (Team:Analytics)
I like that we don't just re-compute the intermediate attributes anymore, which was confusing. Thanks, Nhat!
if (Assertions.ENABLED) {
    List<Attribute> inputAttributes = exchangeSink.child().output();
    for (Attribute attr : inputAttributes) {
        assert source.layout.get(attr.id()) != null
nit: maybe throw ISE instead, so that messing up an agg will not kill the node during a full run of the test suite. That makes for easier-to-triage CI issues opened by the CI bot.
There should be an assertion to verify the invariant. I opened #135862 to handle the spec tests when the test cluster is broken.
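To make the trade-off in this thread concrete, here is a minimal sketch of the two check styles. It is not the code from this PR: the class and method names are hypothetical, and plain integer ids stand in for attributes. The `assert` form is only evaluated when the JVM runs with `-ea` and raises `AssertionError` (an `Error`); the exception form always runs and raises `IllegalStateException`, which ordinary `catch (Exception e)` handlers will see.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; integer ids stand in for attribute ids.
final class InvariantCheckSketch {
    /** Assertion style: evaluated only with -ea; a failure raises AssertionError. */
    static void checkWithAssert(Set<Integer> layoutIds, List<Integer> inputIds) {
        for (int id : inputIds) {
            assert layoutIds.contains(id) : "input attribute [" + id + "] is missing from the layout";
        }
    }

    /** Exception style: always evaluated; a failure raises IllegalStateException. */
    static void checkWithIse(Set<Integer> layoutIds, List<Integer> inputIds) {
        for (int id : inputIds) {
            if (layoutIds.contains(id) == false) {
                throw new IllegalStateException("input attribute [" + id + "] is missing from the layout");
            }
        }
    }

    public static void main(String[] args) {
        Set<Integer> layoutIds = Set.of(1, 2);
        try {
            // Id 3 has no channel in the layout, so the check fails.
            checkWithIse(layoutIds, List.of(1, 3));
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Either style verifies the same invariant; they differ only in when the check runs and in which throwable the caller has to deal with.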
if (Assertions.ENABLED) {
    List<Attribute> inputAttributes = exchangeSink.child().output();
    for (Attribute attr : inputAttributes) {
        assert source.layout.get(attr.id()) != null
The added check is very reasonable. Maybe it makes sense to add it to the general plan method, as it's an invariant that's not only required for planning exchanges, but for every plan node?
++ I tried to add this, but this invariant doesn't hold for ExchangeSinkExec and ExchangeSourceExec. I will address this in a follow-up.
List<Aggregator.Factory> aggregatorFactories = new ArrayList<>();

// append channels to the layout
if (aggregatorMode.isOutputPartial()) {
Hm, AggregateExec#output has its own isOutputPartial check. I think it is probably correct to remove this if-else and always call layout.append(aggregateExec.output()); the only thing AggregateExec#output does differently in the non-partial case is de-duplicating based on name, but we shouldn't have duplicates to begin with. And if we have, we shouldn't be adding them to the output layout if they're not in the agg's output to begin with.
Yes, we can use aggregateExec.output() - I pushed c5ab8d8
Thanks Alex!
Currently, LocalExecutionPlanner re-generates the intermediate output of a partial aggregation, so its intermediate outputs don't match the layout (see the new assertions added to the exchange sink node). This hasn't been an issue because:
- For exchange sink, we don't check the layout; we just pass the page to the exchange buffer.
- Partial/final aggregations work due to the cache in AggregateMapper.
This becomes a problem when I try to add field extraction after partial aggregation on data nodes in time-series. With this change, we create a layout that matches the output attributes of partial aggregation. This change also removes the cache in AggregateMapper, which was confusing.
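The invariant described above can be illustrated with a small self-contained sketch. This is not the planner's code: `Attr`, `Layout`, and `missingFromLayout` are hypothetical stand-ins, and the failure mode shown (re-generated intermediates receiving fresh ids) is an assumption about why the lookups miss.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are not the real ES|QL classes.
final class LayoutSketch {
    /** An output column identified by a unique id. */
    record Attr(int id, String name) {}

    /** Maps attribute ids to channel positions in a page. */
    static final class Layout {
        private final Map<Integer, Integer> channels = new HashMap<>();
        private int nextChannel = 0;

        /** Appends one channel per attribute, in order. */
        void append(List<Attr> attrs) {
            for (Attr attr : attrs) {
                channels.put(attr.id(), nextChannel++);
            }
        }

        /** Returns the channel for the id, or null if the id is unknown. */
        Integer get(int id) {
            return channels.get(id);
        }
    }

    /** Returns the attributes in output that have no channel in the layout. */
    static List<Attr> missingFromLayout(Layout layout, List<Attr> output) {
        List<Attr> missing = new ArrayList<>();
        for (Attr attr : output) {
            if (layout.get(attr.id()) == null) {
                missing.add(attr);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // What the partial aggregation node itself reports as its output.
        List<Attr> aggOutput = List.of(new Attr(1, "sum"), new Attr(2, "seen"));

        // Assumed failure mode: re-generated intermediates carry fresh ids.
        Layout regenerated = new Layout();
        regenerated.append(List.of(new Attr(7, "sum"), new Attr(8, "seen")));
        System.out.println(missingFromLayout(regenerated, aggOutput).size()); // prints 2

        // Layout built from the node's own output: every lookup succeeds.
        Layout fromOutput = new Layout();
        fromOutput.append(aggOutput);
        System.out.println(missingFromLayout(fromOutput, aggOutput).isEmpty()); // prints true
    }
}
```

Building the layout from the node's own output keeps downstream lookups by attribute id working, which is what a later step such as field extraction after the partial aggregation would rely on.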