Ban Limit + MvExpand before remote Enrich #135051
```diff
@@ -0,0 +1,5 @@
+pr: 135051
+summary: Ban Limit + `MvExpand` before remote Enrich
+area: ES|QL
+type: bug
+issues: []
```
```diff
@@ -13,6 +13,7 @@
 import org.elasticsearch.common.lucene.BytesRefs;
 import org.elasticsearch.common.util.Maps;
 import org.elasticsearch.xpack.core.enrich.EnrichPolicy;
+import org.elasticsearch.xpack.esql.capabilities.PostAnalysisVerificationAware;
 import org.elasticsearch.xpack.esql.capabilities.PostOptimizationVerificationAware;
 import org.elasticsearch.xpack.esql.capabilities.TelemetryAware;
 import org.elasticsearch.xpack.esql.common.Failures;
```
```diff
@@ -48,6 +49,7 @@ public class Enrich extends UnaryPlan
     implements
         GeneratingPlan<Enrich>,
         PostOptimizationVerificationAware,
+        PostAnalysisVerificationAware,
         TelemetryAware,
         SortAgnostic,
         ExecutesOn {
```
```diff
@@ -284,6 +286,36 @@ private void checkForPlansForbiddenBeforeRemoteEnrich(Failures failures) {
         fails.forEach(f -> failures.add(fail(this, "ENRICH with remote policy can't be executed after [" + f.text() + "]" + f.source())));
     }

+    /**
+     * Remote ENRICH (and any remote operation in fact) is not compatible with MV_EXPAND + LIMIT. Consider:
+     * `FROM *:events | SORT @timestamp | LIMIT 2 | MV_EXPAND ip | ENRICH _remote:clientip_policy ON ip`
+     * Semantically, this must take two top events and then expand them. However, this can not be executed remotely,
+     * because this means that we have to take top 2 events on each node, then expand them, then apply Enrich,
+     * then bring them to the coordinator - but then we can not select top 2 of them - because that would be pre-expand!
+     * We do not know which expanded rows are coming from the true top rows and which are coming from "false" top rows
+     * which should have been thrown out. This is only possible to execute if MV_EXPAND executes on the coordinator
+     * - which contradicts remote Enrich.
+     * This could be fixed by the optimizer by moving MV_EXPAND past ENRICH, at least in some cases, but currently we do not do that.
+     */
+    private void checkMvExpandAfterLimit(Failures failures) {
+        this.forEachDown(MvExpand.class, u -> {
+            u.forEachDown(p -> {
+                if (p instanceof Limit || p instanceof TopN) {
```
Review comment: I noticed that the logic for JOIN is a bit different; in particular, post optimization, it also checks for the presence of a

Reply: Join and Enrich are different, as Enrich is cardinality-preserving while Join is not. That makes some pipeline breakers compatible with Enrich but not with Join. I agree that

Reply: This is also the reason for this particular change, btw -
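The Javadoc's argument can be checked with a small in-memory simulation. The sketch below is a toy model, not ES|QL code: `Event`, `Row`, the two "clusters" and every method name are hypothetical, and `SORT @timestamp | LIMIT 2` is modelled as an ascending sort on a numeric timestamp. It also illustrates the reply above: a cardinality-preserving step (one row in, one row out, as ENRICH is described) still lets the coordinator merge per-cluster top-n results correctly, while MV_EXPAND does not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LimitMvExpandDemo {
    // A hypothetical event: a sort key plus a multi-valued "ip" field.
    public record Event(long timestamp, List<String> ips) {}

    // One row produced by MV_EXPAND: a single ip value and its source timestamp.
    public record Row(long timestamp, String ip) {}

    // SORT timestamp | LIMIT n on whole events (Stream.sorted is stable on ordered streams).
    public static List<Event> topN(List<Event> events, int n) {
        return events.stream()
            .sorted(Comparator.comparingLong(Event::timestamp))
            .limit(n)
            .toList();
    }

    // MV_EXPAND ip: one output row per value of the multi-valued field.
    public static List<Row> mvExpand(List<Event> events) {
        List<Row> rows = new ArrayList<>();
        for (Event e : events) {
            for (String ip : e.ips()) {
                rows.add(new Row(e.timestamp(), ip));
            }
        }
        return rows;
    }

    // Intended semantics: global top n over every cluster, then expand.
    public static List<Row> expected(List<List<Event>> clusters, int n) {
        List<Event> all = new ArrayList<>();
        clusters.forEach(all::addAll);
        return mvExpand(topN(all, n));
    }

    // Remote plan: each cluster takes its own top n and expands before shipping.
    public static List<Row> shipped(List<List<Event>> clusters, int n) {
        List<Row> rows = new ArrayList<>();
        for (List<Event> cluster : clusters) {
            rows.addAll(mvExpand(topN(cluster, n)));
        }
        return rows;
    }

    // The coordinator can only re-apply SORT | LIMIT n to expanded rows, not to events.
    public static List<Row> coordinatorLimit(List<Row> rows, int n) {
        return rows.stream()
            .sorted(Comparator.comparingLong(Row::timestamp))
            .limit(n)
            .toList();
    }

    // With a cardinality-preserving step instead (one row per event), merging the
    // per-cluster top n on the coordinator still yields the global top n.
    public static List<Event> mergedTopN(List<List<Event>> clusters, int n) {
        List<Event> candidates = new ArrayList<>();
        for (List<Event> cluster : clusters) {
            candidates.addAll(topN(cluster, n));
        }
        return topN(candidates, n);
    }

    public static List<List<Event>> sampleClusters() {
        return List.of(
            List.of(new Event(1, List.of("a1", "a2")), new Event(4, List.of("a3"))),
            List.of(new Event(2, List.of("b1")), new Event(3, List.of("b2", "b3"))));
    }

    public static void main(String[] args) {
        List<List<Event>> clusters = sampleClusters();
        System.out.println("expected:            " + expected(clusters, 2));
        System.out.println("shipped (no limit):  " + shipped(clusters, 2));
        System.out.println("coordinator LIMIT 2: " + coordinatorLimit(shipped(clusters, 2), 2));
    }
}
```

With the sample data, the intended answer has three rows (both values of the earliest event plus the second event). The coordinator receives six expanded rows; re-applying `LIMIT 2` to them keeps only the two rows of the earliest event, and not re-applying it keeps rows from events that were never in the global top two.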
```diff
+                    failures.add(fail(this, "MV_EXPAND after LIMIT is incompatible with remote ENRICH"));
```
Review comment: Note: technically, that's only true if we cannot push the remote enrich past the mv_expand, which we sometimes could! (There are more optimizations that could be applied to But since we currently don't do this, this check will strictly prohibit only queries that we can't properly run anyway, so this is fine. Maybe we could add a comment, though?
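The review note points out that the ban is only needed because the optimizer does not currently move MV_EXPAND past ENRICH. The sketch below models one case where such a move looks safe, under the assumption that MV_EXPAND expands a field (`tags`) that ENRICH neither reads nor writes. In that case ENRICH can run remotely after the per-cluster `LIMIT`, and MV_EXPAND can run on the coordinator after the final top-n. It is a toy model with hypothetical names and data, not the ES|QL optimizer. It also deliberately avoids the Javadoc's own example, where both commands use `ip`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ReorderSketch {
    // Hypothetical row: single-valued match field "ip", multi-valued "tags",
    // and "geo", which stays null until ENRICH fills it in.
    public record Row(long timestamp, String ip, List<String> tags, String geo) {}

    // Hypothetical enrich policy contents: ip -> geo.
    static final Map<String, String> POLICY =
        Map.of("10.0.0.1", "eu", "10.0.0.2", "us", "10.0.0.3", "ap");

    // ENRICH ON ip: exactly one output row per input row; reads only "ip".
    public static List<Row> enrich(List<Row> rows) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(new Row(r.timestamp(), r.ip(), r.tags(), POLICY.get(r.ip())));
        }
        return out;
    }

    // MV_EXPAND tags: one row per tag value, all other fields copied.
    // (Rows without tags are simply dropped; ES|QL's null handling is not modelled.)
    public static List<Row> mvExpand(List<Row> rows) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            for (String tag : r.tags()) {
                out.add(new Row(r.timestamp(), r.ip(), List.of(tag), r.geo()));
            }
        }
        return out;
    }

    // SORT timestamp | LIMIT n (stable sort).
    public static List<Row> topN(List<Row> rows, int n) {
        return rows.stream().sorted(Comparator.comparingLong(Row::timestamp)).limit(n).toList();
    }

    // The query as written, evaluated over all clusters at once:
    // SORT | LIMIT n | MV_EXPAND tags | ENRICH ON ip
    public static List<Row> asWritten(List<List<Row>> clusters, int n) {
        List<Row> all = new ArrayList<>();
        clusters.forEach(all::addAll);
        return enrich(mvExpand(topN(all, n)));
    }

    // Reordered plan: every cluster runs SORT | LIMIT n | ENRICH remotely
    // (cardinality-preserving); the coordinator merges the top n, then expands.
    public static List<Row> reordered(List<List<Row>> clusters, int n) {
        List<Row> shipped = new ArrayList<>();
        for (List<Row> cluster : clusters) {
            shipped.addAll(enrich(topN(cluster, n)));
        }
        return mvExpand(topN(shipped, n));
    }

    public static List<List<Row>> sampleClusters() {
        return List.of(
            List.of(new Row(1, "10.0.0.1", List.of("x", "y"), null), new Row(4, "10.0.0.3", List.of("z"), null)),
            List.of(new Row(2, "10.0.0.2", List.of("w"), null), new Row(3, "10.0.0.1", List.of("v"), null)));
    }
}
```

For the sample clusters both plans return the same three rows, each carrying the `geo` value looked up by `ip`.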
```diff
+                }
+            });
+        });
+
+    }
+
+    @Override
+    public void postAnalysisVerification(Failures failures) {
+        if (this.mode == Mode.REMOTE) {
+            checkMvExpandAfterLimit(failures);
```
Review comment: Note: since this triggers after analysis, the condition

Reply: Yes, we have to do it at the analysis stage to avoid confusion with synthetic limits that are pushed down, but I wasn't sure if there's any possible way to have a TopN at the analysis stage.
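The reply explains why the check runs after analysis: at that point the plan contains only the limits written in the query, not the synthetic ones the optimizer pushes down. The toy below mirrors only the traversal shape of `checkMvExpandAfterLimit`, that is, an outer walk to each MV_EXPAND followed by an inner walk over everything upstream of it. `Node`, `pipeline` and the string command names are hypothetical stand-ins for ES|QL's plan classes. `TOP_N` is checked only because the PR checks `TopN`; the sketch makes no assumption about whether it can occur at this stage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MvExpandLimitCheckSketch {
    // Hypothetical, heavily simplified plan node: a command name plus the single
    // child it consumes. The child runs before its parent, so the root of the
    // plan is the last command of the query (here, ENRICH).
    public record Node(String command, Node child) {
        // Pre-order walk: this node first, then everything upstream of it.
        void forEachDown(Consumer<Node> action) {
            action.accept(this);
            if (child != null) {
                child.forEachDown(action);
            }
        }
    }

    // Builds a linear plan from commands listed in query order (FROM first).
    public static Node pipeline(String... commands) {
        Node plan = null;
        for (String command : commands) {
            plan = new Node(command, plan);
        }
        return plan;
    }

    // For every MV_EXPAND below the ENRICH, look for a LIMIT or TOP_N upstream of it.
    public static List<String> checkMvExpandAfterLimit(Node enrich) {
        List<String> failures = new ArrayList<>();
        enrich.forEachDown(node -> {
            if (node.command().equals("MV_EXPAND")) {
                node.forEachDown(p -> {
                    if (p.command().equals("LIMIT") || p.command().equals("TOP_N")) {
                        failures.add("MV_EXPAND after LIMIT is incompatible with remote ENRICH");
                    }
                });
            }
        });
        return failures;
    }

    public static void main(String[] args) {
        // Rejected: LIMIT runs before MV_EXPAND, even with an EVAL in between.
        System.out.println(checkMvExpandAfterLimit(pipeline("FROM", "LIMIT", "EVAL", "MV_EXPAND", "ENRICH")));
        // Accepted: the LIMIT comes after MV_EXPAND.
        System.out.println(checkMvExpandAfterLimit(pipeline("FROM", "MV_EXPAND", "LIMIT", "ENRICH")));
    }
}
```

In `main`, the first plan yields one failure, because a LIMIT sits upstream of MV_EXPAND even with an EVAL between them. The second plan yields none.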
```diff
+        }
+
+    }
+
     @Override
     public void postOptimizationVerification(Failures failures) {
         if (this.mode == Mode.REMOTE) {
```
Review comment: We could add tests that have random nodes in between the MV_EXPAND and the ENRICH, or between the LIMIT and the MV_EXPAND (although we already have this to some extent).

Reply: Do we have any "random harmless commands" code anywhere in the tests? Or should we just add a couple of fixed ones?
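The thread does not say whether a "random harmless commands" helper exists in the Elasticsearch tests, so the following is only a standalone sketch of the idea. It takes the query from the Javadoc and inserts zero to two filler commands between LIMIT and MV_EXPAND, and again between MV_EXPAND and ENRICH. The class name, the seed and the `FILLERS` list are hypothetical. In particular, it is an assumption that these fillers never change the verdict of the new check.

```java
import java.util.List;
import java.util.Random;

public class HarmlessCommandsSketch {
    // Hypothetical filler commands, assumed not to add a LIMIT or MV_EXPAND and
    // not to touch the "ip" field. Illustrative only, not from the ES test suite.
    static final List<String> FILLERS = List.of("EVAL dummy = 1", "WHERE true");

    // Appends zero, one or two randomly chosen filler commands.
    static String randomFillers(Random random) {
        StringBuilder sb = new StringBuilder();
        int count = random.nextInt(3);
        for (int i = 0; i < count; i++) {
            sb.append(" | ").append(FILLERS.get(random.nextInt(FILLERS.size())));
        }
        return sb.toString();
    }

    // The Javadoc query with fillers between LIMIT and MV_EXPAND and between
    // MV_EXPAND and ENRICH; every variant should still be rejected.
    public static String limitThenMvExpandQuery(Random random) {
        return "FROM *:events | SORT @timestamp | LIMIT 2"
            + randomFillers(random)
            + " | MV_EXPAND ip"
            + randomFillers(random)
            + " | ENRICH _remote:clientip_policy ON ip";
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so the printed variants are reproducible
        for (int i = 0; i < 3; i++) {
            System.out.println(limitThenMvExpandQuery(random));
        }
    }
}
```

Each generated query would be expected to fail verification with "MV_EXPAND after LIMIT is incompatible with remote ENRICH". A real test would still need to pass the variants to the verifier, which this sketch does not do.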