Merged
5 changes: 5 additions & 0 deletions docs/changelog/135051.yaml
@@ -0,0 +1,5 @@
pr: 135051
summary: Ban Limit + `MvExpand` before remote Enrich
area: ES|QL
type: bug
issues: []
@@ -466,6 +466,20 @@ public void testEnrichCoordinatorThenEnrichRemote() {
assertThat(error.getMessage(), containsString("ENRICH with remote policy can't be executed after [ENRICH _COORDINATOR"));
}

public void testEnrichAfterMvExpandLimit() {
String query = String.format(Locale.ROOT, """
FROM *:events,events
| SORT timestamp
| LIMIT 2
| eval ip= TO_STR(host)
| MV_EXPAND host
| WHERE ip != ""
| %s
Contributor

We could add tests that have random nodes in between the MV_EXPAND and the ENRICH, or between the LIMIT and the MV_EXPAND (although we already have the latter to some extent).

Contributor Author

Do we have any "random harmless commands" code anywhere in the tests? Or should I just add a couple of fixed ones?
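
If nothing like that exists yet, one possible shape for such a helper is sketched below in plain Java. Everything in it is an assumption made for illustration: the class name `HarmlessCommands`, the pool of commands, and the use of `java.util.Random` (a real test would more likely draw from the test framework's own randomness). The query skeleton is the one from the Javadoc in this PR.

```java
import java.util.List;
import java.util.Random;

// Hypothetical helper, not existing test code: interleaves "harmless" commands
// around MV_EXPAND so the verification check is exercised with plan nodes in between.
class HarmlessCommands {
    // Assumed pool of commands that should not affect whether the check fires.
    static final List<String> HARMLESS = List.of(
        "| WHERE ip IS NOT NULL",
        "| EVAL tmp = 1",
        "| KEEP *"
    );

    // Returns between 0 and max commands, one per line, picked at random from the pool.
    static String randomHarmless(Random random, int max) {
        StringBuilder sb = new StringBuilder();
        int count = random.nextInt(max + 1);
        for (int i = 0; i < count; i++) {
            sb.append(HARMLESS.get(random.nextInt(HARMLESS.size()))).append('\n');
        }
        return sb.toString();
    }

    // Builds LIMIT -> (harmless)* -> MV_EXPAND -> (harmless)* -> ENRICH.
    static String buildQuery(Random random) {
        return "FROM *:events\n"
            + "| SORT @timestamp\n"
            + "| LIMIT 2\n"
            + randomHarmless(random, 2)
            + "| MV_EXPAND ip\n"
            + randomHarmless(random, 2)
            + "| ENRICH _remote:clientip_policy ON ip\n";
    }

    public static void main(String[] args) {
        // A fixed seed keeps the generated query reproducible.
        System.out.println(buildQuery(new Random(42)));
    }
}
```

A couple of fixed variants would cover the same ground with less machinery; the random version mainly helps if the check later grows more cases.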

""", enrichHosts(Enrich.Mode.REMOTE));
var error = expectThrows(VerificationException.class, () -> runQuery(query, randomBoolean()).close());
assertThat(error.getMessage(), containsString("MV_EXPAND after LIMIT is incompatible with remote ENRICH"));
}

private static void assertCCSExecutionInfoDetails(EsqlExecutionInfo executionInfo) {
assertThat(executionInfo.overallTook().millis(), greaterThanOrEqualTo(0L));
assertTrue(executionInfo.isCrossClusterSearch());
@@ -13,6 +13,7 @@
import org.elasticsearch.common.lucene.BytesRefs;
import org.elasticsearch.common.util.Maps;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;
import org.elasticsearch.xpack.esql.capabilities.PostAnalysisVerificationAware;
import org.elasticsearch.xpack.esql.capabilities.PostOptimizationVerificationAware;
import org.elasticsearch.xpack.esql.capabilities.TelemetryAware;
import org.elasticsearch.xpack.esql.common.Failures;
@@ -48,6 +49,7 @@ public class Enrich extends UnaryPlan
implements
GeneratingPlan<Enrich>,
PostOptimizationVerificationAware,
PostAnalysisVerificationAware,
TelemetryAware,
SortAgnostic,
ExecutesOn {
@@ -284,6 +286,36 @@ private void checkForPlansForbiddenBeforeRemoteEnrich(Failures failures) {
fails.forEach(f -> failures.add(fail(this, "ENRICH with remote policy can't be executed after [" + f.text() + "]" + f.source())));
}

/**
* Remote ENRICH (and any remote operation in fact) is not compatible with MV_EXPAND + LIMIT. Consider:
* `FROM *:events | SORT @timestamp | LIMIT 2 | MV_EXPAND ip | ENRICH _remote:clientip_policy ON ip`
* Semantically, this must take two top events and then expand them. However, this can not be executed remotely,
* because this means that we have to take top 2 events on each node, then expand them, then apply Enrich,
* then bring them to the coordinator - but then we can not select top 2 of them - because that would be pre-expand!
* We do not know which expanded rows are coming from the true top rows and which are coming from "false" top rows
* which should have been thrown out. This is only possible to execute if MV_EXPAND executes on the coordinator
* - which contradicts remote Enrich.
* This could be fixed by the optimizer by moving MV_EXPAND past ENRICH, at least in some cases, but currently we do not do that.
*/
private void checkMvExpandAfterLimit(Failures failures) {
this.forEachDown(MvExpand.class, u -> {
u.forEachDown(p -> {
if (p instanceof Limit || p instanceof TopN) {
Contributor

I noticed that the logic for JOIN is a bit different; in particular, post optimization, it also checks for the presence of a PipelineBreaker, while ENRICH only checks for ExecutesOn.
Do you think it makes sense to unify the two, or at least to make these two checks consistent?

Contributor Author

Join and Enrich are different: Enrich is cardinality-preserving while Join is not. That makes some pipeline breakers compatible with Enrich but not with Join. I agree that the PipelineBreaker usage there is not ideal, as it's not exactly meant for that, and in the future we may refine the meaning of each, but Enrich and Join will probably stay different, unless we move to handling them with subplans, which would resolve the cardinality problem (not for free, of course). For now I think PipelineBreaker is a good stand-in for what we need, but longer term it will probably need to change.

Contributor Author

This is also the reason for this particular change, btw: MV_EXPAND changes cardinality, which gives Enrich essentially the same issue that remote JOIN has had from the start. The order of LIMIT and the cardinality-changing operation comes out wrong: semantically we expect the LIMIT to be global over the whole dataset, but in reality we only apply it per node and delay the global one until we're back at the coordinator. This only works if none of the operations in between is cardinality-changing.
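
To make the per-node vs. global LIMIT point concrete, here is a toy simulation in plain Java. It is only a sketch: `Event`, `Row`, and every method name are made up for illustration and have nothing to do with the actual ES|QL planner or operator classes. It compares the intended semantics (global top N, then expand) with what a remote plan would effectively do (per-node top N, expand on the node, then the delayed global limit on the coordinator).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Toy model only; all names here are hypothetical.
class LimitExpandDemo {
    /** Toy event: a timestamp plus a multi-valued field. */
    static final class Event {
        final long timestamp;
        final List<String> ips;
        Event(long timestamp, List<String> ips) { this.timestamp = timestamp; this.ips = ips; }
    }

    /** One row per value, as an MV_EXPAND-like step would produce. */
    static final class Row {
        final long timestamp;
        final String ip;
        Row(long timestamp, String ip) { this.timestamp = timestamp; this.ip = ip; }
        @Override public String toString() { return timestamp + ":" + ip; }
    }

    // SORT timestamp | LIMIT n
    static List<Event> topN(List<Event> events, int n) {
        return events.stream()
            .sorted(Comparator.comparingLong((Event e) -> e.timestamp))
            .limit(n)
            .collect(Collectors.toList());
    }

    // MV_EXPAND ips: one output row per value, so cardinality changes.
    static List<Row> expand(List<Event> events) {
        return events.stream()
            .flatMap(e -> e.ips.stream().map(ip -> new Row(e.timestamp, ip)))
            .collect(Collectors.toList());
    }

    // Intended semantics: the limit is global and applied before the expansion.
    static List<Row> expected(List<List<Event>> nodes, int n) {
        List<Event> all = nodes.stream().flatMap(List::stream).collect(Collectors.toList());
        return expand(topN(all, n));
    }

    // Remote-style execution: per-node top N and expansion (where a remote ENRICH would run),
    // then the coordinator applies the delayed global limit to already-expanded rows.
    static List<Row> perNodeThenCoordinator(List<List<Event>> nodes, int n) {
        List<Row> merged = new ArrayList<>();
        for (List<Event> node : nodes) {
            merged.addAll(expand(topN(node, n)));
        }
        return merged.stream()
            .sorted(Comparator.comparingLong((Row r) -> r.timestamp))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Event>> nodes = List.of(
            List.of(new Event(1, List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"))),
            List.of(new Event(2, List.of("10.0.0.4"))));
        System.out.println("expected rows:    " + expected(nodes, 2).size());
        System.out.println("distributed rows: " + perNodeThenCoordinator(nodes, 2).size());
    }
}
```

With this input the intended plan yields 4 rows (both events survive the limit and are then expanded), while the distributed version yields only 2, because the coordinator's limit counts expanded rows rather than events. Skipping the coordinator limit does not help in general either, since each node may contribute rows from events that are not in the true global top N.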

failures.add(fail(this, "MV_EXPAND after LIMIT is incompatible with remote ENRICH"));
Contributor

note: technically, that's only true if we cannot push the remote enrich past the mv_expand. Which we sometimes could! (There are more optimizations that could be applied to MV_EXPAND, in general.)

But since we currently don't do this, this check will strictly prohibit only queries that we can't properly run anyway, so this is fine.

Maybe we could add a comment, though?

}
});
});

}

@Override
public void postAnalysisVerification(Failures failures) {
if (this.mode == Mode.REMOTE) {
checkMvExpandAfterLimit(failures);
Contributor

note: since this triggers after analysis, the condition p instanceof TopN (while correct) will never be true - we don't create TopN nodes during analysis, only OrderBy nodes.

Contributor Author

Yes, we have to do it at the analysis stage to avoid confusion with synthetic limits that are pushed down, but I wasn't sure whether there's any possible way to have a TopN at the analysis stage.

}

}

@Override
public void postOptimizationVerification(Failures failures) {
if (this.mode == Mode.REMOTE) {