Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import org.elasticsearch.compute.data.Block;
import org.elasticsearch.index.IndexMode;
import org.elasticsearch.logging.Logger;
import org.elasticsearch.transport.RemoteClusterAware;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;
import org.elasticsearch.xpack.esql.Column;
import org.elasticsearch.xpack.esql.EsqlIllegalArgumentException;
Expand Down Expand Up @@ -653,6 +654,28 @@ private Join resolveLookupJoin(LookupJoin join) {
return join;
}

// joining with remote cluster is not supported yet
if (join.left() instanceof EsRelation esr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too specific. I think it will not work in case of queries like

FROM remote:idx | EVAL x = 2*some_field | LOOKUP JOIN lu_idx ON key

Because the join's left child will not be an EsRelation. The EsRelation can be many levels down in the tree.

I'd approach this differently:

  • We only need to check the plans leaves, and only those with class EsRelation.
  • For all such relations, if any of them has IndexMode.LOOKUP, a lookup join happens.
  • Then, all EsRelations mustn't be remote. We can just iterate over all EsRelations and check their concrete indices.

Since my suggestion is a more global approach, I'd place this in the Verifier as a separate verification step, rather than here in the Analyzer. Reason: there can be multiple LOOKUP JOINs and we'll be re-checking the EsRelation in the FROM clause every time.

We have tools to easily traverse query plans, collect nodes that are instanceof a certain class, and I think also getting only the plan's leaves. Take a look at the transformDown methods and similar methods in the same class. You can also take a peek at ReplaceMissingFieldWithNull which needs to scan the plan for lookups as well.

for (var index : esr.index().concreteIndices()) {
if (index.indexOf(RemoteClusterAware.REMOTE_CLUSTER_INDEX_SEPARATOR) != -1) {
Copy link
Contributor

@bpintea bpintea Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd check both left() and right(), just to avoid the grammar being "fixed" and opening this possibility.
Also, there's StringUtils#splitQualifiedIndex() StringUtils#isQualified() that helps with this check, but arguably heavier and maybe not absolutely required.
Both notes as FYI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the right part of the join can be checked in LogicalPlanBuilder where the lookup is built.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state, I think I'd prefer adding a check right here like Bogdan suggests. But, we could have both for maximum paranoia!

  • Check in LogicalPlanBuilder: currently redundant, as the grammar doesn't allow for : at all. I'll likely change my mind once we update the grammar. Because if we generalize the latter, a check in LogicalPlanBuilder would require only the parsed query; as such, it can prevent even sending a field caps request in case a user asks for ... | LOOKUP JOIN remote:lu_idx ....
  • Check here, in the Analyzer: using the concrete indices is a good source of definitive truth, as the concrete indices cannot contain aliases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about paranoia, but about performance: fail fast without even checking the field_caps call(s) or resolve the clusters/indices and then fail in the Analyzer.

This needs a bit of research... there may be situations (at the Verifier level) where the behavior might change if we check things too early. For example, lookup join some_inexistent_remote_cluster:foo - do we complain that a remote lookup index is not supported or that some_inexistent_remote_cluster doesn't actually exist and the query is incorrect because maybe the user had a typo?

return join.withConfig(
new JoinConfig(
type,
List.of(
new UnresolvedAttribute(
join.source(),
"unsupported",
"LOOKUP JOIN does not support joining with remote cluster indices [" + esr.index().name() + "]"
)
),
List.of(),
List.of()
)
);
}
}
}

JoinType coreJoin = using.coreJoin();
// verify the join type
if (coreJoin != JoinTypes.LEFT) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2250,6 +2250,24 @@ public void testLookupJoinIndexMode() {
assertThat(e.getMessage(), containsString("1:25: invalid [test] resolution in lookup mode to an index in [standard] mode"));
}

public void testLookupJoinRemoteClusterUnsupported() {
assumeTrue("requires LOOKUP JOIN capability", EsqlCapabilities.Cap.JOIN_LOOKUP_V11.isEnabled());

var remoteIndexResolution = loadMapping("mapping-default.json", "remote:my-index");
var lookupResolution = AnalyzerTestUtils.defaultLookupResolution();
VerificationException e = expectThrows(
VerificationException.class,
() -> analyze(
"FROM remote:my-index | LOOKUP JOIN languages_lookup ON language_code",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add test cases with multiple lookups, and test cases where more stuff happens between the FROM and the LOOKUP JOIN.

AnalyzerTestUtils.analyzer(remoteIndexResolution, lookupResolution)
)
);
assertThat(
e.getMessage(),
containsString("1:24: LOOKUP JOIN does not support joining with remote cluster indices [remote:my-index]")
);
}

public void testImplicitCasting() {
var e = expectThrows(VerificationException.class, () -> analyze("""
from test | eval x = concat("2024", "-04", "-01") + 1 day
Expand Down