Skip to content
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import org.elasticsearch.compute.data.Block;
import org.elasticsearch.index.IndexMode;
import org.elasticsearch.logging.Logger;
import org.elasticsearch.transport.RemoteClusterAware;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;
import org.elasticsearch.xpack.esql.Column;
import org.elasticsearch.xpack.esql.EsqlIllegalArgumentException;
Expand Down Expand Up @@ -653,6 +654,28 @@ private Join resolveLookupJoin(LookupJoin join) {
return join;
}

// joining with remote cluster is not supported yet
if (join.left() instanceof EsRelation esr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too specific. I think it will not work in case of queries like

FROM remote:idx | EVAL x = 2*some_field | LOOKUP JOIN lu_idx ON key

Because the join's left child will not be an EsRelation. The EsRelation can be many levels down in the tree.

I'd approach this differently:

  • We only need to check the plans leaves, and only those with class EsRelation.
  • For all such relations, if any of them has IndexMode.LOOKUP, a lookup join happens.
  • Then, all EsRelations mustn't be remote. We can just iterate over all EsRelations and check their concrete indices.

Since my suggestion is a more global approach, I'd place this in the Verifier as a separate verification step, rather than here in the Analyzer. Reason: there can be multiple LOOKUP JOINs and we'll be re-checking the EsRelation in the FROM clause every time.

We have tools to easily traverse query plans, collect nodes that are instanceof a certain class, and I think also getting only the plan's leaves. Take a look at the transformDown methods and similar methods in the same class. You can also take a peek at ReplaceMissingFieldWithNull which needs to scan the plan for lookups as well.

for (var index : esr.index().concreteIndices()) {
if (index.indexOf(RemoteClusterAware.REMOTE_CLUSTER_INDEX_SEPARATOR) != -1) {
Copy link
Contributor

@bpintea bpintea Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd check both left() and right(), just to avoid the grammar being "fixed" and opening this possibility.
Also, there's StringUtils#splitQualifiedIndex() StringUtils#isQualified() that helps with this check, but arguably heavier and maybe not absolutely required.
Both notes as FYI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the right part of the join can be checked in LogicalPlanBuilder where the lookup is built.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state, I think I'd prefer adding a check right here like Bogdan suggests. But, we could have both for maximum paranoia!

  • Check in LogicalPlanBuilder: currently redundant, as the grammar doesn't allow for : at all. I'll likely change my mind once we update the grammar. Because if we generalize the latter, a check in LogicalPlanBuilder would require only the parsed query; as such, it can prevent even sending a field caps request in case a user asks for ... | LOOKUP JOIN remote:lu_idx ....
  • Check here, in the Analyzer: using the concrete indices is a good source of definitive truth, as the concrete indices cannot contain aliases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about paranoia, but about performance: fail fast without even checking the field_caps call(s) or resolve the clusters/indices and then fail in the Analyzer.

This needs a bit of research... there may be situations (at the Verifier level) where the behavior might change if we check things too early. For example, lookup join some_inexistent_remote_cluster:foo - do we complain that a remote lookup index is not supported or that some_inexistent_remote_cluster doesn't actually exist and the query is incorrect because maybe the user had a typo?

return join.withConfig(
new JoinConfig(
type,
List.of(
new UnresolvedAttribute(
join.source(),
"unsupported",
"LOOKUP JOIN does not support joining with remote cluster indices [" + esr.index().name() + "]"
)
),
List.of(),
List.of()
)
);
}
}
}

JoinType coreJoin = using.coreJoin();
// verify the join type
if (coreJoin != JoinTypes.LEFT) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2250,6 +2250,24 @@ public void testLookupJoinIndexMode() {
assertThat(e.getMessage(), containsString("1:25: invalid [test] resolution in lookup mode to an index in [standard] mode"));
}

public void testLookupJoinRemoteClusterUnsupported() {
assumeTrue("requires LOOKUP JOIN capability", EsqlCapabilities.Cap.JOIN_LOOKUP_V11.isEnabled());

var remoteIndexResolution = loadMapping("mapping-default.json", "remote:my-index");
var lookupResolution = AnalyzerTestUtils.defaultLookupResolution();
VerificationException e = expectThrows(
VerificationException.class,
() -> analyze(
"FROM remote:my-index | LOOKUP JOIN languages_lookup ON language_code",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add test cases with multiple lookups, and test cases where more stuff happens between the FROM and the LOOKUP JOIN.

AnalyzerTestUtils.analyzer(remoteIndexResolution, lookupResolution)
)
);
assertThat(
e.getMessage(),
containsString("1:24: LOOKUP JOIN does not support joining with remote cluster indices [remote:my-index]")
);
}

public void testImplicitCasting() {
var e = expectThrows(VerificationException.class, () -> analyze("""
from test | eval x = concat("2024", "-04", "-01") + 1 day
Expand Down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more test case suggestion: comma-separated list of index patterns, some of which are remote. E.g. FROM local,remote:pattern*,... with and without quoting.

Original file line number Diff line number Diff line change
Expand Up @@ -298,18 +298,18 @@ public void testStatsWithoutGroups() {
);
}

public void testStatsWithoutAggs() throws Exception {
public void testStatsWithoutAggs() {
assertEquals(
new Aggregate(EMPTY, PROCESSING_CMD_INPUT, Aggregate.AggregateType.STANDARD, List.of(attribute("a")), List.of(attribute("a"))),
processingCommand("stats by a")
);
}

public void testStatsWithoutAggsOrGroup() throws Exception {
public void testStatsWithoutAggsOrGroup() {
expectError("from text | stats", "At least one aggregation or grouping expression required in [stats]");
}

public void testAggsWithGroupKeyAsAgg() throws Exception {
public void testAggsWithGroupKeyAsAgg() {
var queries = new String[] { """
row a = 1, b = 2
| stats a by a
Expand All @@ -330,7 +330,7 @@ public void testAggsWithGroupKeyAsAgg() throws Exception {
}
}

public void testStatsWithGroupKeyAndAggFilter() throws Exception {
public void testStatsWithGroupKeyAndAggFilter() {
var a = attribute("a");
var f = new UnresolvedFunction(EMPTY, "min", DEFAULT, List.of(a));
var filter = new Alias(EMPTY, "min(a) where a > 1", new FilteredExpression(EMPTY, f, new GreaterThan(EMPTY, a, integer(1))));
Expand All @@ -340,7 +340,7 @@ public void testStatsWithGroupKeyAndAggFilter() throws Exception {
);
}

public void testStatsWithGroupKeyAndMixedAggAndFilter() throws Exception {
public void testStatsWithGroupKeyAndMixedAggAndFilter() {
var a = attribute("a");
var min = new UnresolvedFunction(EMPTY, "min", DEFAULT, List.of(a));
var max = new UnresolvedFunction(EMPTY, "max", DEFAULT, List.of(a));
Expand Down Expand Up @@ -375,7 +375,7 @@ public void testStatsWithGroupKeyAndMixedAggAndFilter() throws Exception {
);
}

public void testStatsWithoutGroupKeyMixedAggAndFilter() throws Exception {
public void testStatsWithoutGroupKeyMixedAggAndFilter() {
var a = attribute("a");
var f = new UnresolvedFunction(EMPTY, "min", DEFAULT, List.of(a));
var filter = new Alias(EMPTY, "min(a) where a > 1", new FilteredExpression(EMPTY, f, new GreaterThan(EMPTY, a, integer(1))));
Expand Down Expand Up @@ -2058,41 +2058,41 @@ private void assertStringAsLookupIndexPattern(String string, String statement) {
assertThat(tableName.fold(FoldContext.small()), equalTo(string));
}

public void testIdPatternUnquoted() throws Exception {
public void testIdPatternUnquoted() {
var string = "regularString";
assertThat(breakIntoFragments(string), contains(string));
}

public void testIdPatternQuoted() throws Exception {
public void testIdPatternQuoted() {
var string = "`escaped string`";
assertThat(breakIntoFragments(string), contains(string));
}

public void testIdPatternQuotedWithDoubleBackticks() throws Exception {
public void testIdPatternQuotedWithDoubleBackticks() {
var string = "`escaped``string`";
assertThat(breakIntoFragments(string), contains(string));
}

public void testIdPatternUnquotedAndQuoted() throws Exception {
public void testIdPatternUnquotedAndQuoted() {
var string = "this`is`a`mix`of`ids`";
assertThat(breakIntoFragments(string), contains("this", "`is`", "a", "`mix`", "of", "`ids`"));
}

public void testIdPatternQuotedTraling() throws Exception {
public void testIdPatternQuotedTraling() {
var string = "`foo`*";
assertThat(breakIntoFragments(string), contains("`foo`", "*"));
}

public void testIdPatternWithDoubleQuotedStrings() throws Exception {
public void testIdPatternWithDoubleQuotedStrings() {
var string = "`this``is`a`quoted `` string``with`backticks";
assertThat(breakIntoFragments(string), contains("`this``is`", "a", "`quoted `` string``with`", "backticks"));
}

public void testSpaceNotAllowedInIdPattern() throws Exception {
public void testSpaceNotAllowedInIdPattern() {
expectError("ROW a = 1| RENAME a AS this is `not okay`", "mismatched input 'is' expecting {<EOF>, '|', ',', '.'}");
}

public void testSpaceNotAllowedInIdPatternKeep() throws Exception {
public void testSpaceNotAllowedInIdPatternKeep() {
expectError("ROW a = 1, b = 1| KEEP a b", "extraneous input 'b'");
}

Expand Down Expand Up @@ -2358,4 +2358,9 @@ public void testFailingMetadataWithSquareBrackets() {
"line 1:11: mismatched input '[' expecting {<EOF>, '|', ',', 'metadata'}"
);
}

public void testInvalidRemoteLookupJoin() {
// TODO ES-10559 this should be replaced with a proper error message once grammar allows indexPattern as joinTarget
expectError("FROM my-index | LOOKUP JOIN remote:languages_lookup ON language_code", "line 1:35: token recognition error at: ':'");
Copy link
Contributor

@alex-spies alex-spies Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to say that we should already start adding cases with quoting here, but realized that's kinda silly due to the grammar just not being there yet.

But maybe it makes sense to consider the associated issue blocked by ES-10559, just because we'll need to add more tests once the grammar is generalized. (In a follow-up PR)

I can think of

  • LOOKUP JOIN remote:"languages_lookup"
  • LOOKUP JOIN "remote:languages_lookup"
  • LOOKUP JOIN """remote:languages_lookup"""
  • LOOKUP JOIN remote:"""languages_lookup"""

and I'm unsure if all of these actually make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am likely going to merge this after #120494 (that is changing the same classes). I will add above variations once resolving the conflicts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #120494 adds nice options for randomization, we could also use randomization here.

Other test cases we could consider:

  • multiple colons
  • colons and wildcards
  • colons and date math - especially because we want to enable date math later, and valid date math expressions themselves can contain colons
  • quoting (" and """) of the remote name + 1 or multiple colons - just so we can later see how the error message changes from a token recognition error to a proper parsing exception once Ensure cluster string could be quoted #120355 lands.

}
}