Skip to content
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/changelog/120277.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 120277
summary: Disallow CCS with lookup join
area: ES|QL
type: enhancement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd declare this as non-issue and would remove the changelog. It's not really an enhancement, anyway :)

issues: []
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
import org.elasticsearch.dissect.DissectException;
import org.elasticsearch.dissect.DissectParser;
import org.elasticsearch.index.IndexMode;
import org.elasticsearch.transport.RemoteClusterAware;
import org.elasticsearch.xpack.esql.VerificationException;
import org.elasticsearch.xpack.esql.common.Failure;
import org.elasticsearch.xpack.esql.core.expression.Alias;
Expand Down Expand Up @@ -523,9 +524,15 @@ public PlanFactory visitJoinCommand(EsqlBaseParser.JoinCommandContext ctx) {
}

var target = ctx.joinTarget();
var rightPattern = visitIdentifier(target.index);

if (RemoteClusterAware.isRemoteIndexName(rightPattern)) {
throw new ParsingException(source(target), "LOOKUP JOIN does not support remote cluster indices [{}]", rightPattern);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isn't this redundant, because below the p.forEachUp(UnresolvedRelation.class will walk over the join, too, and inspect the right hand side (which is an unresolved relation), anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below p only references the left side. Possibly it should be renamed to make it a bit more obvious.
We could create a join and execute .forEachUp on it instead.
We might use current approach if we believe distinct messages make sense for left and right side.


UnresolvedRelation right = new UnresolvedRelation(
source(target),
new TableIdentifier(source(target.index), null, visitIdentifier(target.index)),
new TableIdentifier(source(target.index), null, rightPattern),
false,
emptyList(),
IndexMode.LOOKUP,
Expand All @@ -552,6 +559,18 @@ public PlanFactory visitJoinCommand(EsqlBaseParser.JoinCommandContext ctx) {
throw new ParsingException(source, "JOIN ON clause only supports one field at the moment, found [{}]", matchFieldsCount);
}

return p -> new LookupJoin(source, p, right, joinFields);
return p -> {
p.forEachUp(UnresolvedRelation.class, r -> {
if (RemoteClusterAware.isRemoteIndexName(r.table().index())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think there is a bug here: in case of multiple indices separated by commas, isRemoteIndexName is not appropriate because it treats the string as a single index name.

I think commas cannot be used inside date math expressions, and should thus be a correct separator to use to split r.table().index() back into parts. (Failing this, we'd have to go back to the step where the individual index patterns are parsed - but I don't think it should be necessary, given that we also use r.table().index() for the field caps calls.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, can we have : in date math expression? Otherwise presents of : in concatenated string should indicate remote was used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can have : in date math. And our findings also were that commas are not used in date math, so you need to split by commas and then call isRemoteIndexName.

throw new ParsingException(
source(target),
"LOOKUP JOIN does not support remote cluster indices [{}]",
r.table().index()
);
}
});

return new LookupJoin(source, p, right, joinFields);
};
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more test case suggestion: comma-separated list of index patterns, some of which are remote. E.g. FROM local,remote:pattern*,... with and without quoting.

Original file line number Diff line number Diff line change
Expand Up @@ -300,18 +300,18 @@ public void testStatsWithoutGroups() {
);
}

public void testStatsWithoutAggs() throws Exception {
public void testStatsWithoutAggs() {
assertEquals(
new Aggregate(EMPTY, PROCESSING_CMD_INPUT, Aggregate.AggregateType.STANDARD, List.of(attribute("a")), List.of(attribute("a"))),
processingCommand("stats by a")
);
}

public void testStatsWithoutAggsOrGroup() throws Exception {
public void testStatsWithoutAggsOrGroup() {
expectError("from text | stats", "At least one aggregation or grouping expression required in [stats]");
}

public void testAggsWithGroupKeyAsAgg() throws Exception {
public void testAggsWithGroupKeyAsAgg() {
var queries = new String[] { """
row a = 1, b = 2
| stats a by a
Expand All @@ -332,7 +332,7 @@ public void testAggsWithGroupKeyAsAgg() throws Exception {
}
}

public void testStatsWithGroupKeyAndAggFilter() throws Exception {
public void testStatsWithGroupKeyAndAggFilter() {
var a = attribute("a");
var f = new UnresolvedFunction(EMPTY, "min", DEFAULT, List.of(a));
var filter = new Alias(EMPTY, "min(a) where a > 1", new FilteredExpression(EMPTY, f, new GreaterThan(EMPTY, a, integer(1))));
Expand All @@ -342,7 +342,7 @@ public void testStatsWithGroupKeyAndAggFilter() throws Exception {
);
}

public void testStatsWithGroupKeyAndMixedAggAndFilter() throws Exception {
public void testStatsWithGroupKeyAndMixedAggAndFilter() {
var a = attribute("a");
var min = new UnresolvedFunction(EMPTY, "min", DEFAULT, List.of(a));
var max = new UnresolvedFunction(EMPTY, "max", DEFAULT, List.of(a));
Expand Down Expand Up @@ -377,7 +377,7 @@ public void testStatsWithGroupKeyAndMixedAggAndFilter() throws Exception {
);
}

public void testStatsWithoutGroupKeyMixedAggAndFilter() throws Exception {
public void testStatsWithoutGroupKeyMixedAggAndFilter() {
var a = attribute("a");
var f = new UnresolvedFunction(EMPTY, "min", DEFAULT, List.of(a));
var filter = new Alias(EMPTY, "min(a) where a > 1", new FilteredExpression(EMPTY, f, new GreaterThan(EMPTY, a, integer(1))));
Expand Down Expand Up @@ -2060,41 +2060,41 @@ private void assertStringAsLookupIndexPattern(String string, String statement) {
assertThat(tableName.fold(FoldContext.small()), equalTo(string));
}

public void testIdPatternUnquoted() throws Exception {
public void testIdPatternUnquoted() {
var string = "regularString";
assertThat(breakIntoFragments(string), contains(string));
}

public void testIdPatternQuoted() throws Exception {
public void testIdPatternQuoted() {
var string = "`escaped string`";
assertThat(breakIntoFragments(string), contains(string));
}

public void testIdPatternQuotedWithDoubleBackticks() throws Exception {
public void testIdPatternQuotedWithDoubleBackticks() {
var string = "`escaped``string`";
assertThat(breakIntoFragments(string), contains(string));
}

public void testIdPatternUnquotedAndQuoted() throws Exception {
public void testIdPatternUnquotedAndQuoted() {
var string = "this`is`a`mix`of`ids`";
assertThat(breakIntoFragments(string), contains("this", "`is`", "a", "`mix`", "of", "`ids`"));
}

public void testIdPatternQuotedTraling() throws Exception {
public void testIdPatternQuotedTraling() {
var string = "`foo`*";
assertThat(breakIntoFragments(string), contains("`foo`", "*"));
}

public void testIdPatternWithDoubleQuotedStrings() throws Exception {
public void testIdPatternWithDoubleQuotedStrings() {
var string = "`this``is`a`quoted `` string``with`backticks";
assertThat(breakIntoFragments(string), contains("`this``is`", "a", "`quoted `` string``with`", "backticks"));
}

public void testSpaceNotAllowedInIdPattern() throws Exception {
public void testSpaceNotAllowedInIdPattern() {
expectError("ROW a = 1| RENAME a AS this is `not okay`", "mismatched input 'is' expecting {<EOF>, '|', ',', '.'}");
}

public void testSpaceNotAllowedInIdPatternKeep() throws Exception {
public void testSpaceNotAllowedInIdPatternKeep() {
expectError("ROW a = 1, b = 1| KEEP a b", "extraneous input 'b'");
}

Expand Down Expand Up @@ -2939,4 +2939,14 @@ public void testNamedFunctionArgumentWithUnsupportedNamedParameterTypes() {
);
}
}

public void testRemoteLookupJoin() {
expectError("FROM remote:left | LOOKUP JOIN right ON field", "LOOKUP JOIN does not support remote cluster indices [remote:left]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try and randomize the remote:left as much as possible. In that position, we absolutely allow all kinds of stuff, including date math and wildcard patterns.

expectError(
"FROM left | LOOKUP JOIN `remote:right` ON field",
"LOOKUP JOIN does not support remote cluster indices [remote:right]"
);
// TODO ES-10559 this should be replaced with a proper error message once grammar allows indexPattern as joinTarget
expectError("FROM my-index | LOOKUP JOIN remote:languages_lookup ON language_code", "line 1:35: token recognition error at: ':'");
Copy link
Contributor

@alex-spies alex-spies Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to say that we should already start adding cases with quoting here, but realized that's kinda silly due to the grammar just not being there yet.

But maybe it makes sense to consider the associated issue blocked by ES-10559, just because we'll need to add more tests once the grammar is generalized. (In a follow-up PR)

I can think of

  • LOOKUP JOIN remote:"languages_lookup"
  • LOOKUP JOIN "remote:languages_lookup"
  • LOOKUP JOIN """remote:languages_lookup"""
  • LOOKUP JOIN remote:"""languages_lookup"""

and I'm unsure if all of these actually make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am likely going to merge this after #120494 (that is changing the same classes). I will add above variations once resolving the conflicts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #120494 adds nice options for randomization, we could also use randomization here.

Other test cases we could consider:

  • multiple colons
  • colons and wildcards
  • colons and date math - especially because we want to enable date math later, and valid date math expressions themselves can contain colons
  • quoting (" and """) of the remote name + 1 or multiple colons - just so we can later see how the error message changes from a token recognition error to a proper parsing exception once Ensure cluster string could be quoted #120355 lands.

}
}