Skip to content

Conversation

@idegtiarenko
Copy link
Contributor

It is not possible to perform lookup joins with CC at the moment.

Closes: ES-10552

It is not possible to perform lookup joins with CC at the moment.
@idegtiarenko idegtiarenko added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Jan 16, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Comment on lines 658 to 660
if (join.left() instanceof EsRelation esr) {
for (var index : esr.index().concreteIndices()) {
if (index.indexOf(RemoteClusterAware.REMOTE_CLUSTER_INDEX_SEPARATOR) != -1) {
Copy link
Contributor

@bpintea bpintea Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd check both left() and right(), just to avoid the grammar being "fixed" and opening this possibility.
Also, there's StringUtils#splitQualifiedIndex() StringUtils#isQualified() that helps with this check, but arguably heavier and maybe not absolutely required.
Both notes as FYI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the right part of the join can be checked in LogicalPlanBuilder where the lookup is built.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state, I think I'd prefer adding a check right here like Bogdan suggests. But, we could have both for maximum paranoia!

  • Check in LogicalPlanBuilder: currently redundant, as the grammar doesn't allow for : at all. I'll likely change my mind once we update the grammar. Because if we generalize the latter, a check in LogicalPlanBuilder would require only the parsed query; as such, it can prevent even sending a field caps request in case a user asks for ... | LOOKUP JOIN remote:lu_idx ....
  • Check here, in the Analyzer: using the concrete indices is a good source of definitive truth, as the concrete indices cannot contain aliases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about paranoia, but about performance: fail fast without even checking the field_caps call(s) or resolve the clusters/indices and then fail in the Analyzer.

This needs a bit of research... there may be situations (at the Verifier level) where the behavior might change if we check things too early. For example, lookup join some_inexistent_remote_cluster:foo - do we complain that a remote lookup index is not supported or that some_inexistent_remote_cluster doesn't actually exist and the query is incorrect because maybe the user had a typo?

@astefan astefan requested a review from quux00 January 16, 2025 14:47
@astefan
Copy link
Contributor

astefan commented Jan 16, 2025

Also, I would wait for @quux00 take on this because skip_unvailable for CCS clusters might have an impact on how/if we react in such situations.
Meaning, there are use cases where an ES|QL query completely ignores a remote cluster if that cluster is unavailable (network issues, cluster stopped etc) and skip_unvailable setting is set to true and I think a wildcard is used in the index name and/or in the remote cluster name.

In the Analyzer, where you are checking this, this information might already tell you if you can reject the query or not, but I am not sure.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @idegtiarenko , I think the PR's approach has a flaw - but one that is easily fixed. I left a comment below.

Additionally, I realize a lot may change once we update the grammar. Maybe it makes sense to come back to this once we use the proper grammar for the lookup index, and keep the PR open until then?


public void testInvalidRemoteLookupJoin() {
// TODO ES-10559 this should be replaced with a proper error message once grammar allows indexPattern as joinTarget
expectError("FROM my-index | LOOKUP JOIN remote:languages_lookup ON language_code", "line 1:35: token recognition error at: ':'");
Copy link
Contributor

@alex-spies alex-spies Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to say that we should already start adding cases with quoting here, but realized that's kinda silly due to the grammar just not being there yet.

But maybe it makes sense to consider the associated issue blocked by ES-10559, just because we'll need to add more tests once the grammar is generalized. (In a follow-up PR)

I can think of

  • LOOKUP JOIN remote:"languages_lookup"
  • LOOKUP JOIN "remote:languages_lookup"
  • LOOKUP JOIN """remote:languages_lookup"""
  • LOOKUP JOIN remote:"""languages_lookup"""

and I'm unsure if all of these actually make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am likely going to merge this after #120494 (that is changing the same classes). I will add above variations once resolving the conflicts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #120494 adds nice options for randomization, we could also use randomization here.

Other test cases we could consider:

  • multiple colons
  • colons and wildcards
  • colons and date math - especially because we want to enable date math later, and valid date math expressions themselves can contain colons
  • quoting (" and """) of the remote name + 1 or multiple colons - just so we can later see how the error message changes from a token recognition error to a proper parsing exception once Ensure cluster string could be quoted #120355 lands.

Comment on lines 658 to 660
if (join.left() instanceof EsRelation esr) {
for (var index : esr.index().concreteIndices()) {
if (index.indexOf(RemoteClusterAware.REMOTE_CLUSTER_INDEX_SEPARATOR) != -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state, I think I'd prefer adding a check right here like Bogdan suggests. But, we could have both for maximum paranoia!

  • Check in LogicalPlanBuilder: currently redundant, as the grammar doesn't allow for : at all. I'll likely change my mind once we update the grammar. Because if we generalize the latter, a check in LogicalPlanBuilder would require only the parsed query; as such, it can prevent even sending a field caps request in case a user asks for ... | LOOKUP JOIN remote:lu_idx ....
  • Check here, in the Analyzer: using the concrete indices is a good source of definitive truth, as the concrete indices cannot contain aliases.

}

// joining with remote cluster is not supported yet
if (join.left() instanceof EsRelation esr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too specific. I think it will not work in case of queries like

FROM remote:idx | EVAL x = 2*some_field | LOOKUP JOIN lu_idx ON key

Because the join's left child will not be an EsRelation. The EsRelation can be many levels down in the tree.

I'd approach this differently:

  • We only need to check the plans leaves, and only those with class EsRelation.
  • For all such relations, if any of them has IndexMode.LOOKUP, a lookup join happens.
  • Then, all EsRelations mustn't be remote. We can just iterate over all EsRelations and check their concrete indices.

Since my suggestion is a more global approach, I'd place this in the Verifier as a separate verification step, rather than here in the Analyzer. Reason: there can be multiple LOOKUP JOINs and we'll be re-checking the EsRelation in the FROM clause every time.

We have tools to easily traverse query plans, collect nodes that are instanceof a certain class, and I think also getting only the plan's leaves. Take a look at the transformDown methods and similar methods in the same class. You can also take a peek at ReplaceMissingFieldWithNull which needs to scan the plan for lookups as well.

VerificationException e = expectThrows(
VerificationException.class,
() -> analyze(
"FROM remote:my-index | LOOKUP JOIN languages_lookup ON language_code",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add test cases with multiple lookups, and test cases where more stuff happens between the FROM and the LOOKUP JOIN.

@quux00
Copy link
Contributor

quux00 commented Jan 17, 2025

In response to @astefan 's comment:

Also, I would wait for @quux00 take on this because skip_unavailable for CCS clusters might have an impact on how/if we react in such situations.
Meaning, there are use cases where an ES|QL query completely ignores a remote cluster if that cluster is unavailable (network issues, cluster stopped etc) and skip_unavailable setting is set to true and I think a wildcard is used in the index name and/or in the remote cluster name.
In the Analyzer, where you are checking this, this information might already tell you if you can reject the query or not, but I am not sure.

We are now limiting the scope of skip_unavailable at ES|QL plan time to only skip a cluster or fail a query if the cluster cannot be connected to at all (rather than the older model of looking at whether it has matching indices). If a cluster cannot be connected to, we detect that either at the enrich policy lookup time (if enrich is being used) or in the field-caps call. If the cluster has setting skip_unavailable=false, we fail the query at that point. If the cluster has skip_unavailable=true, then we mark it as skipped in the EsqlExecutionInfo and no information about its indices will be included in the EsIndex, IndexResolution and thus the PreAnalysisResult, so the query can proceed as if that cluster was not requested by the user.

In terms of limiting JOINs in the face of a cross-cluster search, I think you have two high level options:

  1. early on (even before doing a field-caps call) parse out whether there are any remote clusters included in the query and if a JOIN is present in the query - if so, just abort without doing any detailed analysis
  2. after field-caps IndexResolution, determine whether any remote clusters are being "JOINed on" and then throw an error

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
@elasticsearchmachine
Copy link
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @idegtiarenko ! The production code LGTM: since you confirmed that aliases cannot point to remote clusters, we know that CCQ happens if and only if any of the index patterns contains a colon outside of a date math expression. And that's what RemoteClusterAware.isRemoteIndexName does.

I do have some more suggestions for testing, though, that I think should be addressed before merging.

pr: 120277
summary: Disallow CCS with lookup join
area: ES|QL
type: enhancement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd declare this as non-issue and would remove the changelog. It's not really an enhancement, anyway :)

Comment on lines 529 to 531
@@ -264 +265 @@
if (MetadataAttribute.isSupported(id) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isn't this redundant, because below the p.forEachUp(UnresolvedRelation.class will walk over the join, too, and inspect the right hand side (which is an unresolved relation), anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below p only references the left side. Possibly it should be renamed to make it a bit more obvious.
We could create a join and execute .forEachUp on it instead.
We might use current approach if we believe distinct messages make sense for left and right side.


public void testInvalidRemoteLookupJoin() {
// TODO ES-10559 this should be replaced with a proper error message once grammar allows indexPattern as joinTarget
expectError("FROM my-index | LOOKUP JOIN remote:languages_lookup ON language_code", "line 1:35: token recognition error at: ':'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #120494 adds nice options for randomization, we could also use randomization here.

Other test cases we could consider:

  • multiple colons
  • colons and wildcards
  • colons and date math - especially because we want to enable date math later, and valid date math expressions themselves can contain colons
  • quoting (" and """) of the remote name + 1 or multiple colons - just so we can later see how the error message changes from a token recognition error to a proper parsing exception once Ensure cluster string could be quoted #120355 lands.

}

public void testRemoteLookupJoin() {
expectError("FROM remote:left | LOOKUP JOIN right ON field", "LOOKUP JOIN does not support remote cluster indices [remote:left]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try and randomize the remote:left as much as possible. In that position, we absolutely allow all kinds of stuff, including date math and wildcard patterns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more test case suggestion: comma-separated list of index patterns, some of which are remote. E.g. FROM local,remote:pattern*,... with and without quoting.

return p -> new LookupJoin(source, p, right, joinFields);
return p -> {
p.forEachUp(UnresolvedRelation.class, r -> {
if (RemoteClusterAware.isRemoteIndexName(r.table().index())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think there is a bug here: in case of multiple indices separated by commas, isRemoteIndexName is not appropriate because it treats the string as a single index name.

I think commas cannot be used inside date math expressions, and should thus be a correct separator to use to split r.table().index() back into parts. (Failing this, we'd have to go back to the step where the individual index patterns are parsed - but I don't think it should be necessary, given that we also use r.table().index() for the field caps calls.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, can we have : in date math expression? Otherwise presents of : in concatenated string should indicate remote was used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can have : in date math. And our findings also were that commas are not used in date math, so you need to split by commas and then call isRemoteIndexName.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a buggy edge case:

FROM idx,"<idx-{now/d{yyyy.MM.dd|+18:00}}>" | LOOKUP JOIN right ON field

This does not use CCQ and should thus not cause a parsing exception. It fails due to us not splitting the individual index patterns before checking for CCQ patterns.

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ievgen! I think the randomIndexPattern helper starts to really shine in this one.

I have only optional, minor remarks that I think you should just address at your own discretion. Thanks for the iterations!

Comment on lines +2960 to +2961
var basePattern = randomIndexPatterns(without(CROSS_CLUSTER));
var joinPattern = randomIndexPattern(without(WILDCARD_PATTERN), without(CROSS_CLUSTER));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: this looks nice.

// remote cluster on the right
var joinPattern = randomIndexPattern(CROSS_CLUSTER, without(WILDCARD_PATTERN));
expectError(
"FROM " + randomIndexPatterns() + " | LOOKUP JOIN " + joinPattern + " ON " + randomIdentifier(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: due to not excluding CCS in the FROM here and above, the test is actually slightly overconstrained, as an error message based on the usage of CCS in the FROM would also be correct.

That is to say, this test case passes because our implementation performs checks in a certain order, but ordered the other way around would also be correct.

}
{
// remote cluster on the left
var fromPatterns = randomIndexPatterns(CROSS_CLUSTER, without(WILDCARD_PATTERN));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this would be more complete if the fromPatterns wouldn't exclude wildcard patterns. They should not affect the validation.

@idegtiarenko idegtiarenko merged commit 242b841 into elastic:main Jan 27, 2025
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants