-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Remote Lookup Join implementation #129013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cceb86d to
bb3d64b
Compare
29df38f to
59d98b2
Compare
cb90a8b to
d8ec53d
Compare
|
Hi @smalyshev, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All done now :) This is very nice!
I think this can nearly be merged, I would only make sure that we don't accidentally skip fwc tests as mentioned here. Other than that, my remarks are mostly on the minor side and also you already addressed a bunch of them :)
I mentioned some thoughts for follow-ups here. In addition, I saw many FIXMEs in the tests that I agree with. But that's for future work :)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java
Outdated
Show resolved
Hide resolved
...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
Outdated
Show resolved
Hide resolved
...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
Outdated
Show resolved
Hide resolved
| // If the query contains lookup indices, use only remotes to avoid duplication | ||
| onlyRemotes = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, doesn't this mean this suite doesn't at all test the case where a lookup happens both on remote and the local cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one doesn't because it wouldn't match the results - we'd have more rows than the local case. We have IT tests for that case. I haven't found any easy way to make these tests work in "both" case without substantially changing them.
...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
Show resolved
Hide resolved
| ZonedDateTime nowUtc = ZonedDateTime.now(ZoneOffset.UTC); | ||
| ZonedDateTime nextMidnight = nowUtc.plusDays(1).withHour(0).withMinute(0).withSecond(0).withNano(0); | ||
| // If we're too close to midnight, we could create index with one day and query with another, and it'd fail. | ||
| assumeTrue("Skip if too close to midnight", Duration.between(nowUtc, nextMidnight).toMinutes() >= 5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
| assertThat(columns, hasItems("lookup_key", "lookup_name", "lookup_tag", "v", "tag")); | ||
|
|
||
| List<List<Object>> values = getValuesList(resp); | ||
| assertThat(values, hasSize(10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I really don't know if a missing remote lookup index should lead to skipping the whole remote. That's a behavior we need to double check in the future.
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
Show resolved
Hide resolved
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
Show resolved
Hide resolved
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this looks mighty good already. Let's ![]()
I'm not sure anymore if we already have a test that confirms that remote lookup joins remain disabled on non-SNAPSHOT builds. (This PR adds so many tests, I lost track. Nice test coverage, though!)
Could you please double check that we have such a test before merging? As a failsafe against accidentally enabling/releasing the feature prematurely.
We don't have a specific test for that, how would I test that? |
Hm, we could add an IT with |
Remote lookup join implementation This patch enables using LOOKUP JOIN with cross-cluster queries. Example: FROM logs-*, remote:logs-* | LOOKUP JOIN clients on ip | SORT timestamp | LIMIT 100
Remote lookup join implementation This patch enables using LOOKUP JOIN with cross-cluster queries. Example: FROM logs-*, remote:logs-* | LOOKUP JOIN clients on ip | SORT timestamp | LIMIT 100
This patch enables using
LOOKUP JOINwith cross-cluster queries. Example:Currently some constructs are unsupported before
LOOKUP JOINwhen remote indices are involved:SORTLIMITSTATSENRICH _coordinator:...FORKThe LOOKUP JOIN with remote indices automatically assumes that the joins are executed remotely - i.e. LOOKUP JOIN with remote indices is forced to execute on the remote side of the query (inside
FragmentExec) - this is the source of the most exclusions above.