- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
Remote Lookup Join implementation #129013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cceb86d    to
    bb3d64b      
    Compare
  
    29df38f    to
    59d98b2      
    Compare
  
    cb90a8b    to
    d8ec53d      
    Compare
  
    | Hi @smalyshev, I've created a changelog YAML for you. | 
| Pinging @elastic/es-analytical-engine (Team:Analytics) | 
| Pinging @elastic/es-search-foundations (Team:Search Foundations) | 
        
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All done now :) This is very nice!
I think this can nearly be merged, I would only make sure that we don't accidentally skip fwc tests as mentioned here. Other than that, my remarks are mostly on the minor side and also you already addressed a bunch of them :)
I mentioned some thoughts for follow-ups here. In addition, I saw many FIXMEs in the tests that I agree with. But that's for future work :)
        
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                ...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                ...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | // If the query contains lookup indices, use only remotes to avoid duplication | ||
| onlyRemotes = true; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, doesn't this mean this suite doesn't at all test the case where a lookup happens both on remote and the local cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one doesn't because it wouldn't match the results - we'd have more rows than the local case. We have IT tests for that case. I haven't found any easy way to make these tests work in "both" case without substantially changing them.
        
          
                ...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
          
            Show resolved
            Hide resolved
        
      | ZonedDateTime nowUtc = ZonedDateTime.now(ZoneOffset.UTC); | ||
| ZonedDateTime nextMidnight = nowUtc.plusDays(1).withHour(0).withMinute(0).withSecond(0).withNano(0); | ||
| // If we're too close to midnight, we could create index with one day and query with another, and it'd fail. | ||
| assumeTrue("Skip if too close to midnight", Duration.between(nowUtc, nextMidnight).toMinutes() >= 5); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
| assertThat(columns, hasItems("lookup_key", "lookup_name", "lookup_tag", "v", "tag")); | ||
|  | ||
| List<List<Object>> values = getValuesList(resp); | ||
| assertThat(values, hasSize(10)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I really don't know if a missing remote lookup index should lead to skipping the whole remote. That's a behavior we need to double check in the future.
        
          
                ...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
          
            Show resolved
            Hide resolved
        
              
          
                ...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
          
            Show resolved
            Hide resolved
        
              
          
                ...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this looks mighty good already. Let's 
I'm not sure anymore if we already have a test that confirms that remote lookup joins remain disabled on non-SNAPSHOT builds. (This PR adds so many tests, I lost track. Nice test coverage, though!)
Could you please double check that we have such a test before merging? As a failsafe against accidentally enabling/releasing the feature prematurely.
| 
 We don't have a specific test for that, how would I test that? | 
| 
 Hm, we could add an IT with  | 
Remote lookup join implementation This patch enables using LOOKUP JOIN with cross-cluster queries. Example: FROM logs-*, remote:logs-* | LOOKUP JOIN clients on ip | SORT timestamp | LIMIT 100
Remote lookup join implementation This patch enables using LOOKUP JOIN with cross-cluster queries. Example: FROM logs-*, remote:logs-* | LOOKUP JOIN clients on ip | SORT timestamp | LIMIT 100
This patch enables using
LOOKUP JOINwith cross-cluster queries. Example:Currently some constructs are unsupported before
LOOKUP JOINwhen remote indices are involved:SORTLIMITSTATSENRICH _coordinator:...FORKThe LOOKUP JOIN with remote indices automatically assumes that the joins are executed remotely - i.e. LOOKUP JOIN with remote indices is forced to execute on the remote side of the query (inside
FragmentExec) - this is the source of the most exclusions above.