- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
ESQL: Allow remote enrich after LOOKUP JOIN #131940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Allow remote enrich after LOOKUP JOIN #131940
Conversation
Alternative approach: rather than reinventing the wheel, let's just mark any LOOKUP JOINs upstream from a remote ENRICH as remote, too, so the validation automatically kicks in.
| Hi @smalyshev, I've created a changelog YAML for you. | 
| Pinging @elastic/es-analytical-engine (Team:Analytics) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I'd add a couple more test cases before merging, see my suggestions below. For good measure, I'd add a couple cases to PhysicalPlanOptimizerTests to demonstrate how the plans look like in case of lookup join before remote enrich.
Please proceed at your own discretion; I don't think there's any risk in merging this, even as-is.
        
          
                x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
          
            Show resolved
            Hide resolved
        
      | | WHERE message == "Connected to 10.1.0.1" OR message == "Connected to 10.1.0.2" | ||
| | EVAL language_code = "1" | ||
| | LOOKUP JOIN message_types_lookup ON message | ||
| | SORT message ASC | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also add tests with limits in between the two. Also, multiple TopNs/Limits, maybe even silly stuff like TopN -> Limit -> TopN -> Limit (which should become a single TopN if the numbers keep going down, but not necessarily otherwise)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is lookup join suite though, not sure many TopN really the right place here, given as I can't run TopN before either remote join or remote enrich, and just running several TopN's after doesn't really have much to do with either join or enrich?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't run TopN before either remote join or remote enrich
TopN can be run before remote enrich, right? That's what the remote enrich hack enables. In fact, this very query should optimize to a LJ -> Top2 -> remote ENRICH because the limit will be pushed down.
Having multiple sorts and limits between the LJ and remote ENRICH should work, even if the query'd be a bit silly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'll see if I can add something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test case suggestion:
- Multiple lookup joins and multiple enriches, also intertwined LJ -> ENRICH -> LJ -> ENRICH
- Cases with several unary plans in between the LJ and the ENRICH
| """, analyzer); | ||
| assertThat( | ||
| err, | ||
| containsString("4:3: LOOKUP JOIN with remote indices can't be executed after [STATS c = COUNT(*) by languages]@2:3") | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welp, the error message may have to be improved in the future. But I know that'd require more effort as the validation for remote LJ requires it to be marked as remote already before optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is not the most obvious result here. Not sure how to make it better - technically the message is correct, but probably doesn't explain well what's going on.
| // LOOKUP JOIN | ||
| } | ||
|  | ||
| public void testRemoteEnrichAfterLookupJoinWithPipelineBreaker() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests don't rely on post-optimization verification, right? Maybe it'd be a bit more fitting to have them be part of the VerifierTests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these are from postAnalysisVerification so maybe I should move them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No wait I'm wrong, only LIMIT is tested pre-optimizer, the rest is tested post-optimizer. So I could move the LIMIT one to Analyzer part but tbh I'd prefer to keep them all together.
LOOKUP JOIN now allowed before remote ENRICH again.
Relates #129372, #115897
Supersedes #131286