ESQL: Make LOOKUP more left-joiny (#118889) (#119475) #120232
This adds some infrastructure that we can use to run LOOKUP JOIN using real LEFT JOIN semantics. Right now, if LOOKUP JOIN matches many rows in the `lookup` index, we merge all of the values into a multivalued field. So the number of rows emitted from LOOKUP JOIN is the same as the number of rows that come into LOOKUP JOIN. This change builds the infrastructure to emit one row per match, mostly reusing the infrastructure from ENRICH.
This makes `LOOKUP` return multiple rows if there are multiple matches. This is the way SQL works so it's *probably* what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do *later* that are transparent when we have *this* behavior, but not with the old behavior.

Example:

```
- 2 | [German, German, German] | [Austria, Germany, Switzerland]
+ 2 | German | [Austria, Germany]
+ 2 | German | Switzerland
+ 2 | German | null
```

Relates: elastic#118781
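To make the behavior change concrete, here is a rough Python sketch of the two semantics described above. It is not the Elasticsearch implementation: the function names, the in-memory `LOOKUP_ROWS` list, and the helpers are all hypothetical, and the lookup rows are only inferred from the example diff.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical lookup rows, inferred from the example diff (not the real test fixture).
LOOKUP_ROWS: list[Row] = [
    {"language_code": 2, "language_name": "German", "country": ["Austria", "Germany"]},
    {"language_code": 2, "language_name": "German", "country": "Switzerland"},
    {"language_code": 2, "language_name": "German", "country": None},
]


def _values(value: Any) -> list[Any]:
    """Flatten a field value (null, single value, or multivalue) into a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def _collapse(values: list[Any]) -> Any:
    """Turn a list back into null, a single value, or a multivalue."""
    if not values:
        return None
    return values[0] if len(values) == 1 else values


def lookup_merged(rows: list[Row], lookup: list[Row], key: str, fields: list[str]) -> list[Row]:
    """Old behavior: one output row per input row, matches merged into multivalues."""
    out = []
    for row in rows:
        matches = [m for m in lookup if m[key] == row[key]]
        merged = dict(row)
        for field in fields:
            merged[field] = _collapse([v for m in matches for v in _values(m.get(field))])
        out.append(merged)
    return out


def lookup_left_join(rows: list[Row], lookup: list[Row], key: str, fields: list[str]) -> list[Row]:
    """New behavior: one output row per match; unmatched rows are kept with nulls."""
    out = []
    for row in rows:
        matches = [m for m in lookup if m[key] == row[key]]
        if not matches:
            out.append({**row, **{field: None for field in fields}})
            continue
        for match in matches:
            out.append({**row, **{field: match.get(field) for field in fields}})
    return out


FIELDS = ["language_name", "country"]
merged_rows = lookup_merged([{"language_code": 2}], LOOKUP_ROWS, "language_code", FIELDS)
joined_rows = lookup_left_join([{"language_code": 2}], LOOKUP_ROWS, "language_code", FIELDS)
```

With these inputs, `merged_rows` holds a single row with multivalued fields (the `-` line in the diff), while `joined_rows` holds three rows (the `+` lines). The sketch treats the join key as a single value per input row, which is a simplification.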
ROW language_code = [4, 5, 6, 7]
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
| EVAL language_name = MV_SORT(language_name), country = MV_SORT(country)
| KEEP language_code, language_name, country
I just realized that this backport is still pending.
Do you mind adding `| SORT language_code, language_name, country` here to avoid flakiness with multi-node?
See #120259
Sure.
Thank you!
I've stolen @idegtiarenko's fix in #120259 and am checking it locally against this code. Should be fine. I'll push it in a bit and auto-merge this.
This is a combined backport of two commits, #118889 and #119475. I accidentally didn't backport the first one until the second one was ready.
First
This adds some infrastructure that we can use to run LOOKUP JOIN using real LEFT JOIN semantics.
Right now, if LOOKUP JOIN matches many rows in the `lookup` index, we merge all of the values into a multivalued field. So the number of rows emitted from LOOKUP JOIN is the same as the number of rows that come into LOOKUP JOIN.
This change builds the infrastructure to emit one row per match, mostly reusing the infrastructure from ENRICH.
Second
This makes `LOOKUP` return multiple rows if there are multiple matches. This is the way SQL works so it's probably what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do later that are transparent when we have this behavior, but not with the old behavior.
Example:
Relates: #118781