Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Jan 15, 2025

This is combined backport for two commits, #118889 and #119475. I just accidentally didn't bring back the first one until the second one was ready.

First

This adds some infrastructure that we can use to run LOOKUP JOIN using real LEFT JOIN semantics.

Right now if LOOKUP JOIN matches many rows in the lookup index we merge all of the values into a multivalued field. So the number of rows emitted from LOOKUP JOIN is the same as the number of rows that comes into LOOKUP JOIN.

This change builds the infrastructure to emit one row per match, mostly reusing the infrastructure from ENRICH.

Second

This makes LOOKUP return multiple rows if there are multiple matches. This is the way SQL works so it's probably what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do later that are transparent when we have this behavior, but not with the old behavior.

Example:

-  2  | [German, German, German] | [Austria, Germany, Switzerland]
+  2  | German                   | [Austria, Germany]
+  2  | German                   | Switzerland
+  2  | German                   | null

Relates: #118781

This adds some infrastructure that we can use to run LOOKUP JOIN using
real LEFT JOIN semantics.

Right now if LOOKUP JOIN matches many rows in the `lookup` index we
merge all of the values into a multivalued field. So the number of rows
emitted from LOOKUP JOIN is the same as the number of rows that comes
into LOOKUP JOIN.

This change builds the infrastructure to emit one row per match, mostly
reusing the infrastructure from ENRICH.
This makes `LOOKUP` return multiple rows if there are multiple matches. This is the way SQL works so it's *probably* what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do *later* that are transparent when we have *this* behavior, but not with the old behavior.

Example:
```
-  2  | [German, German, German] | [Austria, Germany, Switzerland]
+  2  | German                   | [Austria, Germany]
+  2  | German                   | Switzerland
+  2  | German                   | null
```

Relates: elastic#118781
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a combined backport of both #118889 and #119475, right? Maybe let's add this to the PR's description so it's easier to understand the backporting that happened if we need to come back to this later.

ROW language_code = [4, 5, 6, 7]
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
| EVAL language_name = MV_SORT(language_name), country = MV_SORT(country)
| KEEP language_code, language_name, country
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that this backport is still pending.
Do you mind adding | SORT language_code, language_name, country here to avoid flakiness with multi-node?

See #120259

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@nik9000
Copy link
Member Author

nik9000 commented Jan 16, 2025

I've stolen @idegtiarenko's fix in #120259 and am checking it locally against this code. Should be fine. I'll push it in a bit and auto-merge this.

@nik9000 nik9000 enabled auto-merge (squash) January 16, 2025 14:05
@nik9000 nik9000 merged commit c340ba5 into elastic:8.x Jan 16, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants