ESQL: Make LOOKUP more left-joiny (#118889) (#119475) #120232

nik9000 · 2025-01-15T19:50:19Z

This is a combined backport of two commits, #118889 and #119475. I just accidentally didn't bring back the first one until the second one was ready.

First

This adds some infrastructure that we can use to run LOOKUP JOIN using real LEFT JOIN semantics.

Right now, if LOOKUP JOIN matches many rows in the lookup index, we merge all of the values into a multivalued field. So the number of rows emitted from LOOKUP JOIN is the same as the number of rows that come into LOOKUP JOIN.

This change builds the infrastructure to emit one row per match, mostly reusing the infrastructure from ENRICH.

Second

This makes LOOKUP return multiple rows if there are multiple matches. This is the way SQL works so it's probably what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do later that are transparent when we have this behavior, but not with the old behavior.

Example:

-  2  | [German, German, German] | [Austria, Germany, Switzerland]
+  2  | German                   | [Austria, Germany]
+  2  | German                   | Switzerland
+  2  | German                   | null
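The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ESQL compute engine code: `LOOKUP_ROWS`, `merge_join`, and `left_join` are hypothetical names, the lookup index is modeled as a list of dicts, and the order of merged values follows the order of the lookup rows (the real engine made no such promise, which is why the old test used MV_SORT).

```python
# Hypothetical in-memory stand-in for a lookup index with a non-unique key.
LOOKUP_ROWS = [
    {"language_code": 2, "language_name": "German", "country": ["Austria", "Germany"]},
    {"language_code": 2, "language_name": "German", "country": "Switzerland"},
    {"language_code": 2, "language_name": "German", "country": None},
]


def _values(cell):
    """Flatten a cell into a list of values; a null cell contributes nothing."""
    if cell is None:
        return []
    return list(cell) if isinstance(cell, list) else [cell]


def _cell(values):
    """Turn a list of values back into a cell: null, a single value, or a multivalue."""
    if not values:
        return None
    return values[0] if len(values) == 1 else values


def merge_join(rows, lookup, key, fields):
    """Old behavior: one output row per input row, all matches merged into multivalues."""
    out = []
    for row in rows:
        matches = [m for m in lookup if m[key] == row[key]]
        merged = dict(row)
        for field in fields:
            merged[field] = _cell([v for m in matches for v in _values(m[field])])
        out.append(merged)
    return out


def left_join(rows, lookup, key, fields):
    """New behavior: one output row per match, or one null-filled row if nothing matches."""
    out = []
    for row in rows:
        matches = [m for m in lookup if m[key] == row[key]]
        if not matches:
            out.append({**row, **{field: None for field in fields}})
        for m in matches:
            out.append({**row, **{field: m[field] for field in fields}})
    return out
```

With a single input row `{"language_code": 2}`, `merge_join` returns one row whose fields are multivalues, while `left_join` returns three rows, one per lookup row, including the one whose `country` is null.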

Relates: #118781

alex-spies

This is a combined backport of both #118889 and #119475, right? Maybe let's add this to the PR's description so it's easier to understand the backporting that happened if we need to come back to this later.

idegtiarenko · 2025-01-16T10:32:24Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec

 ROW language_code = [4, 5, 6, 7]
 | LOOKUP JOIN languages_lookup_non_unique_key ON language_code
-| EVAL language_name = MV_SORT(language_name), country = MV_SORT(country)
 | KEEP language_code, language_name, country


I just realized that this backport is still pending.
Do you mind adding | SORT language_code, language_name, country here to avoid flakiness with multi-node?

See #120259
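A rough Python sketch of why an explicit sort removes the flakiness: once one input row can produce several output rows, their arrival order can depend on which node answers first, so a test that compares rows in order needs a fixed ordering. Everything here is hypothetical (the sample rows, `simulate_multi_node`, and `sort_key`, including its choice to place nulls last); it is not the csv-spec test runner.

```python
import random


def sort_key(row):
    # Place None after real values in each column (an assumption for this sketch).
    return tuple((v is None, "" if v is None else v) for v in row)


def simulate_multi_node(rows, seed):
    """Return the same rows in an arbitrary order, as if nodes replied out of order."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical (language_code, language_name, country) rows after the join.
rows = [
    (4, "A", None),
    (5, "B", "X"),
    (6, "C", "Y"),
    (6, "D", "Z"),
]

expected = sorted(rows, key=sort_key)
for seed in range(10):
    arrived = simulate_multi_node(rows, seed)
    # Without sorting, `arrived` may differ from `rows`; after sorting it never does.
    assert sorted(arrived, key=sort_key) == expected
```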

nik9000 · 2025-01-16T13:38:05Z

I've stolen @idegtiarenko's fix in #120259 and am checking it locally against this code. Should be fine. I'll push it in a bit and auto-merge this.

nik9000 added 2 commits January 15, 2025 14:23

nik9000 added backport v8.18.0 labels Jan 15, 2025

This was referenced Jan 15, 2025

ESQL: Make LOOKUP more left-joiny #119475

Merged

ESQL: Compute infrastructure for LEFT JOIN #118889

Merged

alex-spies approved these changes Jan 16, 2025

idegtiarenko reviewed Jan 16, 2025

idegtiarenko approved these changes Jan 16, 2025

nik9000 added 2 commits January 16, 2025 08:42

Fix test

f1a6732

Merge branch '8.x' into left_join_for_lookup_2_take_2_8x

3b8f546

nik9000 enabled auto-merge (squash) January 16, 2025 14:05

Merge branch '8.x' into left_join_for_lookup_2_take_2_8x

b843157

nik9000 merged commit c340ba5 into elastic:8.x Jan 16, 2025
15 checks passed
