
Conversation

@nik9000
Member

@nik9000 nik9000 commented Jan 2, 2025

This makes LOOKUP return multiple rows if there are multiple matches. This is the way SQL works so it's probably what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do later that are transparent when we have this behavior, but not with the old behavior.

Example:

```
-  2  | [German, German, German] | [Austria, Germany, Switzerland]
+  2  | German                   | [Austria, Germany]
+  2  | German                   | Switzerland
+  2  | German                   | null
```
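The behavior change can be sketched language-agnostically as an SQL-style left join. This is a minimal Python model with hypothetical shapes; the real implementation operates on pages and blocks, not dicts.

```python
# Sketch of the new LOOKUP semantics: an SQL-style LEFT JOIN that emits one
# output row per matching row in the lookup index, and a single null-padded
# row when there is no match. Hypothetical shapes, not the actual API.

def lookup_join(left_rows, lookup_index, key):
    for row in left_rows:
        matches = lookup_index.get(row[key])
        if not matches:
            # No match: keep the left row, null out the looked-up columns.
            yield {**row, "language_name": None, "country": None}
        else:
            # One output row per matching lookup row.
            for match in matches:
                yield {**row, **match}

# Mirrors the example above: language_code 2 matches three lookup rows.
lookup_index = {
    2: [
        {"language_name": "German", "country": ["Austria", "Germany"]},
        {"language_name": "German", "country": "Switzerland"},
        {"language_name": "German", "country": None},
    ],
}
out = list(lookup_join([{"language_code": 2}, {"language_code": 3}],
                       lookup_index, "language_code"))
```

Under the old behavior the three matches for code 2 would have been merged into one multivalued row; here they become three rows, and the unmatched code 3 becomes a single null-padded row.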

Relates: #118781

@nik9000
Member Author

nik9000 commented Jan 2, 2025

Marked non-issue because this is part of a larger change that will get its own big-ticket release note.

@nik9000 nik9000 requested a review from ivancea January 2, 2025 20:01
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Member Author

nik9000 commented Jan 3, 2025

At this point I've just flipped any failing tests but haven't written any new ones. I think today I'll take an inventory of what we have and see if I need more. To be honest, I wouldn't be surprised if we already have enough cases.

There's also an open question about what to do about multivalued input fields. Right now we generate multiple matching rows, which somewhat mimics how ENRICH worked, and it's the behavior you get when you "just" plug in the LEFT JOIN-ness that I built without doing any more work. I don't think we want this behavior; at least, it's probably not what folks expect. I think it'd be safest to generate "match_none" queries for these cases in the short term. But that's fine to do in a follow-up change to this one.
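The "match_none" idea can be sketched like this. This is purely illustrative of the proposal, with made-up shapes; the real decision was deferred to a follow-up.

```python
# Sketch of the "match_none for multivalued keys" idea: a multivalued
# left-hand join key produces no match at all (one null-padded row) rather
# than one row per value. Purely illustrative; the real behavior was still
# undecided at this point.

def resolve_matches(lookup_index, key_value):
    if isinstance(key_value, list):
        return []  # multivalued input key: behave like a match_none query
    return lookup_index.get(key_value, [])

def lookup_join(left_rows, lookup_index, key):
    for row in left_rows:
        matches = resolve_matches(lookup_index, row[key]) or [None]
        for match in matches:
            yield (row[key], match)

lookup_index = {6: ["Mv-Lang"], 7: ["Mv-Lang", "Mv-Lang2"]}
out = list(lookup_join([{"code": 7}, {"code": [6, 7]}], lookup_index, "code"))
# code 7 expands to two rows; the multivalued key [6, 7] yields one null row
```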

Also also! I disabled an optimization path that ENRICH uses when the inputs are ordinated blocks because I couldn't figure out exactly what was happening there and it wasn't working with my LEFT JOINinator. I'm OK leaving that disabled for now but I'd like to double back on it before too long. It's a very good performance boost.

@nik9000
Member Author

nik9000 commented Jan 3, 2025

Also also also we're getting to the point where we don't share that much with ENRICH. Slowly slowly slowly the sharing is looking silly. I don't think we're there yet, but we're on our way.

@nik9000
Member Author

nik9000 commented Jan 3, 2025

The 8.x failures are odd. Checking them. It looks like we're running a LOOKUP test against the 8.x cluster, but it does not have the capability, so I'm not sure what's up.

Member

@costin costin left a comment


Made a first pass - this looks great Nik!
I want to take another look, but no need to wait for me if the others approve it.

Comment on lines -139 to +153

```
-| SORT emp_no
+| EVAL country = MV_SORT(country)
+| SORT emp_no, country
 | KEEP emp_no, language_code, language_name, country
 ;

-emp_no:integer | language_code:integer | language_name:keyword | country:keyword
-10091 | 1 | [English, English, English] | [Canada, United Kingdom, United States of America]
-10092 | 2 | [German, German, German] | [Austria, Germany, Switzerland]
-10093 | 3 | null | null
-10094 | 4 | Quenya | null
-10095 | 5 | null | Atlantis
+emp_no:integer | language_code:integer | language_name:keyword | country:text
+10091 | 1 | English | Canada
+10091 | 1 | null | United Kingdom
+10091 | 1 | English | United States of America
+10091 | 1 | English | null
+10092 | 2 | German | [Germany, Austria]
+10092 | 2 | German | Switzerland
+10092 | 2 | German | null
+10093 | 3 | null | null
+10094 | 4 | Quenya | null
+10095 | 5 | null | Atlantis
```
Member

👌

Contributor

@alex-spies alex-spies left a comment

I made a first pass. Looking mighty good so far!

I have a small suggestion for one or two additional tests.

I'm noticing one little blind spot in our csv tests: they generally run on few rows of data. This means that edge cases related to RightChunkedLeftJoin having to process multiple pages from the lookup index (in a single request) are unlikely to be tested end-to-end. But I think this is tested in RightChunkedLeftJoinTests, so we're likely fine.
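The concern can be modeled roughly as follows. Shapes here are hypothetical, not the actual RightChunkedLeftJoin API: right-hand matches for one request arrive as several pages of (left position, right value) pairs, and the join stitches them onto the left page, null-filling positions that never appear.

```python
# Rough model of a chunked left join: matches for a single left page may be
# split across several right-hand pages, and unmatched left positions get a
# null row. Hypothetical shapes for illustration only.

def join_chunked(left_page, right_pages):
    matches = {}
    for page in right_pages:  # possibly many pages per request
        for pos, right_value in page:
            matches.setdefault(pos, []).append(right_value)
    out = []
    for pos, left_value in enumerate(left_page):
        # One output row per match; a null row if the position never matched.
        for right_value in matches.get(pos, [None]):
            out.append((left_value, right_value))
    return out

# Matches for left position 0 are split across two pages.
out = join_chunked(["a", "b", "c"], [[(0, "x")], [(0, "y"), (2, "z")]])
```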

Comment on lines 444 to +446

```
 10006 | 6 | Mv-Lang
-10007 | 7 | [Mv-Lang, Mv-Lang2]
+10007 | 7 | Mv-Lang
+10007 | 7 | Mv-Lang2
```
Contributor

Glorified bookmark for myself: Here and below we can see matches on multi-valued keys (language code 6-8). We plan to null-ify these in a follow-up PR, as the exact behavior is yet to be decided.

```
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
| EVAL country = MV_SORT(country)
| KEEP emp_no, language_code, language_name, country
;
```
Contributor

super nit: technically, the MV_SORT above is not required anymore. Arguably, it's better to remove it, as the LOOKUP JOIN should not re-order the multivalues that it looks up.

Applies similarly below.

Member Author

The [Austria, Germany] line will get loaded in ascending order from ES but, I believe, in whatever order it appears in the CSV file in the csv tests. I think I could make it work without the MV_SORT, but it's kind of tricky actually.

Contributor

There's another test in this file that doesn't use MV_SORT and has the expected result in the original order ([Germany, Austria]). This made me think that MV_SORT is not required, after all.

If ES re-orders this, then maybe we should just update the csv file with the source documents?

It's not a huge deal by any means. But I think it's reasonable to assume that LOOKUP JOIN does nothing to the looked up fields; and MV_SORTing everywhere in the tests prevents the tests from asserting this assumption.

Member Author

I'll have a look at these with a single-node real cluster and see what I see. The ones without the MV_SORT should differ depending on how we run them, and break. I'll get back to you here.

Member Author

Ok. The reason ES doesn't sort these is that the field is text and not keyword. text fields are loaded from the _source and we keep their ordering.

Contributor

Thanks for double checking, I learned something! :)

Contributor

In most of the nonUniqueRightKey... tests, the join key on the left is neatly sorted.

Maybe let's add an example where this is specifically not the case, where the join key is out of order and repeats?

We could construct a sequence of desired join keys via e.g.

```
ROW language_code = [1, 1, 7, 2, 3, 3] | MV_EXPAND language_code | LOOKUP JOIN languages_lookup_non_unique_key ON language_code
```

(Assuming that MV_EXPAND preserves the multivalue's order, which we technically don't document.)

Another example would be out of order and containing nulls and keys without a corresponding right-hand key; that won't work via MV_EXPAND, though (no nulls in MVs), but we could probably piece something together using the employees index.
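The expected result for the suggested test can be spelled out as a small sketch. The lookup data here is made up for illustration, loosely modeled on languages_lookup_non_unique_key: the join should preserve left-row order and expand each occurrence of a key independently.

```python
# Expected rows for out-of-order, repeating left keys joined against
# non-unique right keys. Illustrative data, not the actual test fixture.

lookup_index = {1: ["English"], 2: ["German"], 3: ["Spanish", "Spanish2"]}

left_keys = [1, 1, 7, 2, 3, 3]  # out of order, repeated, one unmatched (7)
expected = [
    (key, name)
    for key in left_keys
    for name in lookup_index.get(key, [None])
]
# Left-row order is preserved; each occurrence of key 3 expands to two rows,
# and the unmatched key 7 yields a single null row.
```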

Contributor

@alex-spies alex-spies left a comment

Ok, still looking very good!

I still need to take a (final) look at some test classes, but so far nothing major jumps out at me.

Contributor

Another interesting test case would be when the left hand side join keys are all nulls. Since we optimize for null pages on the left hand side, a short end-to-end test would confirm that all is well in this code path.

```java
}

protected static class LookupResponse extends AbstractLookupService.LookupResponse {
    private List<Page> pages;
```
Contributor

So, the response will be a singular list of pages. That's reasonable for now, although we could also stream back the pages - which could be interesting in cases where there are ridiculous amounts of matches on the right hand side for keys from the left.

I think that's edge-case-y enough not to worry about for now. But I think we'll want a heap attack test for this in a follow up - because attempting to send big responses over the wire can OOM rather than tripping a breaker.

```java
Page right = ongoing.itr.next();
emittedPages++;
try {
    return ongoing.join.join(right);
```
Contributor

In case this trips a breaker, do we need to manually release everything this operator is still holding onto? Including all of ongoing?

Member Author

If we trip a breaker inside this call then:

  1. join will release the Blocks it's working on
  2. we'll release right just below this.
  3. the exception will bubble up into the ThreadPool which will call the onFailure handler around Driver:400 which will call drainAndCloseOperators which will close every Operator, one by one. That'll close the ongoing merge, releasing any pages still in the Iterator.

There's some more machinery to shut down the DriverContext in an orderly way that I don't have fully loaded in my head, but, yeah, we release it all.

One thing I can do in this PR is to make an extension of OperatorTestCase for this. That has a ton of nice tests for all this, including one where it installs a breaker that randomly fails. That helps make sure we release things properly no matter what craziness happens.
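The release cascade in step 3 can be modeled minimally. This is a hedged sketch with hypothetical names; the real logic lives in the Driver's drainAndCloseOperators, which closes every operator even when some closes themselves fail.

```python
# Minimal model of drain-and-close on failure: every operator is closed one
# by one so held pages are released, even if some close() calls throw. The
# first error is re-raised at the end. Hypothetical names for illustration.

def drain_and_close(operators):
    first_error = None
    for op in operators:
        try:
            op.close()
        except Exception as e:
            if first_error is None:
                first_error = e
    if first_error is not None:
        raise first_error

class FakeOperator:
    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False
    def close(self):
        if self.fail:
            raise RuntimeError("breaker tripped during close")
        self.closed = True

ops = [FakeOperator(), FakeOperator(fail=True), FakeOperator()]
try:
    drain_and_close(ops)
    raised = False
except RuntimeError:
    raised = True
# The failing close doesn't prevent the remaining operators from closing.
```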

Contributor

That'd be lovely!

Member Author

Actually this requires quite a bit of juggling. Checking....

Member Author

Yeah. I'm going to hold this until a follow up change can get in that'll let us share more.

@alex-spies alex-spies self-requested a review January 14, 2025 18:10
@nik9000
Member Author

nik9000 commented Jan 15, 2025

I've replied to all comments as of this morning and believe there are three things left:

  • Any more comments @alex-spies has
  • Implementing OperatorTestCase for these to better check circuit breaker health
  • Reviewing the CSV test case comments more closely.


```java
@Override
protected LookupFromIndexService.LookupResponse createTestInstance() {
    // Can't use a real block factory for the basic serialization tests because they don't release.
```
Contributor

praise: comment is very helpful

```java
    return new NamedWriteableRegistry(List.of(IntBlock.ENTRY));
}

public void testWithBreaker() throws IOException {
```
Contributor

This is a nice test.

Contributor

@alex-spies alex-spies left a comment

All finished now. Mostly minor remarks, so I think this already deserves a thumbs-up, as it could be merged as-is (maybe except for the copy-paste mistake in one of the tests, which tests the wrong status). Thanks a lot @nik9000 , this is great!

```java
Page right = ongoing.itr.next();
emittedPages++;
try {
    return ongoing.join.join(right);
```
Contributor

That'd be lovely!

```java
);
CheckedFunction<List<Page>, Page, Exception> handleResponse = pages -> {
    if (pages.size() != 1) {
        throw new UnsupportedOperationException("ENRICH should only return a single page");
```
Contributor

Urgh, you're right. I wrongly assumed that we had to explicitly inherit from ElasticsearchException to get a status code other than 500, but apparently IllegalArgumentException is caught specifically and results in a 400.

I think we introduced (Es)qlIllegalArgumentException for this, which returns a 500. But we also don't use that consistently, I think. There's also IllegalStateException which returns 500.

Contributor

@astefan astefan left a comment

I've tested the PR a bit. I have an edge-case-y scenario where both the main index and the lookup index have a text field with a keyword subfield (the default mapping when a document is added to an index without a mapping for that field).

Because it was easier for me, I added three docs to employees and two docs to a lookup index like this:

```
{"index":{"_index":"test1"}}
{"counter": 11, "text2":"red fox jumps over black dog"}
{"index":{"_index":"test1"}}
{"counter": 12, "text2":"black dog"}
{"index":{"_index":"employees"}}
{"text2":"red fox jumps over black dog", "emp_no":555}
{"index":{"_index":"employees"}}
{"text2":"red fox", "emp_no":666}
{"index":{"_index":"employees"}}
{"text2":"black dog", "emp_no":777}
```

The query `FROM employees | where emp_no < 1000 | LOOKUP JOIN test1 ON text2.keyword | KEEP emp_no, counter, text2` returns something like this:

```
    emp_no     |    counter    |           text2
---------------+---------------+----------------------------
555            |11             |red fox jumps over black dog
666            |null           |null
777            |12             |black dog
```

while the query with a lookup on the text field itself doesn't seem to match the values (the counter values all come back null, and I assume the values are not picked up from the lookup index):

```
    emp_no     |    counter    |           text2
---------------+---------------+----------------------------
555            |null           |red fox jumps over black dog
666            |null           |red fox
777            |null           |black dog
```
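The discrepancy between joining on text2.keyword and joining on text2 itself can be illustrated with a toy model of how the two field types are indexed. This uses a made-up whitespace/lowercase analyzer, not ES's actual analysis chain: a keyword field indexes the exact value as a single term, while a text field is analyzed into tokens, so a term-style lookup on the whole original string finds nothing.

```python
# Toy model of keyword vs. text matching. A keyword field stores one term
# (the exact value); a text field stores analyzed tokens. Toy analyzer,
# purely for illustration.

def keyword_terms(value):
    return {value}  # one term: the exact value

def text_terms(value):
    return set(value.lower().split())  # analyzed into individual tokens

doc = "black dog"
keyword_match = "black dog" in keyword_terms(doc)  # exact term exists
text_match = "black dog" in text_terms(doc)        # whole string is not a term
```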

```
10002 | 2 | German
10002 | 2 | German
10002 | 2 | German
10003 | null | null
```
Contributor

I've looked at the languages_non_unique_key.csv file and I was intrigued by the missing key (language_code) in the last two records there. My first reaction was "why is language_name null and not all the values corresponding to the two records in languages_non_unique_key.csv?". I guess we have an explanation for that and there is no reason for concern :-).

@astefan
Contributor

astefan commented Jan 15, 2025

What I missed mentioning is the following slight inconsistency. Considering the additional field added to employees above, the query `FROM employees | where emp_no < 1000 | where text2 == "black dog"` matches. This feels like something that can definitely be added later (outside this PR) IF it's something that makes sense for the condition of a lookup join.

@nik9000 nik9000 added the auto-backport Automatically create backport pull requests when merged label Jan 15, 2025
@nik9000
Member Author

nik9000 commented Jan 15, 2025

Adding the new test case needs #120223

@nik9000
Member Author

nik9000 commented Jan 15, 2025

OK. Rereading @astefan's comment it seems to be quite a bit about text fields and their "funny" matching characteristics for lookup. For now we're forbidding them. @craigtaverner is working on that.

@nik9000 nik9000 merged commit 4462070 into elastic:main Jan 15, 2025
16 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 119475

@nik9000
Member Author

nik9000 commented Jan 15, 2025

Backport coming by hand

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 15, 2025
This makes `LOOKUP` return multiple rows if there are multiple matches. This is the way SQL works so it's *probably* what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do *later* that are transparent when we have *this* behavior, but not with the old behavior.

Example:
```
-  2  | [German, German, German] | [Austria, Germany, Switzerland]
+  2  | German                   | [Austria, Germany]
+  2  | German                   | Switzerland
+  2  | German                   | null
```

Relates: elastic#118781
@nik9000
Member Author

nik9000 commented Jan 15, 2025

backport is #120232

nik9000 added a commit that referenced this pull request Jan 16, 2025
* ESQL: Compute infrastructure for LEFT JOIN (#118889)

This adds some infrastructure that we can use to run LOOKUP JOIN using
real LEFT JOIN semantics.

Right now if LOOKUP JOIN matches many rows in the `lookup` index we
merge all of the values into a multivalued field. So the number of rows
emitted from LOOKUP JOIN is the same as the number of rows that comes
into LOOKUP JOIN.

This change builds the infrastructure to emit one row per match, mostly
reusing the infrastructure from ENRICH.

* ESQL: Make LOOKUP more left-joiny (#119475)

This makes `LOOKUP` return multiple rows if there are multiple matches. This is the way SQL works so it's *probably* what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do *later* that are transparent when we have *this* behavior, but not with the old behavior.

Example:
```
-  2  | [German, German, German] | [Austria, Germany, Switzerland]
+  2  | German                   | [Austria, Germany]
+  2  | German                   | Switzerland
+  2  | German                   | null
```

Relates: #118781
@craigtaverner
Contributor

> OK. Rereading @astefan's comment it seems to be quite a bit about text fields and their "funny" matching characteristics for lookup. For now we're forbidding them. @craigtaverner is working on that.

The PR for this is at #119473

@nik9000
Member Author

nik9000 commented Jan 17, 2025

> OK. Rereading @astefan's comment it seems to be quite a bit about text fields and their "funny" matching characteristics for lookup. For now we're forbidding them. @craigtaverner is working on that.

> The PR for this is at #119473

Does it "solve" Andrei's concerns? I just don't want to lose his comment.


Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
