-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Make LOOKUP more left-joiny #119475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Marked non-issue because this is part of a larger change that will get it's own big ticket release note. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
…into left_join_for_lookup_2_take_2
|
At this point I've just flipped any failing tests but not wrote any new ones. I think today I'll take an inventory of what we have and see if I need more. I wouldn't be surprised if we already have enough cases to be honest. There's also an open question about what to do about multivalued input fields. Right now we generate multiple matching rows which somewhat mimics how ENRICH worked and it's the behavior you get when you "just" plug in the LEFT JOIN-ness that I built without doing any more work. I think we don't want this behavior, at least, it's not what folks expect. Probably? I think it'd be safest to generate "match_none" queries for these cases in the short term. But that's fine to do in a follow-up change to this one. Also also! I disabled an optimization path that ENRICH uses when the inputs are ordinated blocks because I couldn't figure out exactly what was happening there and it wasn't working with my LEFT JOINinator. I'm OK leaving that disabled for now but I'd like to double back on it before too long. It's a very good performance boost. |
|
Also also also we're getting to the point where we don't share that much with ENRICH. Slowly slowly slowly the sharing is looking silly. I don't think we're there yet, but we're on our way. |
|
The 8.x failures are odd. Checking them. It looks like we're running a LOOKUP test against the 8.x cluster but it does not have the capability so I'm not sure what's up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a first pass - this looks great Nik!
Want to take another look however no need to wait for me if the others approve it.
| | SORT emp_no | ||
| | EVAL country = MV_SORT(country) | ||
| | SORT emp_no, country | ||
| | KEEP emp_no, language_code, language_name, country | ||
| ; | ||
|
|
||
| emp_no:integer | language_code:integer | language_name:keyword | country:keyword | ||
| 10091 | 1 | [English, English, English] | [Canada, United Kingdom, United States of America] | ||
| 10092 | 2 | [German, German, German] | [Austria, Germany, Switzerland] | ||
| 10093 | 3 | null | null | ||
| 10094 | 4 | Quenya | null | ||
| 10095 | 5 | null | Atlantis | ||
| emp_no:integer | language_code:integer | language_name:keyword | country:text | ||
| 10091 | 1 | English | Canada | ||
| 10091 | 1 | null | United Kingdom | ||
| 10091 | 1 | English | United States of America | ||
| 10091 | 1 | English | null | ||
| 10092 | 2 | German | [Germany, Austria] | ||
| 10092 | 2 | German | Switzerland | ||
| 10092 | 2 | German | null | ||
| 10093 | 3 | null | null | ||
| 10094 | 4 | Quenya | null | ||
| 10095 | 5 | null | Atlantis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a first pass. Looking mighty good so far!
I have a small suggestion for one or two additional tests.
I'm noticing one little blind spot in our csv tests: they generally run on few rows of data. This means that edge cases related to RightChunkedLeftJoin having to process multiple pages from the lookupindex (in a single request) is unlikely to be tested end-to-end. But I think this is tested in RightChunkedLeftJoinTests, so we're likely fine.
| 10006 | 6 | Mv-Lang | ||
| 10007 | 7 | [Mv-Lang, Mv-Lang2] | ||
| 10007 | 7 | Mv-Lang | ||
| 10007 | 7 | Mv-Lang2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glorified bookmark for myself: Here and below we can see matches on multi-valued keys (language code 6-8). We plan to null-ify these in a follow-up PR, as the exact behavior is yet to be decided.
...ql/compute/src/main/java/org/elasticsearch/compute/operator/lookup/RightChunkedLeftJoin.java
Show resolved
Hide resolved
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/AsyncOperator.java
Outdated
Show resolved
Hide resolved
| | LOOKUP JOIN languages_lookup_non_unique_key ON language_code | ||
| | EVAL country = MV_SORT(country) | ||
| | KEEP emp_no, language_code, language_name, country | ||
| ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: technically, the MV_SORT above is not required anymore. Arguably, it's better to remove it, as the LOOKUP JOIN should not re-order the multivalues that it looks up.
Applies similarly below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The [Austria, Germany] line will get loaded in ascending order from ES but, I believe, whatever order they are in the csv in the csv tests. I think I could make it work without, but it's kind of tricky actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another test in this file that doesn't use MV_SORT and has the expected result in the original order ([Germany, Austria]). This made me think that MV_SORT is not required, after all.
If ES will re-order this, then maybe we should just update the csv file with the source documents?
It's not a huge deal by any means. But I think it's reasonable to assume that LOOKUP JOIN does nothing to the looked up fields; and MV_SORTing everywhere in the tests prevents the tests from asserting this assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a look at these with a single node real cluster and see what I see. The ones without the MV_SORT should differ depending on how we run that and break. I'll get back to you here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. The reason ES doesn't sort these as that the field is text and not keyword. text fields are loaded from the _source and we keep their ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for double checking, I learned something! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most of the nonUniqueRightKey... tests, the join key on the left is neatly sorted.
Maybe let's add an example where this is specifically not the case, where the join key is out of order and repeats?
We could construct a sequence of desired join keys via e.g.
ROW language_code = [1, 1, 7, 2, 3, 3] | MV_EXPAND language_code | LOOKUP JOIN languages_lookup_non_unique_key ON language_code
(Assuming that MV_EXPAND preserves the multivalue's order, which we technically don't document.)
Another example would be out of order and containing nulls and keys without corresponding right hand key; that won't work via MV_EXPAND, though (no nulls in MVs) but we could probably piece something together using the employees index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, still looking very good!
I still need to take a (final) look at some test classes, but so nothing major jumps out to me.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another interesting test case would be when the left hand side join keys are all nulls. Since we optimize for null pages on the left hand side, a short end-to-end test would confirm that all is well in this code path.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| protected static class LookupResponse extends AbstractLookupService.LookupResponse { | ||
| private List<Page> pages; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the response will be a singular list of pages. That's reasonable for now, although we could also stream back the pages - which could be interesting in cases where there are ridiculous amounts of matches on the right hand side for keys from the left.
I think that's edge-case-y enough not to worry about for now. But I think we'll want a heap attack test for this in a follow up - because attempting to send big responses over the wire can OOM rather than tripping a breaker.
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/AsyncOperator.java
Outdated
Show resolved
Hide resolved
| Page right = ongoing.itr.next(); | ||
| emittedPages++; | ||
| try { | ||
| return ongoing.join.join(right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case this trips a breaker, do we need to manually release everything this operator is still holding onto? Including all of ongoing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we trip a breaker inside this call then:
joinwill release the Blocks it's working- we'll release
rightjust below this. - the exception will bubble up into the ThreadPool which will call the onFailure handler around
Driver:400which will calldrainAndCloseOperatorswhich will close everyOperator, one by one. That'll close theongoingmerge, releasing any pages still in theIterator.
That's some more stuff to shutdown the DriverContext in an orderly way that I don't have all loaded in my head, but, yeah, we release it all.
One thing I can do in this PR is to make an extension of OperatorTestCase for this. That has a ton of nice tests for all this, including one where it installs a breaker that randomly fails. That helps make sure we release things properly no matter what craziness happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be lovely!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this requires quite a bit of juggling. Checking....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I'm going to hold this until a follow up change can get in that'll let us share more.
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexOperator.java
Show resolved
Hide resolved
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexOperator.java
Outdated
Show resolved
Hide resolved
|
I've replied to all comments as of this morning and believe there are three things left:
|
...ql/src/test/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexOperatorStatusTests.java
Outdated
Show resolved
Hide resolved
...ql/src/test/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexOperatorStatusTests.java
Outdated
Show resolved
Hide resolved
...l/src/test/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexServiceResponseTests.java
Outdated
Show resolved
Hide resolved
|
|
||
| @Override | ||
| protected LookupFromIndexService.LookupResponse createTestInstance() { | ||
| // Can't use a real block factory for the basic serialization tests because they don't release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: comment is very helpful
| return new NamedWriteableRegistry(List.of(IntBlock.ENTRY)); | ||
| } | ||
|
|
||
| public void testWithBreaker() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice test.
...l/src/test/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexServiceResponseTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/AsyncOperator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All finished now. Mostly minor remarks, so I think this already deserves a thumbs-up, as it could be merged as-is (maybe except for the copy-paste mistake in one of the tests, that tests the wrong status). Thanks a lot @nik9000 , this is great!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java
Show resolved
Hide resolved
| Page right = ongoing.itr.next(); | ||
| emittedPages++; | ||
| try { | ||
| return ongoing.join.join(right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be lovely!
| ); | ||
| CheckedFunction<List<Page>, Page, Exception> handleResponse = pages -> { | ||
| if (pages.size() != 1) { | ||
| throw new UnsupportedOperationException("ENRICH should only return a single page"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Urgh, you're right. I wrongly assumed that we had to explicitly inherit from ElasticSearchException to get a status code other than 500, but apparently IllegalArgumentException is getting caught specifically and results in a 400.
I think we introduced (Es)qlIllegalArgumentException for this, which returns a 500. But we also don't use that consistently, I think. There's also IllegalStateException which returns 500.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested the PR a bit. I have an edgy-casey scenario where both the main index and the lookup index have a text field with a keyword subfield (the default mapping when a document is being added to an index without a mapping for that field).
Because it was easier for me, I added 3 docs to employees and two docs to a lookup index like this:
{"index":{"_index":"test1"}}
{"counter": 11, "text2":"red fox jumps over black dog"}
{"index":{"_index":"test1"}}
{"counter": 12, "text2":"black dog"}
{"index":{"_index":"employees"}}
{"text2":"red fox jumps over black dog", "emp_no":555}
{"index":{"_index":"employees"}}
{"text2":"red fox", "emp_no":666}
{"index":{"_index":"employees"}}
{"text2":"black dog", "emp_no":777}
The query FROM employees | where emp_no < 1000 | LOOKUP JOIN test1 ON text2.keyword | KEEP emp_no, counter, text2 returns something like this:
emp_no | counter | text2
---------------+---------------+----------------------------
555 |11 |red fox jumps over black dog
666 |null |null
777 |12 |black dog
while the query with a lookup on the text field itself it doesn't seem to match the values (counter values all come null and I assume that the values are not picked up from the lookup index):
emp_no | counter | text2
---------------+---------------+----------------------------
555 |null |red fox jumps over black dog
666 |null |red fox
777 |null |black dog
| 10002 | 2 | German | ||
| 10002 | 2 | German | ||
| 10002 | 2 | German | ||
| 10003 | null | null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked at the languages_non_unique_key.csv file and I was intrigued by the missing key (language_code) in the last two records there. My first reaction was "why is language_name null and not all the values corresponding to the two records in languages_non_unique_key.csv?". I guess we have an explanation for that and there is no reason for concern :-).
|
What I missed in mentioning is the following slight inconsistency. Considering the above additional field added to |
|
Adding the new test case needs #120223 |
|
OK. Rereading @astefan's comment it seems to be quite a bit about text fields and their "funny" matching characteristics for lookup. For now we're forbidding them. @craigtaverner is working on that. |
💔 Backport failed
You can use sqren/backport to manually backport by running |
|
Backport coming by hand |
This makes `LOOKUP` return multiple rows if there are multiple matches. This is the way SQL works so it's *probably* what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do *later* that are transparent when we have *this* behavior, but not with the old behavior. Example: ``` - 2 | [German, German, German] | [Austria, Germany, Switzerland] + 2 | German | [Austria, Germany] + 2 | German | Switzerland + 2 | German | null ``` Relates: elastic#118781
|
backport is #120232 |
* ESQL: Compute infrastruture for LEFT JOIN (#118889) This adds some infrastructure that we can use to run LOOKUP JOIN using real LEFT JOIN semantics. Right now if LOOKUP JOIN matches many rows in the `lookup` index we merge all of the values into a multivalued field. So the number of rows emitted from LOOKUP JOIN is the same as the number of rows that comes into LOOKUP JOIN. This change builds the infrastructure to emit one row per match, mostly reusing the infrastructure from ENRICH. * ESQL: Make LOOKUP more left-joiny (#119475) This makes `LOOKUP` return multiple rows if there are multiple matches. This is the way SQL works so it's *probably* what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do *later* that are transparent when we have *this* behavior, but not with the old behavior. Example: ``` - 2 | [German, German, German] | [Austria, Germany, Switzerland] + 2 | German | [Austria, Germany] + 2 | German | Switzerland + 2 | German | null ``` Relates: #118781
The PR for this is at #119473 |
Does it "solve" Andrei's concerns? I just don't want to lose his comment. |
This makes
LOOKUPreturn multiple rows if there are multiple matches. This is the way SQL works so it's probably what folks will expect. Even if it isn't, it allows for more optimizations. Like, this change doesn't optimize anything - it just changes the behavior. But there are optimizations you can do later that are transparent when we have this behavior, but not with the old behavior.Example:
Relates: #118781