-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Improve Expanding Lookup Join performance by pushing a filter to the right side of the lookup join #133166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Expanding Lookup Join performance by pushing a filter to the right side of the lookup join #133166
Conversation
Hi @julian-elastic, I've created a changelog YAML for you. |
eb7c5c5
to
7eae250
Compare
7eae250
to
2aa2b49
Compare
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @julian-elastic, I've created a changelog YAML for you. |
d13da18
to
5405235
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a partial first pass, leaving mostly cosmetic notes and a couple of questions.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/EsqlFlags.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFilters.java
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFilters.java
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFilters.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFilters.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
We should maybe add a COALESCE
tests in the csv suite. There is a IS NOT NULL
currently, maybe there are some other lookup indices that can take an IS NULL
too? Either on the join field or "enriching" field.
Otherwise, I've left some other comments, but no blockers. We should consider pushing down the conditions on the common field in LOOKUP JOIN too. Doesn't have to be now.
Also, let's wait on Alex's review too.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/EsqlFlags.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFilters.java
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFilters.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFilters.java
Show resolved
Hide resolved
...mpute/src/main/java/org/elasticsearch/compute/operator/lookup/EnrichQuerySourceOperator.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexService.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFilters.java
Show resolved
Hide resolved
...heap-attack/src/javaRestTest/java/org/elasticsearch/xpack/esql/heap_attack/HeapAttackIT.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heya! This looks very good, thanks for the iterations @julian-elastic .
I realized that the current version swallows some exceptions which I don't think is how we generally process them. (See my remarks below.) I'd like to discuss this before merging. Otherwise, I have only non-blocking comments.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/EsqlFlags.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/BinaryExec.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/LocalMapper.java
Outdated
Show resolved
Hide resolved
...c/test/java/org/elasticsearch/xpack/esql/plan/physical/LookupJoinExecSerializationTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/BinaryExec.java
Show resolved
Hide resolved
ab90559
to
3116545
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @julian-elastic , this is cool!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I thought the error swallowing was addressed in both cases; only saw now that we keep the error swallowing in the LookupFromIndexService. Not re-blocking as I think we're really on the last stretch here, but I think this should be addressed before merging.
I also am not convinced that there are any queries that actually trigger the serialization code for LookupJoinExec and, unless we can find one, think that the serialization of BinaryExec/LookupJoinExec should be simplified again, otherwise it will look like necessary complexity for our future selves, complicating future work and refactors there. If there are indeed such queries, we should have them as ITs (csv tests, ideally).
@julian-elastic , please proceed at your own discretion once these two points are addressed; this doesn't require re-review IMHO, so don't wait on me to get this merged.
Improve Expanding Lookup Join performance by pushing a filter to the right side of the lookup join.
As this is a performance optimization, we don't want to break the behavior for old nodes for CSS. The filter that we push down is optional and it is always reapplied after the lookup join. As a result if all nodes involved are new we will get performance benefits. Otherwise there might be partial or no performance benefits, but we will still execute the query successfully and get correct results if the query worked before this optimization.
With this PR ,we only handle Lucene pushable filters. Further optimizations are needed to get benefit by applying non-Lucene pushable filters and could be addressed separately in a later PR.
Changes to Rally-Tracks already added here elastic/rally-tracks#835, and we should see improvements once this is merged. Local results indicate around 90x improvement with the optimization for Lucene pushable filters on a test case that is specifically designed to demonstrate the benefits of this optimization. Customers are likely to see more limited benefits. The test case is an expanding lookup join of 100,000 rows table with 10,000 lookup table with filter of selectivity 0.1% (keeps 10 out of 10,000 rows of the lookup table). In the non-optimized version the filter is not pushed to the right, and we can get an explosion of records. We have 100,000 x10,000 = 1,000,000,000 rows after the join without the optimization. Then we filter then out to only 1,000,000 rows. With the optimization we apply the filter early so after the expanding join we only have 1,000,000 rows. This reduced max number of rows used by a factor of 1,000 and made the query 90 times faster.
Right Pushable filters with optimization
Running filtered join query...
test pushable join with filter on keyword: {"took":125,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":124,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":124,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":121,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":134,"documents_found":100000,"values":[[1000000]]}
Right Pushable filters without optimization
Running filtered join query...
test pushable join with filter on keyword: {"took":11315,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":11348,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":11330,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":11271,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":11258,"documents_found":100000,"values":[[1000000]]}
Script