ESQL: Pushdown Lookup Join past Project #129503
Changes from 19 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| pr: 129503 | ||
| summary: Pushdown Lookup Join past Project | ||
| area: ES\|QL | ||
| type: enhancement | ||
| issues: | ||
| - 119082 |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,119 @@ | ||||||
| /* | ||||||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||||||
| * or more contributor license agreements. Licensed under the Elastic License | ||||||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||||||
| * 2.0. | ||||||
| */ | ||||||
| | ||||||
| package org.elasticsearch.xpack.esql.optimizer.rules.logical; | ||||||
| | ||||||
| import org.elasticsearch.xpack.esql.core.expression.Alias; | ||||||
| import org.elasticsearch.xpack.esql.core.expression.Attribute; | ||||||
| import org.elasticsearch.xpack.esql.core.expression.AttributeMap; | ||||||
| import org.elasticsearch.xpack.esql.core.expression.AttributeSet; | ||||||
| import org.elasticsearch.xpack.esql.core.expression.Expression; | ||||||
| import org.elasticsearch.xpack.esql.core.expression.Expressions; | ||||||
| import org.elasticsearch.xpack.esql.core.expression.NamedExpression; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Eval; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.LogicalPlan; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Project; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.join.InlineJoin; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.join.Join; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.join.JoinTypes; | ||||||
| | ||||||
| import java.util.ArrayList; | ||||||
| import java.util.HashSet; | ||||||
| import java.util.List; | ||||||
| import java.util.Set; | ||||||
| | ||||||
| /** | ||||||
| * If a {@link Project} is found in the left child of a left {@link Join}, perform it after the join instead. Because the projected attributes | ||||||
| * are then only required later, field extractions can also happen later, making joins cheaper to execute on data nodes. | ||||||
| * E.g. {@code ... \| RENAME field AS otherfield \| LOOKUP JOIN lu_idx ON key} | ||||||
| * becomes {@code ... \| LOOKUP JOIN lu_idx ON key \| RENAME field AS otherfield }. | ||||||
| * When a {@code LOOKUP JOIN}'s lookup fields shadow the previous fields, we may need to leave an {@link Eval} in place to assign a | ||||||
| * temporary name. Assume that {@code field} is a lookup field; then {@code ... \| RENAME field AS otherfield \| LOOKUP JOIN lu_idx ON key} | ||||||
alex-spies marked this conversation as resolved.
| * becomes something like {@code ... \| EVAL $$field = field \| LOOKUP JOIN lu_idx ON key \| RENAME $$field AS otherfield}. | ||||||
| * Leaving {@code EVAL $$field = field} in place of the original projection, rather than a Project, avoids infinite loops. | ||||||
| */ | ||||||
| public final class PushDownJoinPastProject extends OptimizerRules.OptimizerRule<Join> { | ||||||
| @Override | ||||||
| protected LogicalPlan rule(Join join) { | ||||||
| if (join instanceof InlineJoin) { | ||||||
alex-spies marked this conversation as resolved.
| // Do not apply to INLINESTATS; this rule could be expanded to include INLINESTATS, but the StubRelation refers to the left | ||||||
| // child - so pulling out a Project from the left child would require us to also update the StubRelation (and the Aggregate | ||||||
| // on top of it) | ||||||
| return join; | ||||||
| } | ||||||
| | ||||||
| if (join.left() instanceof Project project && join.config().type() == JoinTypes.LEFT) { | ||||||
| AttributeMap.Builder<Expression> aliasBuilder = AttributeMap.builder(); | ||||||
| project.forEachExpression(Alias.class, a -> aliasBuilder.put(a.toAttribute(), a.child())); | ||||||
| var aliasesFromProject = aliasBuilder.build(); | ||||||
| | ||||||
| // Propagate any renames into the Join, as we will remove the upstream Project. | ||||||
| // E.g. `RENAME field AS key \| LOOKUP JOIN idx ON key` -> `LOOKUP JOIN idx ON field \| ...` | ||||||
| Join updatedJoin = PushDownUtils.resolveRenamesFromMap(join, aliasesFromProject); | ||||||
| | ||||||
| // Construct the expressions for the new downstream Project using the Join's output. | ||||||
| // We need to carry over RENAMEs/aliases from the original upstream Project. | ||||||
| List<Attribute> originalOutput = join.output(); | ||||||
| List<NamedExpression> newProjections = new ArrayList<>(originalOutput.size()); | ||||||
| for (Attribute attr : originalOutput) { | ||||||
| Attribute resolved = (Attribute) aliasesFromProject.resolve(attr, attr); | ||||||
| if (attr.semanticEquals(resolved)) { | ||||||
| newProjections.add(attr); | ||||||
| } else { | ||||||
| Alias renamed = new Alias(attr.source(), attr.name(), resolved, attr.id(), attr.synthetic()); | ||||||
| newProjections.add(renamed); | ||||||
| } | ||||||
| } | ||||||
| | ||||||
| // This doesn't deal with name conflicts yet. Any name shadowed by a lookup field from the `LOOKUP JOIN` could still have been | ||||||
| // used in the original Project; any such conflict needs to be resolved by copying the attribute under a temporary name via an | ||||||
| // Eval - and using the attribute from said Eval in the new downstream Project. | ||||||
| Set<String> lookupFieldNames = new HashSet<>(Expressions.names(join.rightOutputFields())); | ||||||
| List<NamedExpression> finalProjections = new ArrayList<>(newProjections.size()); | ||||||
| AttributeMap.Builder<Alias> aliasesForReplacedAttributesBuilder = AttributeMap.builder(); | ||||||
| AttributeSet leftOutput = project.child().outputSet(); | ||||||
| | ||||||
| for (NamedExpression proj : newProjections) { | ||||||
alex-spies marked this conversation as resolved.
| // TODO: add assert to Project that ensures Alias to attr or pure attr. | ||||||
| Attribute coreAttr = (Attribute) (proj instanceof Alias as ? as.child() : proj); | ||||||

Suggested change: replace `Attribute coreAttr = (Attribute) (proj instanceof Alias as ? as.child() : proj);` with `Attribute coreAttr = (Attribute) Alias.unwrap(proj);`.
Actually, even better: maybe skip iterating twice over the join's output attributes and do everything in a single loop? Then you won't need to test for an unwrapped object (here and below).
Edit: the cast also won't be necessary anymore (as written, it feels like it would deserve a comment explaining why it's safe).
The single loop would get rid of the unwrapping and casting, but it makes the handling of shadowing less isolated, which I find harder to explain nicely.
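
The two-pass structure defended above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the ES|QL implementation. The names `PushPastJoinSketch`, `Projection`, `Rewrite` and `pushPastJoin` are hypothetical. Columns are plain strings rather than `Attribute`s with ids. Pass 1 carries the upstream renames into the downstream projection. Pass 2 deals only with shadowing: any left column that a lookup field would now hide is copied to a temporary `$$` name before the join.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the rule, NOT the ES|QL classes: columns are identified by name only.
final class PushPastJoinSketch {

    /** Output column {@code outputName} reads input column {@code sourceName} (equal names = no rename). */
    record Projection(String outputName, String sourceName) {
        @Override
        public String toString() {
            return outputName.equals(sourceName) ? outputName : sourceName + " AS " + outputName;
        }
    }

    /** Temporary copies to EVAL before the join (temp name -> column), and the projection to run after it. */
    record Rewrite(Map<String, String> evalBeforeJoin, List<Projection> projectAfterJoin) {}

    static Rewrite pushPastJoin(List<Projection> upstreamProject, Set<String> lookupFields) {
        // Pass 1: rebuild the original join output with the upstream renames carried over.
        // Left columns whose output name is a lookup field were already shadowed by the join, so drop them.
        List<Projection> newProjections = new ArrayList<>();
        for (Projection p : upstreamProject) {
            if (!lookupFields.contains(p.outputName())) {
                newProjections.add(p);
            }
        }
        for (String lookupField : lookupFields) {
            newProjections.add(new Projection(lookupField, lookupField));
        }

        // Pass 2: name conflicts only. Once the join runs on the Project's child, a left column read
        // by the projection may be shadowed by a lookup field; copy it to "$$<name>" and read the copy.
        Map<String, String> evals = new LinkedHashMap<>();
        List<Projection> finalProjections = new ArrayList<>();
        for (Projection p : newProjections) {
            boolean fromLeft = !lookupFields.contains(p.outputName());
            if (fromLeft && lookupFields.contains(p.sourceName())) {
                String temp = "$$" + p.sourceName();
                evals.putIfAbsent(temp, p.sourceName());
                finalProjections.add(new Projection(p.outputName(), temp));
            } else {
                finalProjections.add(p);
            }
        }
        return new Rewrite(evals, finalProjections);
    }

    public static void main(String[] args) {
        // ... | RENAME field AS otherfield | LOOKUP JOIN lu_idx ON key, where lu_idx provides `field`
        Rewrite r = pushPastJoin(
            List.of(new Projection("otherfield", "field"), new Projection("key", "key")),
            Set.of("field"));
        System.out.println("EVAL before join: " + r.evalBeforeJoin());      // {$$field=field}
        System.out.println("Project after join: " + r.projectAfterJoin()); // [$$field AS otherfield, key, field]
    }
}
```

Keeping pass 2 separate means the shadowing logic only inspects already-resolved projections. Folding both passes into one loop, as the reviewer suggests, would avoid the unwrap-and-cast step at the cost of interleaving the two concerns.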
alex-spies marked this conversation as resolved.
alex-spies marked this conversation as resolved.
alex-spies marked this conversation as resolved.
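
The step commented as "Propagate any renames into the Join" can be sketched in the same hedged way. A rename before the join, such as `RENAME field AS key | LOOKUP JOIN idx ON key`, turns into a join on `field` once the Project is moved below the join. `ResolveJoinKeysSketch` and `resolveJoinKeys` are hypothetical names. A plain map from new name to original name stands in for the alias map that the rule builds from the upstream Project.

```java
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT the ES|QL API: join keys and renames are plain column names.
final class ResolveJoinKeysSketch {

    /**
     * Rewrites join keys so they refer to columns of the Project's child.
     * {@code renames} maps each renamed output column to the input column it came from.
     */
    static List<String> resolveJoinKeys(List<String> joinKeys, Map<String, String> renames) {
        // Keys that the Project did not rename are kept unchanged.
        return joinKeys.stream().map(key -> renames.getOrDefault(key, key)).toList();
    }

    public static void main(String[] args) {
        // RENAME field AS key | LOOKUP JOIN idx ON key  ->  LOOKUP JOIN idx ON field | ...
        System.out.println(resolveJoinKeys(List.of("key"), Map.of("key", "field"))); // [field]
    }
}
```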