@bpintea
Contributor

@bpintea bpintea commented Apr 30, 2025

This adds a new logical optimization rule that purges a Join when its merge key(s) are null. Null detection is based on recognizing a tree pattern in which the join sits atop a project and/or eval (possibly a few nodes deep) that contains a reference to a null, where that reference matches the join key.

It works at the coordinator planning level, but it's most useful locally, after nulls have been inserted into the plan upon detecting missing fields.

The Join is replaced with a projection that has the same attributes as the join, atop an eval that aliases all of the join's right fields to null.

Closes #125577.
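
For illustration, a minimal sketch of the replacement described above. It uses hypothetical toy types (`PruneJoinSketch`, `Source`, `Eval`, `Project`, `LeftJoin`) rather than the actual ES|QL plan classes, and assumes a single join key whose null alias sits directly below the join:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical and are NOT the ES|QL plan classes.
final class PruneJoinSketch {
    interface Plan { List<String> output(); }

    // Leaf node producing a fixed set of columns.
    record Source(List<String> columns) implements Plan {
        public List<String> output() { return columns; }
    }

    // Adds or overwrites columns; a null map value stands for a null literal.
    record Eval(Plan child, Map<String, Object> aliases) implements Plan {
        public List<String> output() {
            List<String> out = new ArrayList<>(child.output());
            for (String name : aliases.keySet()) {
                if (!out.contains(name)) out.add(name);
            }
            return out;
        }
    }

    // Keeps exactly the listed columns, in order.
    record Project(Plan child, List<String> columns) implements Plan {
        public List<String> output() { return columns; }
    }

    // LEFT join on one key; rightColumns are the columns the join adds.
    record LeftJoin(Plan left, String key, List<String> rightColumns) implements Plan {
        public List<String> output() {
            List<String> out = new ArrayList<>(left.output());
            for (String column : rightColumns) {
                if (!out.contains(column)) out.add(column);
            }
            return out;
        }
    }

    // If the key is aliased to null right below the join, swap the join for
    // Project(Eval(left, rightColumns -> null)) with the join's output columns.
    static Plan pruneOnNullKey(LeftJoin join) {
        if (join.left() instanceof Eval eval
            && eval.aliases().containsKey(join.key())
            && eval.aliases().get(join.key()) == null) {
            Map<String, Object> nulls = new LinkedHashMap<>();
            for (String column : join.rightColumns()) {
                nulls.put(column, null);
            }
            return new Project(new Eval(join.left(), nulls), join.output());
        }
        return join; // key not known to be null: leave the join in place
    }
}
```

The projection on top keeps the output columns identical to the join's, so nodes above the join in this toy model see the same column list either way.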

bpintea added 2 commits April 30, 2025 19:38
@bpintea bpintea added >enhancement auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.19.0 v9.1.0 labels Apr 30, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 30, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

 @Override
 public String nodeString() {
-    return child.nodeString() + " AS " + name();
+    return child.nodeString() + " AS " + name() + "#" + id();
Contributor Author

Not strictly related, but I'm not sure why we wouldn't include the id: it makes it easier to track exactly which reference points to an alias.
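
A small sketch of why the id helps, assuming a hypothetical `Alias` record (not the real ES|QL class): two aliases sharing a name and child are indistinguishable without the id suffix.

```java
// Hypothetical stand-in for an alias node; not the real ES|QL Alias class.
final class AliasNodeStringSketch {
    record Alias(String name, long id, String child) {
        // Old rendering: the name alone cannot tell same-named aliases apart.
        String nodeStringWithoutId() {
            return child + " AS " + name;
        }

        // New rendering: the "#id" suffix identifies the exact alias.
        String nodeString() {
            return child + " AS " + name + "#" + id;
        }
    }
}
```

With `new Alias("x", 12, "null")` and `new Alias("x", 34, "null")`, the old rendering is identical for both, while the new one differs in the `#12` and `#34` suffixes.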

@alex-spies alex-spies self-requested a review May 2, 2025 09:35
@bpintea bpintea requested review from astefan and costin May 2, 2025 21:07
var joinType = join.config().type();
if (joinType == INNER || joinType == LEFT) { // other types will have different replacement logic
    AttributeMap.Builder<Expression> attributeMapBuilder = AttributeMap.builder();
    loop: for (var child = join.left();; child = ((UnaryPlan) child).child()) { // cast is safe as both plans are UnaryPlans
Contributor

@astefan astefan May 6, 2025

In this loop you are collecting all the aliases that are contained in every Project and Eval on the left hand side of the LOOKUP until a LogicalPlan of a different type is found. Are there cases where evals and projects were not merged together already by CombineEvals and CombineProjections, respectively?

Contributor Author

> Are there cases where evals and projects were not merged together already by CombineEvals and CombineProjections, respectively?

Yes; but:

  • if the plan goes through multiple transformations, that's OK: this rule will apply as soon as the pattern Join - Project/Eval is detected, and the plan will eventually converge to a stable state.
  • the rule is most useful on the data node, where the plan has already stabilised (before taking the local conditions into account, that is).
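
The traversal discussed in this thread (walk down from the join's left child through consecutive Project/Eval nodes and stop at the first node of another type) could be sketched roughly as follows. All types are hypothetical toy stand-ins, and the choice that an alias closer to the join shadows a deeper one is an assumption of this sketch, not a statement about the actual rule:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical node types, not the ES|QL plan classes.
final class CollectAliasesSketch {
    interface Plan {}
    record Source(String index) implements Plan {}
    record Filter(Plan child, String condition) implements Plan {}
    // In this toy model a Project only selects columns and defines no aliases.
    record Project(Plan child, List<String> columns) implements Plan {}
    // A null map value stands for a null literal.
    record Eval(Plan child, Map<String, Object> aliases) implements Plan {}

    // Collects Eval aliases while descending through Eval/Project nodes only.
    static Map<String, Object> collectAliases(Plan start) {
        Map<String, Object> collected = new HashMap<>();
        Plan current = start;
        while (true) {
            if (current instanceof Eval eval) {
                eval.aliases().forEach((name, value) -> {
                    // Assumption: the definition nearest the join wins.
                    if (!collected.containsKey(name)) {
                        collected.put(name, value);
                    }
                });
                current = eval.child();
            } else if (current instanceof Project project) {
                current = project.child();
            } else {
                return collected; // any other node type ends the walk
            }
        }
    }
}
```

Because the walk stops at the first non-Project/Eval node, a null alias defined below, say, a `Filter` is not seen, which is the restrictiveness raised in the review.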

Contributor

@alex-spies alex-spies left a comment

Thanks @bpintea !

I focused on the actual optimizer rule, as I think @astefan already reviewed more deeply.

I left some remarks that I think should be addressed; with those, this could yield a good optimizer rule. That said, I think there is potential for simplifying this together with other optimizer rules, namely ReplaceMissingFieldWithNull and PropagateEvalFoldables, and this would hinge more on placing literals (null or not) into the join config.

I'll reach out offline to discuss this.

import static org.elasticsearch.xpack.esql.plan.logical.join.JoinTypes.LEFT;

/**
* The rule matches a plan pattern having a Join on top of a Project and/or Eval. It then checks if the join's performed on a field which
Contributor

Looking for this pattern is restrictive IMO, but it's also not what the rule actually does. Leftover?

Also, this is repeating some of the logic of PropagateEvalFoldables. Could they share code? That one collects aliases from the plan when they point to literals (via potentially several indirections), which this rule also does. But PropagateEvalFoldables does only 1 pass to collect all aliases, while this rule descends back into the children whenever it finds a join, forgetting about the previous resolutions.

PropagateEvalFoldables has a boolean called shouldCollect. That looks like it could become a more general predicate?

Contributor Author

> Looking for this pattern is restrictive IMO

It is indeed more restrictive than I'd like, and a looser match would make the rule more effective, but I found no other way to detect null join keys (in stabilised plans, after all transformations). Any other command doing data transformations would need executing or "invasive" analysis to determine if a null just passes through. But happy to apply modifications if I overlooked solutions.
Non-data-transforming commands are pushed out of the way (a.t.p.).

> but it's also not what the rule actually does. Leftover?

No, I think that's actually what the rule does; but maybe I misunderstood the question? :)

> this is repeating some of the logic of PropagateEvalFoldables. Could they share code?

The refs collection in this rule is more restrictive than that in PropagateEvalFoldables and operates on few node types. I guess there might be a way to share code, but not sure it'll make it more legible.

> But PropagateEvalFoldables does only 1 pass to collect all aliases, while this rule descends back into the children whenever it finds a join, forgetting about the previous resolutions.

Right; that's because this rule can only apply to a specific tree pattern; if there were more nodes in-between (of different types), the rule wouldn't work.

> PropagateEvalFoldables has a boolean called shouldCollect. That looks like it could become a more general predicate?

The tree traversing would be different, though.

Contributor

> Looking for this pattern is restrictive IMO, but it's also not what the rule actually does. Leftover?

Sorry, you are correct; I didn't notice the break in the switch below and thought we'd just go over all children, looking only for evals and projections among them.

> The refs collection in this rule is more restrictive than that in PropagateEvalFoldables and operates on few node types. I guess there might be a way to share code, but not sure it'll make it more legible.

My point is that both rules look for chains of aliases and propagate the result into a command that (transitively) depends on a literal. I think we should have only 1 way to do this; if the approach is correct for PropagateEvalFoldables, it should also be correct here - and if it's not, then PropagateEvalFoldables probably has a bug and we need to find a different solution.

Conceptually, the difference is that PropagateEvalFoldables just places a literal in the downstream command - while we rather wish to prune it - but that could be solved by placing a literal into the join config (which I think is interesting anyway because that allows even more optimizations).
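
To make the "chains of aliases" point concrete, here is a rough single-pass sketch that resolves aliases pointing to literals through several indirections. It is not the PropagateEvalFoldables implementation; `Expr`, `Literal`, `Ref`, `foldableAliases` and `isNullKey` are hypothetical names, and aliases are assumed to arrive in definition order (upstream first):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical expression types, not the ES|QL classes.
final class AliasChainSketch {
    interface Expr {}
    record Literal(Object value) implements Expr {} // value == null models a null literal
    record Ref(String name) implements Expr {}      // reference to a field or another alias

    // One pass over the aliases, resolving references to already-known literals.
    static Map<String, Literal> foldableAliases(Map<String, Expr> aliasesInOrder) {
        Map<String, Literal> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, Expr> entry : aliasesInOrder.entrySet()) {
            Expr expr = entry.getValue();
            if (expr instanceof Literal literal) {
                resolved.put(entry.getKey(), literal);
            } else if (expr instanceof Ref ref && resolved.containsKey(ref.name())) {
                // Follow the indirection: the alias points at a known literal.
                resolved.put(entry.getKey(), resolved.get(ref.name()));
            } else {
                // Redefined as something that is not a known literal.
                resolved.remove(entry.getKey());
            }
        }
        return resolved;
    }

    // A join key is prunable in this sketch if it resolves to a null literal.
    static boolean isNullKey(String key, Map<String, Literal> resolved) {
        Literal literal = resolved.get(key);
        return literal != null && literal.value() == null;
    }
}
```

For the chain `a = null, b = a, c = b`, the key `c` resolves to a null literal, whereas an alias of a plain field or of a non-null literal does not count as a null key.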

}
}
}
for (var attr : AttributeSet.of(join.config().matchFields())) {
Contributor

I think this shouldn't go over the match fields, but over the left + right fields. I don't trust the match fields right now, as their contract is never enforced and they only exist because our Join modeling got wonky.

That said, for the current implementation it's probably correct: only for the left fields can we ever know that they will be null before running the execution.

Contributor

@alex-spies alex-spies left a comment

Unblocking because the ClassCastException that I thought I saw can't actually happen. My bad.

I'd prefer if we try consolidating the alias propagation logic with that from PropagateEvalFoldables and share code. I believe this will be less brittle over time, and sharing code will mean we have 1 mechanism to keep correct rather than 2.

However, the added tests and the initial approach are already an improvement in themselves, so there's no reason to hold this PR.

I won't be available to review more deeply - could you please continue iterating with @astefan? (Who already reviewed more precisely, anyway - thanks Andrei!)

@bpintea
Contributor Author

bpintea commented May 20, 2025

> I'd prefer if we try consolidating the alias propagation logic with that from PropagateEvalFoldables and share code. I believe this will be less brittle over time, and sharing code will mean we have 1 mechanism to keep correct rather than 2.

Pushed an update to reuse code from both PropagateEvalFoldables and ReplaceMissingFieldWithNull.

Member

@costin costin left a comment

LGTM

protected LogicalPlan rule(Join join, LogicalOptimizerContext ctx) {
    LogicalPlan plan = join;
    if (join.config().type() == LEFT) { // other types will have different replacement logic
        AttributeMap<Expression> attributeMap = PropagateEvalFoldables.foldableReferences(join, ctx);
Member

I would move PropagateEvalFoldables#foldableReferences + ReplaceMissingFieldWithNull#aliasedNulls to a separate utils class (RuleUtils) to avoid introducing unnecessary dependencies between the rules themselves.
Likely

@bpintea bpintea added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 26, 2025
@elasticsearchmachine elasticsearchmachine merged commit 21fe40a into elastic:main May 27, 2025
18 checks passed
@bpintea bpintea deleted the enh/drop_join_on_null_merge_key branch May 27, 2025 13:38
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127583

@bpintea bpintea added v9.0.3 and removed v9.0.3 labels May 30, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jun 2, 2025
…8733)

This adds a new logical optimization rule to purge a Join in case the
merge key(s) are null. The null detection is based on recognizing a tree
pattern where the join sits atop a project and/or eval (possibly a few
nodes deep) which contains a reference to a `null`, reference which
matches the join key.

It works at coordinator planning level, but it's most useful locally,
after insertions of `nulls` in the plan on detecting missing fields.

The Join is substituted with a projection with the same attributes as
the join, atop an eval with all join's right fields aliased to null.

Closes #125577.

(cherry picked from commit 21fe40a)
Contributor

@alex-spies alex-spies left a comment

This came out very nicely, thanks @bpintea !

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: Skip LOOKUP JOIN when join key is missing from index

6 participants