@alex-spies
Contributor

Fix #137019

@alex-spies alex-spies added the >bug, auto-backport, :Analytics/ES|QL, v9.2.1, v9.3.0, v8.19.7, and v9.1.7 labels Oct 23, 2025
@elasticsearchmachine
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

@alex-spies alex-spies marked this pull request as ready for review October 24, 2025 11:40
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics label Oct 24, 2025
@bpintea bpintea self-requested a review October 24, 2025 13:02
Contributor

@bpintea bpintea left a comment

Thanks, interesting that it didn't surface earlier.
I've only left optional style notes.

import static org.elasticsearch.xpack.esql.optimizer.rules.logical.TemporaryNameUtils.locallyUniqueTemporaryName;

/**
* Replace aliasing evals (eval x=a) with a projection which can be further combined / simplified.
Contributor

Why do we require a Project on top? It seems we are missing a lot of optimization opportunities.
For example, could the ES|QL query below end up duplicating a field across a billion documents several times? I only tested logical planning, so I'm not sure whether later rules handle this.

from test
| EVAL salary = salary+1, salary = salary +1, salary = salary +1
Eval[[salary{f}#17 + 1[INTEGER] AS salary#5, salary{r}#5 + 1[INTEGER] AS salary#8, salary{r}#8 + 1[INTEGER] AS salary#11]]
\_Limit[1000[INTEGER],false,false]
  \_EsRelation[test][_meta_field{f}#18, emp_no{f}#12, first_name{f}#13, ..]

Contributor

@julian-elastic julian-elastic Oct 24, 2025

The example above should be optimized to just this, no project needed

from test
| EVAL salary = salary+3

Contributor Author

This rule doesn't propagate shadowed, internal columns from an eval. We could do that! But I don't think we do.
I expect this is what you'd want this to become?

from test
| EVAL salary = ((salary+1)+1)+1

(Which should be simplified by some other rule to salary+3, I think.)

> Why do we require project on top?

Great question! That rule is super old, and I don't recall why we don't trigger it always. My hunch is that we wanted it mostly to combine the aliases from the eval with downstream projections. But we could profit from propagating the aliases more generally.

That said, on its own, there is no performance difference between a simple alias in an eval vs. in a projection. Both are cheap! They just incRef the underlying block. (Unless that block is sent over the wire. But that could also be tackled on the serialization level.)
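
To illustrate why a plain alias is cheap in either place, here is a minimal sketch of reference-counted sharing. `SharedBlock` and `AliasSketch` are hypothetical stand-ins, not the real ES|QL block classes; the only thing taken from the comment above is that an alias increments a reference count instead of copying data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a column of values; NOT the real ES|QL Block class.
final class SharedBlock {
    private final int[] values;
    private final AtomicInteger refs = new AtomicInteger(1);

    SharedBlock(int[] values) {
        this.values = values;
    }

    // Registers one more owner of the same underlying data.
    SharedBlock incRef() {
        refs.incrementAndGet();
        return this;
    }

    int refCount() {
        return refs.get();
    }

    int[] values() {
        return values;
    }
}

final class AliasSketch {
    // Returns a new page where `alias` is a second name for the block behind `source`.
    // No values are copied; only the reference count changes.
    static Map<String, SharedBlock> alias(Map<String, SharedBlock> page, String source, String alias) {
        Map<String, SharedBlock> result = new LinkedHashMap<>(page);
        result.put(alias, page.get(source).incRef());
        return result;
    }
}
```

In this model the cost of the alias is one map entry and one counter increment, regardless of how many rows the block holds. The sketch does not model the serialization case mentioned above.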

Contributor

Yes, I would expect us to get to salary = ((salary+1)+1)+1 and then eventually fold the constants in evals. You don't have to address it in this PR; it seems like a bigger change.
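
As a rough illustration of the folding step discussed here, the sketch below rewrites ((salary+1)+1)+1 into salary+3 on a toy expression tree. The types `Expr`, `Column`, `Lit`, `Add` and the class `FoldSketch` are hypothetical and are not the ES|QL optimizer classes; the sketch also ignores integer overflow, which a real rule would have to consider.

```java
// Hypothetical mini expression tree; NOT the real ES|QL Expression classes.
interface Expr {}
record Column(String name) implements Expr {}
record Lit(int value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}

final class FoldSketch {
    // Bottom-up rewrite: (x + c1) + c2 becomes x + (c1 + c2), and c1 + c2 becomes a literal.
    // Assumption: overflow is ignored in this sketch.
    static Expr fold(Expr e) {
        if (!(e instanceof Add add)) {
            return e; // columns and literals are already in their simplest form
        }
        Expr left = fold(add.left());
        Expr right = fold(add.right());
        if (left instanceof Lit a && right instanceof Lit b) {
            return new Lit(a.value() + b.value());
        }
        if (left instanceof Add inner && inner.right() instanceof Lit c1 && right instanceof Lit c2) {
            return new Add(inner.left(), new Lit(c1.value() + c2.value()));
        }
        return new Add(left, right);
    }
}
```

Calling `FoldSketch.fold` on the tree for ((salary+1)+1)+1 yields `Add(Column("salary"), Lit(3))`.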

Contributor Author

There's a separate approach to this optimizer rule: inline all eval expressions in a first step, and then very simply extract any plain renames into a project.

This would sidestep all the shadowing shenanigans and would address your comment below.

It would have interesting behavior because, in a way, it'd do the opposite of extracting common expressions in EVALs, which we may want to implement in the future. E.g. EVAL x = to_lower(y), z1 = length(x), z2 = starts_with("foo", x) - this eval re-uses x and inlining it into z1 and z2 would make us re-compute it unnecessarily. OTOH, this would allow for simplifications like simplifying salary = salary+1, salary = salary +1, salary = salary +1 into salary+3.

I'm not sure we want to jump on this right now, but maybe it'll become useful in the future, or if we find that our eval expressions bottleneck queries and need to be optimized better.
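
The inline-then-extract idea from this comment can be sketched on a toy model. Everything below (`Node`, `Ref`, `Const`, `Call`, `Assign`, `InlineSketch`) is hypothetical and only assumes that EVAL fields are evaluated in order and that a later field may reference an earlier one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini expression tree; NOT the real ES|QL classes.
interface Node {}
record Ref(String name) implements Node {}
record Const(Object value) implements Node {}
record Call(String fn, List<Node> args) implements Node {}
record Assign(String name, Node expr) {}

final class InlineSketch {
    // Replaces references to earlier EVAL fields with their already inlined definitions.
    static Node inline(Node n, Map<String, Node> defs) {
        if (n instanceof Ref ref) {
            return defs.getOrDefault(ref.name(), ref);
        }
        if (n instanceof Call call) {
            List<Node> args = new ArrayList<>();
            for (Node arg : call.args()) {
                args.add(inline(arg, defs));
            }
            return new Call(call.fn(), args);
        }
        return n; // constants stay as they are
    }

    // Step 1: express every field purely in terms of the EVAL's input columns.
    // A later field with the same name replaces the earlier definition.
    static Map<String, Node> inlineAll(List<Assign> eval) {
        Map<String, Node> defs = new LinkedHashMap<>();
        for (Assign a : eval) {
            defs.put(a.name(), inline(a.expr(), defs));
        }
        return defs;
    }

    // Step 2: fields that are now a bare reference are plain renames (alias -> source)
    // and could move into a projection.
    static Map<String, String> renames(Map<String, Node> inlined) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Node> e : inlined.entrySet()) {
            if (e.getValue() instanceof Ref ref) {
                result.put(e.getKey(), ref.name());
            }
        }
        return result;
    }
}
```

The trade-off mentioned above shows up directly: for x = to_lower(y), z1 = length(x), the inlined z1 becomes length(to_lower(y)), so every further user of x repeats the to_lower call.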

Contributor

@julian-elastic julian-elastic left a comment

This rule is pretty complicated; I feel like it tries to do too much at once, all in the same function. I wonder if we can refactor it somehow, e.g. have one method that does just the aliasing, one that does just the renaming, and one that builds the final output. The changes themselves seem correct. I also added some testing recommendations in the comments.

@alex-spies
Contributor Author

> I wonder if we can refactor it somehow, e.g. have a method that does just the aliasing, have a method that does just the renaming, have a method that builds the final output.

I think a multi-pass approach could be easier to read, but it would likely take another day to refactor :/ Conceptually, we could ignore shadowing in a first pass and just turn the Eval into an EVAL | PROJECT (with potentially broken dependencies between the two commands), based only on the original Eval's output, where the Project has all the renames baked in. Then, in a second pass, we could check which attributes the Project still needs that are shadowed by the Eval and therefore need to be renamed in the Eval (and, accordingly, referred to differently by the Project).

It's still complex though :/ The root problem is that our optimizer rules need to deal with shadowing in the first place, because name conflict resolution still has to be taken into account in every LogicalPlan, not only during the initial resolution.

If you see a nicer way that'd work, feel free to go for it! For now, I just need this to be correct so that I can unmute the generative tests again.
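
The two-pass idea described in this comment can be sketched on a simplified, name-based model. This is not the merged implementation. `EvalField`, `Split`, and `TwoPassSketch` are hypothetical, the `$$` prefix is only a placeholder for a temporary name, and the sketch assumes that every right-hand side refers only to input columns of the Eval, not to other fields of the same Eval.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a field is either a plain alias (aliasOf != null) or a computed expression.
record EvalField(String name, String aliasOf, String expr) {
    static EvalField alias(String name, String source) {
        return new EvalField(name, source, null);
    }

    static EvalField computed(String name, String expr) {
        return new EvalField(name, null, expr);
    }
}

// evalFields: what stays in the Eval; projections: output name -> column read by the Project.
record Split(Map<String, String> evalFields, Map<String, String> projections) {}

final class TwoPassSketch {
    static Split split(List<EvalField> fields) {
        // Pass 1: ignore shadowing. Aliases move to the Project, computed fields stay in the Eval.
        Map<String, String> evalFields = new LinkedHashMap<>();
        Map<String, String> projections = new LinkedHashMap<>();
        Set<String> aliasSources = new HashSet<>();
        for (EvalField f : fields) {
            if (f.aliasOf() != null) {
                projections.put(f.name(), f.aliasOf());
                aliasSources.add(f.aliasOf());
            } else {
                evalFields.put(f.name(), f.expr());
                projections.put(f.name(), f.name());
            }
        }
        // Pass 2: a computed field that overwrites a column the Project still reads
        // gets a temporary name in the Eval, and the Project maps it back.
        Map<String, String> fixedEval = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : evalFields.entrySet()) {
            String name = e.getKey();
            if (aliasSources.contains(name)) {
                String temp = "$$" + name; // placeholder temporary name (assumption)
                fixedEval.put(temp, e.getValue());
                projections.put(name, temp);
            } else {
                fixedEval.put(name, e.getValue());
            }
        }
        return new Split(fixedEval, projections);
    }
}
```

For EVAL x = a, a = a + 1, the sketch keeps `$$a = a + 1` in the Eval and projects `x` from the original `a` and `a` from `$$a`, so the alias no longer picks up the overwritten column.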

@alex-spies
Contributor Author

Thanks for the reviews, @bpintea and @julian-elastic!

@alex-spies alex-spies enabled auto-merge (squash) October 28, 2025 17:19
@alex-spies alex-spies merged commit 386b156 into elastic:main Oct 28, 2025
34 checks passed
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Oct 28, 2025
…c#137025)

Fix elastic#137019: a bug that happened when the Eval has (non-aliasing) fields that happen to overwrite the attributes that we try to alias in a subsequent Project.
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
9.2
8.19 Commit could not be cherrypicked due to conflicts
9.1 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 137025

elasticsearchmachine pushed a commit that referenced this pull request Oct 28, 2025
… (#137286)

Fix #137019: a bug that happened when the Eval has (non-aliasing) fields that happen to overwrite the attributes that we try to alias in a subsequent Project.
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Oct 29, 2025
…c#137025)

Fix elastic#137019: a bug that happened when the Eval has (non-aliasing) fields that happen to overwrite the attributes that we try to alias in a subsequent Project.

(cherry picked from commit 386b156)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/eval.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@alex-spies
Contributor Author

💚 All backports created successfully

Status Branch Result
9.1
8.19


alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Oct 29, 2025
…c#137025)

Fix elastic#137019: a bug that happened when the Eval has (non-aliasing) fields that happen to overwrite the attributes that we try to alias in a subsequent Project.

(cherry picked from commit 386b156)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/eval.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
elasticsearchmachine pushed a commit that referenced this pull request Oct 29, 2025
…137025) (#137318)

* ESQL: Fix ReplaceAliasingEvalWithProject in case of shadowing (#137025)

Fix #137019: a bug that happened when the Eval has (non-aliasing) fields that happen to overwrite the attributes that we try to alias in a subsequent Project.

(cherry picked from commit 386b156)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/eval.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

* Fix tests
elasticsearchmachine pushed a commit that referenced this pull request Oct 29, 2025
…137025) (#137316)

* ESQL: Fix ReplaceAliasingEvalWithProject in case of shadowing (#137025)

Fix #137019: a bug that happened when the Eval has (non-aliasing) fields that happen to overwrite the attributes that we try to alias in a subsequent Project.

(cherry picked from commit 386b156)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/eval.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

* Remove accidentally committed test from other PR

* Fix tests

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.7 v9.1.7 v9.2.1 v9.3.0

Development

Successfully merging this pull request may close these issues.

ESQL: bug in ReplaceAliasingEvalWithProject: optimized incorrectly due to missing references

4 participants