Skip to content

Conversation

@astefan
Copy link
Contributor

@astefan astefan commented Aug 14, 2025

Addresses #124741

@astefan astefan force-pushed the filters_on_inlinestats branch from 35b85ac to d852cda Compare August 14, 2025 15:03
@astefan astefan requested review from alex-spies and bpintea August 19, 2025 07:05
@astefan astefan marked this pull request as ready for review August 19, 2025 07:05
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 19, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @astefan, I've created a changelog YAML for you.

10003 |Parto |null |0 |null |M |1 |4
;

prunedInlinestatsFollowedByInlinestats_GroupByOneFieldOnSecondInlinestats-Ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be as a follow up investigation. I did try to fix it several times, ran out of good ideas.

null |[0, 0, 0, 0] |[Berni, Chirstian, Amabile, Berni, Bojan, Chirstian, Anneke, Chirstian]|[Head Human Resources, Reporting Analyst, Support Engineer, Tech Lead]|10004
;

prunedInlinestatsFollowedByinlinestats-Ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be as a follow up investigation. I did try to fix it several times, ran out of good ideas. This is in the same ballpark failure set as the other one.

emp_no:integer
5
;
///////////////////////////
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all the tests below have been ported from the same set of stats tests with filters and adapted to inlinestats.

protected LogicalPlan rule(LogicalPlan plan) {
Aggregate aggregate;
Holder<InlineJoin> ij = new Holder<>();
if (plan instanceof Aggregate a) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code block, in a nutshell, finds out if the Aggregate comes from inlinejoin or is a standalone one.

return plan; // not an Aggregate or InlineJoin, nothing to do
}

if (aggregate != null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here is the same as before, but now wrapped in this condition related to aggregate being null or not.

plan = new Project(aggregate.source(), plan, newProjections);
if (newEvals.isEmpty() == false) {
if (newAggs.isEmpty()) { // the Aggregate node is pruned
if (ij.get() != null) { // this is an Aggregate part of right-hand side of an InlineJoin
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meat of the PR. Basically, a new "branch" in dealing with pruning Aggs in InlineJoin specifically.

// project the correct output (the one of the former inlinejoin) and remove the InlineJoin altogether,
// replacing it with its right-hand side followed by its left-hand side
plan = new Project(newIJ.source(), newIJ.right(), newIJ.output());
} else { // this is a standalone Aggregate
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same code as before.

return newPlan;
});
} else { // this is a standalone Aggregate
plan = aggregate.with(aggregate.child(), aggregate.groupings(), newAggs);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same code as before.

plan = localRelation(aggregate.source(), newEvals);
}
} else {
if (ij.get() != null) { // this is an Aggregate part of right-hand side of an InlineJoin
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meat of the PR (specific handling for InlineJoin).

assertThat(Expressions.names(agg.aggregates()), contains("sum(salary)", "sum(salary) WheRe last_name == \"Doe\""));
}

/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these to a new class, specific to ReplaceStatsFilteredAggWithEval rule.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left mostly minor notes, except for one.

List<NamedExpression> newProjections = new ArrayList<>(oldAggSize);
protected LogicalPlan rule(LogicalPlan plan) {
Aggregate aggregate;
Holder<InlineJoin> ij = new Holder<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use a simple variable here, you don't need a holder. IIUC, there's a single place where the var would be used in a lambda, you may assign it locally in the respective branch (or right after this detection block) to make it effectively final.

Holder<InlineJoin> ij = new Holder<>();
if (plan instanceof Aggregate a) {
aggregate = a;
} else if (plan instanceof InlineJoin) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (plan instanceof InlineJoin) {
} else if (plan instanceof InlineJoin inlineJoin) {

to no longer need to cast.

Comment on lines 56 to 60
ij.get().right().forEachDown(p -> {
if (aggHolder.get() == null && p instanceof Aggregate a) {
aggHolder.set(a);
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ij.get().right().forEachDown(p -> {
if (aggHolder.get() == null && p instanceof Aggregate a) {
aggHolder.set(a);
}
});
inlineJoin.right().forEachDown(Aggregate.class, aggHolder::setIfAbsent);

if (newAggs.isEmpty()) { // the Aggregate node is pruned
if (ij.get() != null) { // this is an Aggregate part of right-hand side of an InlineJoin
// replace the right hand side Aggregate with an Eval
InlineJoin newIJ = new InlineJoin(ij.get().source(), ij.get().left(), ij.get().right().transformDown(p -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
InlineJoin newIJ = new InlineJoin(ij.get().source(), ij.get().left(), ij.get().right().transformDown(p -> {
InlineJoin newIJ = inlineJoin.replaceRight(inlineJoin.right().transformDown(p -> {

with inlineJoin the effectively final var.

} else {
if (ij.get() != null) { // this is an Aggregate part of right-hand side of an InlineJoin
// only update the Aggregate and add an Eval for the removed aggregations
plan = ij.get().transformUp(Aggregate.class, agg -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I think this'll walk the entire (sub)tree of InlineJoin, while an update is really only possible on the right hand-side. Would it be worth applying the change there only?

Comment on lines 700 to 701
| inlinestats max = max(salary), max_a = max(salary) where null,
min = min(salary), min_a = min(salary) where to_string(null) == "abc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aesthetic:

Suggested change
| inlinestats max = max(salary), max_a = max(salary) where null,
min = min(salary), min_a = min(salary) where to_string(null) == "abc"
| inlinestats max = max(salary),
max_a = max(salary) where null,
min = min(salary),
min_a = min(salary) where to_string(null) == "abc"

as(limit.child(), EsRelation.class);
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
/*

to prevent having the block be interpreted as javadoc. Here and above, where there's no @code wrapping (which isn't ideal, as it's wrapped in IntelliJ; but that was just copied).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to start new files (started working on splitting the mighty x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java, but it'd be good to allocate a packet for it, since there'll be many "logical plan optimiser tests" (vs local ~ etc.).
But anyways, this can be added as is and I'll incorporate this later in that work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note though that this approach (separate package only for `local) is the same in the non-test package.

}), ij.get().config());
// project the correct output (the one of the former inlinejoin) and remove the InlineJoin altogether,
// replacing it with its right-hand side followed by its left-hand side
plan = new Project(newIJ.source(), newIJ.right(), newIJ.output());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Project is what's basically returned. Can't the entire InlineJoin transformation be skipped and only transform with the right hand-side of the "old" InlineJoin?

} else if (p instanceof StubRelation) {
// Remove the StubRelation since the right-hand side of the join is now part of the main plan
// and it won't be executed separately by the EsqlSession inlinestats planning.
p = ij.get().left();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite positive, but this might be the issue for a stack overflow with:

from employees
| inlinestats count = count(*) where false 
| inlinestats cc = count_distinct(emp_no) where false 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch @bpintea.
This raises an interesting discussion point: the fact that the right-hand side of a InlineJoin must contain a single StubRelation and that anything that has to do with StubRelations has to happen only once per InlineJoin right-hand side. I've made some changes to InlineJoin because of this.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I'd update the PR description (and title?), as it's not really implementing the support, only optimising that.

doneReplacing.set(true);
return up.replaceChild(stubReplacement);
}
assert false : "Expected to replace a single StubRelation in the plan, but found more than one";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better throw here:

  1. this will bring down the whole node in CI (and fail many other unrelated tests).
  2. it will do nothing in production: better fail the query.


/**
* Replaces the stubbed source with the actual source.
* NOTE: this will replace all {@link StubRelation}s found with the source and the method is meant to be used to replace one node only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* NOTE: this will replace all {@link StubRelation}s found with the source and the method is meant to be used to replace one node only
* NOTE: this will replace the first {@link StubRelation}s found with the source and the method is meant to be used to replace one node only

return stubbed.transformUp(UnaryPlan.class, up -> {
Holder<Boolean> doneReplacing = new Holder<>(false);
var result = stubbedPlan.transformUp(UnaryPlan.class, up -> {
if (up.child() instanceof StubRelation) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a change strictly related to the PR, but this assumes there will never be an InlineJoin having a right child be a StubRelation. I'm not sure if this can occur now, but ideally we wouldn't rely on it, an intermediary state of the optimisations could theoretically produce it.
I.e. it's not an invariant.

return up;
});

assert doneReplacing.get() : "Expected to replace a single StubRelation in the plan, but none found";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe same here, we could throw, instead of an assert?

Comment on lines 89 to 96
var newRight = ij.right().transformDown(Aggregate.class, agg -> {
LogicalPlan p = agg;
if (agg == aggregate) {
// the aggregate becomes a simple Eval since it's not needed anymore (it was replaced with Literals)
p = new Eval(aggregate.source(), aggregate.child(), newEvals);
}
return p;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, but would be symmetrical to the other branch:

Suggested change
var newRight = ij.right().transformDown(Aggregate.class, agg -> {
LogicalPlan p = agg;
if (agg == aggregate) {
// the aggregate becomes a simple Eval since it's not needed anymore (it was replaced with Literals)
p = new Eval(aggregate.source(), aggregate.child(), newEvals);
}
return p;
});
// the aggregate becomes a simple Eval since it's not needed anymore (it was replaced with Literals)
var newRight = ij.right()
.transformDown(
Aggregate.class,
agg -> agg == aggregate ? new Eval(aggregate.source(), aggregate.child(), newEvals) : agg
);

10010 |null |4 |6.949207097931448 |7.127229475750027|27921.220736207077|10
;

twoConsecutiveInlinestatsWithFalseFiltes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd maybe add this part of the unit tests (ReplaceStatsFilteredAggWithEvalTests). It's interesting, as it produces this plan:

    EsqlProject[[emp_no{f}#9, count{r}#5, cc{r}#8]]
    \_Eval[[0[LONG] AS count#5, 0[LONG] AS cc#8]]
      \_TopN[[Order[emp_no{f}#9,ASC,LAST]],3[INTEGER]]
        \_Limit[3[INTEGER],false]
          \_EsRelation[test][_meta_field{f}#15, emp_no{f}#9, first_name{f}#10, g..]

The TopN-Limit is probably avoidable, though it's not produced part of this change, it's the way we handle limit | sort | limit. But up to you.

@astefan astefan merged commit b968ec9 into elastic:main Aug 26, 2025
33 checks passed
@astefan astefan deleted the filters_on_inlinestats branch August 26, 2025 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants