ESQL: INLINESTATS implementation with multiple LogicalPlan updates #128917

astefan · 2025-06-04T15:54:20Z

Part of #124715 and similar to #128476.
It differs from #128476 in that it takes a "LogicalPlan" approach to running a sub-query: the sub-query's result is integrated back into the "main" LogicalPlan, and the query then continues running.

coordinated from the EsqlSession.

elasticsearchmachine · 2025-06-04T15:55:11Z

Hi @astefan, I've created a changelog YAML for you.

the inlinestats JOIN

…support_multi_inlinestats_logicalPlan_approach

astefan · 2025-06-12T11:38:06Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec

 | INLINESTATS min_scalerank=MIN(scalerank) BY type
 | MV_EXPAND type
-| WHERE scalerank == MV_MIN(scalerank);
+| EVAL mvMin_scalerank = MV_MIN(scalerank)


I didn't know the original intention with this test, so I've updated it to make some kind of sense and to also keep what I thought to be its original purpose.

astefan · 2025-06-12T11:40:31Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/InlineJoin.java

-            // as first columns in the output followed by whatever the right hand side of join adds in this order: aggregates first,
-            // followed by groupings (this order should be preserved inside the rightFields() output)
-            output = mergeOutputAttributes(right, leftOutputWithoutMatchFields);
+            List<Attribute> leftOutputWithoutKeys = left.stream().filter(attr -> config().leftFields().contains(attr) == false).toList();


This is taken from @alex-spies's great suggestion here

astefan · 2025-06-12T11:43:22Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java

                throw new EsqlIllegalArgumentException("unsupported join type [" + config.type() + "]");
            }

-            if (join instanceof InlineJoin) {


This temporary section has been removed. It was introduced with the first PR about reviving inlinestats, and its usefulness was questioned at the time. After changing the approach to rebuild the LogicalPlan (instead of PhysicalPlans), this part is no longer needed.

…support_multi_inlinestats_logicalPlan_approach

elasticsearchmachine · 2025-06-12T12:45:38Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

alex-spies

Second round - focused on the execution in EsqlSession this time.

Looks like we're going back in the direction of the initial approach to inline stats, where each phase had a full optimizer run. That's the right approach IMO.

I want to have another look next week to better understand the changes to the optimizer rules. But please go ahead and merge whenever you're happy (I see Bogdan already 👍'd the PR); if I find something important, I'll leave a comment after the merge.

alex-spies · 2025-06-27T06:34:41Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

    }

-    private record PlanTuple(PhysicalPlan physical, LogicalPlan logical) {}
+    private record LogicalPlanTuple(LogicalPlan nonStubbedSubPlan, LogicalPlan originalSubPlan) {}


nit: a comment explaining the purpose wouldn't hurt; it's not clear without reading more what the non stubbed plan is, and the name LogicalPlanTuple is also rather generic.

alex-spies · 2025-06-27T06:48:28Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

-                });
+                LogicalPlan newLogicalPlan = optimizedPlan.transformUp(
+                    InlineJoin.class,
+                    ij -> ij.right() == subPlans.originalSubPlan ? InlineJoin.inlineData(ij, resultWrapper) : ij


Nice! I think it's important we use object equality here - regular equality will not suffice because it ignores e.g. name ids.

Maybe worth a comment?
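
A minimal sketch of the identity-versus-equality point, assuming a hypothetical `Node` class whose `equals` ignores an `id` field (a stand-in for an equality that does not consider name ids; none of these names come from the ESQL code base):

```java
import java.util.List;

public class IdentityMatchSketch {
    // Hypothetical stand-in for a plan node. equals() deliberately ignores `id`,
    // imitating an equality check that does not take name ids into account.
    public static final class Node {
        final String name;
        final int id;

        public Node(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Node n && n.name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Replaces only the element that is the very same instance as `target`.
    public static List<Node> replaceByIdentity(List<Node> plan, Node target, Node replacement) {
        return plan.stream().map(n -> n == target ? replacement : n).toList();
    }

    // Replaces every element that is equal to `target`, even if it is a different instance.
    public static List<Node> replaceByEquals(List<Node> plan, Node target, Node replacement) {
        return plan.stream().map(n -> n.equals(target) ? replacement : n).toList();
    }
}
```

With two nodes that share a name but differ in `id`, `replaceByIdentity` swaps only the instance that was passed in, while `replaceByEquals` swaps both.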

alex-spies · 2025-06-27T06:59:47Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+                    InlineJoin.class,
+                    ij -> ij.right() == subPlans.originalSubPlan ? InlineJoin.inlineData(ij, resultWrapper) : ij
+                );
+                newLogicalPlan.setOptimized();


This is surprising. Shouldn't we re-optimize the plan now that we were able to replace the stub with an actual result?

After the stub was replaced, we can actually do more stuff, like push down limits (which before that would be wrong as it would have affected the stats).

If I understand correctly, this is something we might want to improve later, right? If so, let's leave a comment; maybe a TODO to make the intention clear.

Yeah, a TODO is right.

I had my doubts about this. I feel like some things here were left hanging (it does seem that the plan can still be optimized further), but at this stage of the inlinestats progress, I think it's fine to ignore it. But a TODO is needed, because it's something we need to think about a little.

Hmm, I thought this was sufficient. The approach basically executes the righthand sides of (inline) joins gradually, each execution resulting in a LocalRelation, so we end up with a join that has whatever was there before on the left and this local relation on the right. Not sure if we can further optimise this (and we haven't done so before - the LIMIT already makes it past the InlineJoin into the lefthand side, since InlineJoin preserves the row count).
But a TODO can re-eval things later, 👍 .
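
The gradual execution described above can be sketched roughly as follows. This is a toy model and not the EsqlSession code: `Plan`, `Source`, `Local`, `InlineJoin`, `firstPending`, `replace` and `run` are hypothetical names, and the innermost-left-first ordering is an assumption made for illustration.

```java
import java.util.List;
import java.util.function.Function;

public class GradualInlineJoinSketch {
    public interface Plan {}
    public record Source(String name) implements Plan {}
    public record Local(String rows) implements Plan {}            // an already materialised result
    public record InlineJoin(Plan left, Plan right) implements Plan {}

    // Finds the innermost (left-first) InlineJoin whose right side has not been executed yet.
    static InlineJoin firstPending(Plan p) {
        if (p instanceof InlineJoin ij) {
            InlineJoin inner = firstPending(ij.left());
            if (inner != null) {
                return inner;
            }
            return ij.right() instanceof Local ? null : ij;
        }
        return null;
    }

    // Rebuilds the tree, swapping exactly `target` (matched by identity) for `replacement`.
    static Plan replace(Plan p, InlineJoin target, Plan replacement) {
        if (p == target) {
            return replacement;
        }
        if (p instanceof InlineJoin ij) {
            return new InlineJoin(replace(ij.left(), target, replacement), ij.right());
        }
        return p;
    }

    // Executes one right-hand side at a time and puts its result back as a Local.
    public static Plan run(Plan plan, Function<Plan, Local> executor, List<Plan> executed) {
        InlineJoin pending;
        while ((pending = firstPending(plan)) != null) {
            executed.add(pending.right());
            Local result = executor.apply(pending.right());
            plan = replace(plan, pending, new InlineJoin(pending.left(), result));
        }
        return plan;
    }
}
```

Each loop iteration leaves a join with the unchanged left side and a local result on the right, which is the shape the comment above describes. Whether another optimizer pass should run between iterations is exactly the open question in this thread, and the sketch does not model it.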

Oh, we can further optimize it alright :)

The limit being pushed/copied down into the left hand side is a bug - that should only happen in subsequent passes. See my comment here.

Also, if the INLINESTATS has no BY clause, it can be turned into an EVAL with literals in subsequent phases, which can trigger more optimizations (like constant folding, filter pushdown, optimizing away checks against constant keyword fields). Some of this could probably be somehow hacked into the first optimization pass, but currently I think it's more natural (or at least easier) to just have another optimizer pass per query phase.

The limit being pushed/copied down into the left hand side is a bug - that should only happen in subsequent passes.

Actually, very true!

Also, if the INLINESTATS has no BY clause, it can be turned into an EVAL with literals in subsequent phases

I see. Yes, this could be done, but not with the current shape of the planning (i.e. rerunning the optimiser as it is now won't replan). But yes, you're right, this could still be evolved.

Aggregation. Move two methods from EsqlSession to InlineJoin. Address reviews Add more comments

…support_multi_inlinestats_logicalPlan_approach

alex-spies

Heya, all done now.

This is a nice advancement of INLINE STATS, let's go! I think there are a couple things to be looked at in follow ups, but this is fine to merge as-is IMO; we can continue iterating in follow-ups as the changes here do not seem to affect the code health of the non-INLINE STATS code paths.

Most notable follow-up items IMO:

Correctness of optimization rules w.r.t. stubs, esp. the limit pushdown.
More logical plan optimizer tests for PruneColumns and a good hard look at PruneColumns (maybe I'm misreading things though, see comments below)

alex-spies · 2025-06-30T12:59:55Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/InlineJoin.java

    /**
     * Replaces the stubbed source with the actual source.
     */
    public static LogicalPlan replaceStub(LogicalPlan source, LogicalPlan stubbed) {


nit: can we add a note that this will replace all stubs with the new source? For a plan with 2 stubs (dual inlinestats in the query), this method should normally be avoided, since replacing both stubs with the same source would cause chaos.
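
To illustrate the concern, here is a minimal sketch over a hypothetical mini plan tree. The `Plan`, `Stub`, `Source` and `Join` types and `replaceAllStubs` are made up for this example and are not the real `InlineJoin.replaceStub`; the sketch only shows that a replace-everything traversal gives both stubs the same source.

```java
public class ReplaceAllStubsSketch {
    public interface Plan {}
    public record Stub() implements Plan {}
    public record Source(String name) implements Plan {}
    public record Join(Plan left, Plan right) implements Plan {}

    // Walks the whole tree and swaps every Stub for `source`, with no way to target just one.
    public static Plan replaceAllStubs(Plan plan, Plan source) {
        if (plan instanceof Stub) {
            return source;
        }
        if (plan instanceof Join j) {
            return new Join(replaceAllStubs(j.left(), source), replaceAllStubs(j.right(), source));
        }
        return plan;
    }
}
```

On a tree holding two stubs, both end up pointing at the same source, which is the situation the note should warn about.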

alex-spies · 2025-06-30T13:50:10Z