Skip to content

suggestions to code review#4

Open
pgomulka wants to merge 1 commit intorjernst:logging/jul2from
pgomulka:add_parameters
Open

suggestions to code review#4
pgomulka wants to merge 1 commit intorjernst:logging/jul2from
pgomulka:add_parameters

Conversation

@pgomulka
Copy link

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

rjernst pushed a commit that referenced this pull request Aug 17, 2023
Fixes elastic/elasticsearch-internal#497
Fixes ESQL-560

A query like `from test | sort data | limit 2 | project count` fails
because `LocalToGlobalLimitAndTopNExec` planning rule adds a collecting
`TopNExec` after last GATHER exchange, to perform last reduce, see

```
TopNExec[[Order[data{f}#6,ASC,LAST]],2[INTEGER]]
\_ExchangeExec[GATHER,SINGLE_DISTRIBUTION]
  \_ProjectExec[[count{f}#4]]      // <- `data` is projected away but still used by the TopN node above
    \_FieldExtractExec[count{f}#4]
      \_TopNExec[[Order[data{f}#6,ASC,LAST]],2[INTEGER]]
        \_FieldExtractExec[data{f}#6]
          \_ExchangeExec[REPARTITION,FIXED_ARBITRARY_DISTRIBUTION]
            \_EsQueryExec[test], query[][_doc_id{f}elastic#9, _segment_id{f}elastic#10, _shard_id{f}elastic#11]
```

Unfortunately, at that stage the inputs needed by the TopNExec could
have been projected away by a ProjectExec, so they could be no longer
available.

This PR adapts the plan as follows:
- add all the projections used by the `TopNExec` to the existing
`ProjectExec`, so that they are available when needed
- add another ProjectExec on top of the plan, to project away the
originally removed projections and preserve the query semantics


This approach is a bit dangerous, because it bypasses the mechanism of
input/output resolution and validation that happens on the logical plan.
The alternative would be to do this manipulation on the logical plan,
but it's probably hard to do, because there is no concept of Exchange at
that level.
prdoyle pushed a commit that referenced this pull request Jan 22, 2026
…tic#140027)

This PR fixes the issue where `INLINE STATS GROUP BY null` was being
incorrectly pruned by `PruneLeftJoinOnNullMatchingField`.

Fixes elastic#139887

## Problem For query:

```
FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3
```

During `LogicalPlanOptimizer`:

```
Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}#2, n{r}#4]]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}elastic#7]
    \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
      \_StubRelation[[<no-fields>{r$}elastic#7, n{r}#4]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}elastic#7]
\_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
  \_StubRelation[[<no-fields>{r$}elastic#7, n{r}#4]]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is an `Aggregate` (originating from `INLINE STATS`). Since
`STATS` supports `GROUP BY null`, the join key being null is a valid use
case. Pruning this join would incorrectly eliminate the aggregation
results, changing the query semantics.

During `LocalLogicalPlanOptimizer`:

```
ProjectExec[[c{r}#2, n{r}#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}#2, n{r}#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}#2, n{r}#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}elastic#7]
    \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}elastic#7]
\_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is a `LocalRelation` (the `Aggregate` was optimized into a
`LocalRelation` containing the pre-computed aggregation results).
Pruning this join when the join key is null would discard the valid
aggregation results stored in the `LocalRelation`, incorrectly producing
null values instead of the expected count.

## Solution The fix ensures that `PruneLeftJoinOnNullMatchingField` only
applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`.
For `INLINE STATS` joins, the right side can be:

 - `Aggregate` (before optimization), or
 - `LocalRelation` (after the aggregate is optimized)

By checking `join.right() instanceof EsRelation`, we correctly skip the
pruning optimization for `INLINE STATS` joins, preserving the expected
query results when grouping by null.
rjernst pushed a commit that referenced this pull request Jan 23, 2026
…tic#140027) (elastic#141095)

This PR fixes the issue where `INLINE STATS GROUP BY null` was being
incorrectly pruned by `PruneLeftJoinOnNullMatchingField`.

Fixes elastic#139887

## Problem For query:

```
FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3
```

During `LogicalPlanOptimizer`:

```
Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}#2, n{r}#4]]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}elastic#7]
    \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
      \_StubRelation[[<no-fields>{r$}elastic#7, n{r}#4]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}elastic#7]
\_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
  \_StubRelation[[<no-fields>{r$}elastic#7, n{r}#4]]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is an `Aggregate` (originating from `INLINE STATS`). Since
`STATS` supports `GROUP BY null`, the join key being null is a valid use
case. Pruning this join would incorrectly eliminate the aggregation
results, changing the query semantics.

During `LocalLogicalPlanOptimizer`:

```
ProjectExec[[c{r}#2, n{r}#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}#2, n{r}#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}#2, n{r}#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}elastic#7]
    \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}elastic#7]
\_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is a `LocalRelation` (the `Aggregate` was optimized into a
`LocalRelation` containing the pre-computed aggregation results).
Pruning this join when the join key is null would discard the valid
aggregation results stored in the `LocalRelation`, incorrectly producing
null values instead of the expected count.

## Solution The fix ensures that `PruneLeftJoinOnNullMatchingField` only
applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`.
For `INLINE STATS` joins, the right side can be:

 - `Aggregate` (before optimization), or
 - `LocalRelation` (after the aggregate is optimized)

By checking `join.right() instanceof EsRelation`, we correctly skip the
pruning optimization for `INLINE STATS` joins, preserving the expected
query results when grouping by null.

(cherry picked from commit f3ccb70)

Co-authored-by: kanoshiou <uiaao@tuta.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant