suggestions to code review by pgomulka · Pull Request #4 · rjernst/elasticsearch

pgomulka · 2023-06-16T14:02:25Z

Have you signed the contributor license agreement?
Have you followed the contributor guidelines?
If submitting code, have you built your formula locally prior to submission with gradle check?
If submitting code, is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed.
If submitting code, have you checked that your submission is for an OS and architecture that we support?
If you are submitting this code for a class then read our policy for that.

Fixes elastic/elasticsearch-internal#497 Fixes ESQL-560 A query like `from test | sort data | limit 2 | project count` fails because `LocalToGlobalLimitAndTopNExec` planning rule adds a collecting `TopNExec` after last GATHER exchange, to perform last reduce, see ``` TopNExec[[Order[data{f}#6,ASC,LAST]],2[INTEGER]] \_ExchangeExec[GATHER,SINGLE_DISTRIBUTION] \_ProjectExec[[count{f}#4]] // <- `data` is projected away but still used by the TopN node above \_FieldExtractExec[count{f}#4] \_TopNExec[[Order[data{f}#6,ASC,LAST]],2[INTEGER]] \_FieldExtractExec[data{f}#6] \_ExchangeExec[REPARTITION,FIXED_ARBITRARY_DISTRIBUTION] \_EsQueryExec[test], query[][_doc_id{f}elastic#9, _segment_id{f}elastic#10, _shard_id{f}elastic#11] ``` Unfortunately, at that stage the inputs needed by the TopNExec could have been projected away by a ProjectExec, so they could be no longer available. This PR adapts the plan as follows: - add all the projections used by the `TopNExec` to the existing `ProjectExec`, so that they are available when needed - add another ProjectExec on top of the plan, to project away the originally removed projections and preserve the query semantics This approach is a bit dangerous, because it bypasses the mechanism of input/output resolution and validation that happens on the logical plan. The alternative would be to do this manipulation on the logical plan, but it's probably hard to do, because there is no concept of Exchange at that level.

…tic#140027) This PR fixes the issue where `INLINE STATS GROUP BY null` was being incorrectly pruned by `PruneLeftJoinOnNullMatchingField`. Fixes elastic#139887 ## Problem For query: ``` FROM employees | INLINE STATS c = COUNT(*) BY n = null | KEEP c, n | LIMIT 3 ``` During `LogicalPlanOptimizer`: ``` Limit[3[INTEGER],false,false] \_EsqlProject[[c{r}#2, n{r}#4]] \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]] \_StubRelation[[<no-fields>{r$}elastic#7, n{r}#4]] ``` The following join node: ``` InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]] \_StubRelation[[<no-fields>{r$}elastic#7, n{r}#4]] ``` should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is an `Aggregate` (originating from `INLINE STATS`). Since `STATS` supports `GROUP BY null`, the join key being null is a valid use case. Pruning this join would incorrectly eliminate the aggregation results, changing the query semantics. During `LocalLogicalPlanOptimizer`: ``` ProjectExec[[c{r}#2, n{r}#4]] \_LimitExec[3[INTEGER],null] \_ExchangeExec[[c{r}#2, n{r}#4],false] \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<> Project[[c{r}#2, n{r}#4]] \_Limit[3[INTEGER],false,false] \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]] ``` The following join node: ``` InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}] ``` should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is a `LocalRelation` (the `Aggregate` was optimized into a `LocalRelation` containing the pre-computed aggregation results). Pruning this join when the join key is null would discard the valid aggregation results stored in the `LocalRelation`, incorrectly producing null values instead of the expected count. ## Solution The fix ensures that `PruneLeftJoinOnNullMatchingField` only applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`. For `INLINE STATS` joins, the right side can be: - `Aggregate` (before optimization), or - `LocalRelation` (after the aggregate is optimized) By checking `join.right() instanceof EsRelation`, we correctly skip the pruning optimization for `INLINE STATS` joins, preserving the expected query results when grouping by null.

…tic#140027) (elastic#141095) This PR fixes the issue where `INLINE STATS GROUP BY null` was being incorrectly pruned by `PruneLeftJoinOnNullMatchingField`. Fixes elastic#139887 ## Problem For query: ``` FROM employees | INLINE STATS c = COUNT(*) BY n = null | KEEP c, n | LIMIT 3 ``` During `LogicalPlanOptimizer`: ``` Limit[3[INTEGER],false,false] \_EsqlProject[[c{r}#2, n{r}#4]] \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]] \_StubRelation[[<no-fields>{r$}elastic#7, n{r}#4]] ``` The following join node: ``` InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]] \_StubRelation[[<no-fields>{r$}elastic#7, n{r}#4]] ``` should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is an `Aggregate` (originating from `INLINE STATS`). Since `STATS` supports `GROUP BY null`, the join key being null is a valid use case. Pruning this join would incorrectly eliminate the aggregation results, changing the query semantics. During `LocalLogicalPlanOptimizer`: ``` ProjectExec[[c{r}#2, n{r}#4]] \_LimitExec[3[INTEGER],null] \_ExchangeExec[[c{r}#2, n{r}#4],false] \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<> Project[[c{r}#2, n{r}#4]] \_Limit[3[INTEGER],false,false] \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]] ``` The following join node: ``` InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}] ``` should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is a `LocalRelation` (the `Aggregate` was optimized into a `LocalRelation` containing the pre-computed aggregation results). Pruning this join when the join key is null would discard the valid aggregation results stored in the `LocalRelation`, incorrectly producing null values instead of the expected count. ## Solution The fix ensures that `PruneLeftJoinOnNullMatchingField` only applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`. For `INLINE STATS` joins, the right side can be: - `Aggregate` (before optimization), or - `LocalRelation` (after the aggregate is optimized) By checking `join.right() instanceof EsRelation`, we correctly skip the pruning optimization for `INLINE STATS` joins, preserving the expected query results when grouping by null. (cherry picked from commit f3ccb70) Co-authored-by: kanoshiou <uiaao@tuta.io>

suggestions to code review

aa8a6ac

pgomulka mentioned this pull request Jun 16, 2023

Implement custom JUL bridge elastic/elasticsearch#96872

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

suggestions to code review#4

suggestions to code review#4
pgomulka wants to merge 1 commit intorjernst:logging/jul2from
pgomulka:add_parameters

pgomulka commented Jun 16, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pgomulka commented Jun 16, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant