Allow renaming group-by fields to existing field names #4586
qianheng-aws merged 6 commits into opensearch-project:main
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
private boolean isInputRef(RexNode node) {
  return switch (node.getKind()) {
    case AS, DESCENDING, NULLS_FIRST, NULLS_LAST -> {
      final List<RexNode> operands = ((RexCall) node).operands;
      yield isInputRef(operands.getFirst());
    }
    default -> node instanceof RexInputRef;
  };
}
Can PlanUtil.getInputRefs be used to replace this?
I think they serve different purposes. PlanUtil.getInputRefs returns every input ref that a node refers to, so a node that refers to multiple inputs yields all of them. Here I only want to check whether a node is itself an input ref (optionally aliased), keeping the node as is.
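To illustrate the distinction in the reply above, here is a minimal sketch on a toy expression tree. It does not use Calcite: `Node`, `Kind`, and `collectInputRefs` are hypothetical stand-ins for `RexNode`, `SqlKind`, and a "collect all refs" helper, and the behavior of the real `PlanUtil.getInputRefs` is assumed only from the description in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not Calcite classes.
final class InputRefSketch {
  enum Kind { INPUT_REF, AS, DESCENDING, LITERAL, CALL }

  static final class Node {
    final Kind kind;
    final int index; // only meaningful for INPUT_REF
    final List<Node> operands;

    Node(Kind kind, int index, Node... operands) {
      this.kind = kind;
      this.index = index;
      this.operands = Arrays.asList(operands);
    }
  }

  static Node ref(int i) { return new Node(Kind.INPUT_REF, i); }
  static Node lit() { return new Node(Kind.LITERAL, -1); }
  static Node call(Node... ops) { return new Node(Kind.CALL, -1, ops); }
  // AS(child, aliasLiteral): the alias name itself is irrelevant to this sketch.
  static Node as(Node child) { return new Node(Kind.AS, -1, child, lit()); }

  /** True only if the node is an input ref, optionally wrapped in AS or DESCENDING. */
  static boolean isInputRef(Node node) {
    switch (node.kind) {
      case AS:
      case DESCENDING:
        return isInputRef(node.operands.get(0));
      default:
        return node.kind == Kind.INPUT_REF;
    }
  }

  /** Collects the index of every input ref anywhere in the tree. */
  static List<Integer> collectInputRefs(Node node) {
    List<Integer> out = new ArrayList<>();
    collect(node, out);
    return out;
  }

  private static void collect(Node node, List<Integer> out) {
    if (node.kind == Kind.INPUT_REF) {
      out.add(node.index);
    }
    for (Node op : node.operands) {
      collect(op, out);
    }
  }
}
```

For an aliased function call over field 2, `isInputRef` is false while `collectInputRefs` still reports index 2, which is why a non-empty result from a collector cannot stand in for the check.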
// During aggregation, Calcite projects both input dependencies and output group-by fields.
// When names conflict, Calcite adds numeric suffixes (e.g., "value0").
// Apply explicit renaming to restore the intended aliases.
if (names.size() == reResolved.getLeft().size()) {
When would names.size() not equal reResolved.getLeft().size()? The condition seems to always be true.
The lengths differ when a group key is not aliased, in which case extractAliasLiteral returns empty:
    private Optional<RexLiteral> extractAliasLiteral(RexNode node) {
      if (node == null) {
        return Optional.empty();
      } else if (node.getKind() == AS) {
        return Optional.of((RexLiteral) ((RexCall) node).getOperands().get(1));
      } else {
        return Optional.empty();
      }
    }
Although all group keys seem to be aliased in practice, this defensive check was meant to guard against future changes that could cause mismatched renaming. Should I remove it?
Pair<List<RexNode>, List<AggCall>> reResolved =
    resolveAttributesForAggregation(groupExprList, aggExprList, context);

List<String> names = getGroupKeyNamesAfterAggregation(reResolved.getLeft());
Can you rename the variable names to something more meaningful?
 * Imitates {@code Registrar.registerExpression} of {@link RelBuilder} to derive the output order
 * of group-by keys after aggregation.
 *
 * <p>The projected input reference comes first, while any other computed expression follows.
In Registrar.registerExpression, the other computed expressions do not seem guaranteed to keep their original order when expressions are duplicated.
But since PPL only allows a span expression in group-by, and it cannot be combined with another span expression, this logic may be correct. I have not found a bad case so far.
I found a bad case: stats count() by value, value, @timestamp. I'll fix it.
Update: fixed by checking for duplicates.
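As a rough illustration of the ordering rule and the duplicate handling discussed above, here is a sketch on a toy model. It is one plausible reading of "input refs first, computed expressions after, duplicates collapsed" and is not the PR's actual implementation; `Key` and `outputOrder` are hypothetical names, and real group keys are `RexNode`s rather than strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: an assumption-laden sketch, not the code merged in this PR.
final class GroupKeyOrderSketch {
  static final class Key {
    final String expr;      // textual form of the group key
    final boolean inputRef; // true if the key is a (possibly aliased) input reference

    Key(String expr, boolean inputRef) {
      this.expr = expr;
      this.inputRef = inputRef;
    }
  }

  /** Input references first, then computed expressions; duplicates keep their first position. */
  static List<String> outputOrder(List<Key> keys) {
    Set<String> refs = new LinkedHashSet<>();
    Set<String> computed = new LinkedHashSet<>();
    for (Key k : keys) {
      (k.inputRef ? refs : computed).add(k.expr);
    }
    List<String> out = new ArrayList<>(refs);
    out.addAll(computed);
    return out;
  }
}
```

Under this model, the keys from `stats count() by value, value, @timestamp` collapse to two output columns, so a rename list built from the three written keys would be misaligned unless duplicates are checked first.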
/** Whether a rex node is an aliased input reference */
private boolean isInputRef(RexNode node) {
  return switch (node.getKind()) {
    case AS, DESCENDING, NULLS_FIRST, NULLS_LAST -> {
Is there any case where DESCENDING, NULLS_FIRST, or NULLS_LAST appears in our stats .. by ... command?
No, I didn't manage to create one. There always seems to be a projection after sorting and before aggregation.
E.g.
LogicalAggregate(group=[{0}], count()=[COUNT()])
  LogicalProject(value=[$2])
    LogicalSort(sort0=[$2], dir0=[DESC-nulls-last])
…names (#4653)
* Allow renaming group-by fields to existing field names (#4586)
  * Rename fields to intended ones after aggregation
  * Add a defense check
  * Remove defense check
  * Handle cases where there exist duplicated group keys
  (cherry picked from commit a86a5a7)
* Downgrade language level to java 11
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yuanchun Shen <yuanchu@amazon.com>
Description
This PR fixes a bug in Calcite-enabled PPL queries where group-by fields cannot be aliased to their original field names, causing queries to fail with "field not found" errors.
When Calcite is enabled, PPL queries that use span functions with aliases matching the original field names fail with errors like:
field [value] not found; input fields are: [value0, count()]
Affected Query Patterns:
source=time_test | stats count() by span(value, 2000) as value
source=time_test | stats count() by span(timestamp, 1h) as timestamp
Root Cause Analysis
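The issue occurs during Calcite's aggregation processing: Calcite projects both the input dependencies and the output group-by fields, and when their names conflict it adds a numeric suffix (e.g., value0), so the intended alias is no longer present in the output.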
Solution Implementation
This PR implements a post-aggregation field renaming strategy that preserves intended aliases.
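The failure and the fix can be pictured with a small, self-contained sketch. It is a simplified model, not Calcite's code: `uniquify` only imitates suffix-based name deduplication, `restoreAliases` stands in for the explicit rename step, and both names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the name clash and the post-aggregation rename; not Calcite code.
final class AliasRestoreSketch {
  /** Gives each clashing name a numeric suffix, e.g. a second "value" becomes "value0". */
  static List<String> uniquify(List<String> names) {
    Set<String> seen = new HashSet<>();
    List<String> out = new ArrayList<>();
    for (String name : names) {
      String candidate = name;
      int suffix = 0;
      while (!seen.add(candidate)) {
        candidate = name + suffix++;
      }
      out.add(candidate);
    }
    return out;
  }

  /** Renames the leading group-key columns of the aggregate output back to the intended aliases. */
  static List<String> restoreAliases(List<String> outputFields, List<String> intendedAliases) {
    List<String> out = new ArrayList<>(outputFields);
    for (int i = 0; i < intendedAliases.size() && i < out.size(); i++) {
      out.set(i, intendedAliases.get(i));
    }
    return out;
  }
}
```

In this model, projecting the input field `value` next to the group key aliased as `value` yields `[value, value0]`, the aggregate then exposes `[value0, count()]` as in the error message above, and `restoreAliases` with the intended alias list `[value]` turns that back into `[value, count()]`.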
Related Issues
Resolves #4580
Check List
--signoff or -s.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.