Support Streamstats command with calcite#4297
qianheng-aws merged 34 commits into opensearch-project:main
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
// aggregate all window functions on right side
List<AggCall> aggCalls = buildAggCallsForWindowFunctions(node.getWindowFunctionList(), context);
context.relBuilder.aggregate(context.relBuilder.groupKey(), aggCalls);
note (for reviewers): the group key is empty because the partition is done with buildGroupFilter(context, groupList, v.get())
I think the second case is the same as the default solution (window=n, global=false, with by), but it cannot cover the case of window=n, global=true with by, which is why I used the former expression. For the reset path, I ran many experiments on SPL, and I believe our current implementation is consistent with its behavior. @qianheng-aws
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
LantaoJin left a comment
Basically looks good, please fix the IT.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
see what is wrong
context,
left,
context.relBuilder.projectExcept used to be outside the hasGroup branch; now you have moved it into the branch. Is that on purpose?
Yes, this change is just a small optimization. In the default path, the row_num column is now created, sorted on, and dropped only when hasGroup is true. Previously, row_num was created in all cases but only used for sorting when hasGroup was true. I think this meets expectations and does not introduce any logical changes.
integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4297-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5077062932cfefe785f72716d2f2f7aa65177817
# Push it to GitHub
git push --set-upstream origin backport/backport-4297-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev
Then, create a pull request where the
* support streamstats simply
* add some tests
* add UT
* fix some error
* add global
* implement global
* implement reset
* implement all the arguments
* fix test
* add all IT, UT and rst doc
* fix anonymizer test
* fix doctest
* modify doc and IT
* add explainIT
* fix import
* fix typo
* fix doctest
* fix explainIT yaml format
* fix dc nopushdown explainIT
* add explainIT for path2 and path3
* typo error
* handle resort case
* fix IT
* change row_num
* Rule out aggregator from PPLAggregateMergeRule
* Rule out aggregator from PPLAggregateMergeRule
* fix explainIT
---------
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Co-authored-by: Yuanchun Shen <yuanchu@amazon.com>
Description
Support the Streamstats command with arguments below:
Also rule out aggregator in PPLAggregateMergeRule.
Implementation Details
The implementation handles three distinct execution paths, depending on the combination of the window, global, group, and reset arguments:
Why This Design
The default path can rely on a native SQL OVER clause because it involves neither global windows nor reset handling.
Specific SQL limitations:
* Native SQL OVER clauses cannot implement per-group sliding windows over the entire stream, yet that is exactly what we need: a global sequence combined with group-level partitioning. In SQL, a window is either global (no BY clause) or partitioned by a group (with a BY clause); a single OVER cannot express "global sequence plus per-group sliding frame".
* ROWS BETWEEN ... PRECEDING cannot take a variable (it only supports constants like 1 PRECEDING, 1+1 PRECEDING).
The global + window + group combination wants "per-group sliding windows over the entire stream," but SQL window functions do not allow fully flexible frame boundaries combined with lateral joins. Hence, we simulate it via ROW_NUMBER() + correlated join + aggregate.
The reset path introduces segment semantics (seg_id) that cannot be represented natively in SQL OVER clauses. Each reset creates a new frame partition. By default, reset behaves like a global window, but when grouping exists, it applies per-group aggregation within each reset segment. So I use helper columns (before_flag, after_flag, seg_id), and a correlated join ensures correctness.
1. Default Path (No global in use / no reset)
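The default-path behavior can be illustrated outside of Calcite with a small sketch. It is not the code from this PR: the class and method names are hypothetical, and it assumes the frame is "the last `window` rows of the same group, current row included", with `window = 0` meaning an unbounded frame. Only a running sum is shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a per-group running window (not the PR's implementation).
public class DefaultPathSketch {

  /**
   * Running sum per group. window == 0 means an unbounded frame; otherwise the frame
   * holds the last `window` rows of the same group, including the current row.
   */
  public static List<Long> runningSum(List<String> groups, List<Long> values, int window) {
    Map<String, Deque<Long>> frames = new HashMap<>();
    Map<String, Long> sums = new HashMap<>();
    List<Long> out = new ArrayList<>();
    for (int i = 0; i < values.size(); i++) {
      String group = groups.get(i);
      Deque<Long> frame = frames.computeIfAbsent(group, k -> new ArrayDeque<>());
      frame.addLast(values.get(i));
      long sum = sums.getOrDefault(group, 0L) + values.get(i);
      // Evict the oldest row of this group once the frame exceeds the window size.
      if (window > 0 && frame.size() > window) {
        sum -= frame.removeFirst();
      }
      sums.put(group, sum);
      out.add(sum);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> groups = List.of("a", "b", "a", "a", "b");
    List<Long> values = List.of(1L, 10L, 2L, 3L, 20L);
    System.out.println(runningSum(groups, values, 0)); // prints [1, 10, 3, 6, 30]
    System.out.println(runningSum(groups, values, 2)); // prints [1, 10, 3, 5, 30]
  }
}
```

Because each group keeps its own frame, this is the kind of computation a single partitioned OVER clause with a constant row frame can express directly.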
2. global=true + window > 0 + group exists
To support sliding windows over the entire stream with optional grouping:
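As a rough, in-memory analogue of the "ROW_NUMBER() + correlated join + aggregate" idea described in this PR, the sketch below uses the list index as the row number, scans the last `window` rows of the whole stream for each row, and keeps only rows from the same group. The class and method names are hypothetical, and the frame definition (current row included) is an assumption; the actual plan is built with Calcite's RelBuilder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a global sliding window filtered by group.
public class GlobalWindowSketch {

  /** Sum over the last `window` rows of the entire stream that share the current row's group. */
  public static List<Long> globalWindowSum(List<String> groups, List<Long> values, int window) {
    List<Long> out = new ArrayList<>();
    for (int i = 0; i < values.size(); i++) { // i plays the role of the row number
      int start = Math.max(0, i - window + 1); // frame boundary on the global sequence
      long sum = 0;
      for (int j = start; j <= i; j++) { // "correlated join" against rows in the frame
        if (groups.get(j).equals(groups.get(i))) { // group filter
          sum += values.get(j);
        }
      }
      out.add(sum);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> groups = List.of("a", "b", "a", "a", "b");
    List<Long> values = List.of(1L, 10L, 2L, 3L, 20L);
    System.out.println(globalWindowSum(groups, values, 2)); // prints [1, 10, 2, 5, 20]
  }
}
```

In the example, the third row (group a) only sees rows 2 and 3 of the stream, so the earlier a row at position 1 has already slid out of the frame even though it belongs to the same group.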
3. Reset Path (reset_before / reset_after defined)
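The segment idea behind the helper columns (before_flag, after_flag, seg_id) can be sketched as follows. This is an assumption about how the flags might translate into segment ids, not the PR's code: a true before_flag starts a new segment at the current row, and a true after_flag starts a new segment at the next row. Grouping is left out for brevity, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of reset segments (grouping omitted).
public class ResetPathSketch {

  /** Assigns a segment id to every row from the two reset flags. */
  public static int[] segmentIds(boolean[] beforeFlag, boolean[] afterFlag) {
    int[] segId = new int[beforeFlag.length];
    int seg = 0;
    for (int i = 0; i < beforeFlag.length; i++) {
      if (beforeFlag[i]) {
        seg++; // reset before this row: the row opens a new segment
      }
      segId[i] = seg;
      if (afterFlag[i]) {
        seg++; // reset after this row: the next row opens a new segment
      }
    }
    return segId;
  }

  /** Running sum that restarts whenever the segment id changes. */
  public static long[] runningSumPerSegment(long[] values, int[] segId) {
    long[] out = new long[values.length];
    long sum = 0;
    for (int i = 0; i < values.length; i++) {
      if (i > 0 && segId[i] != segId[i - 1]) {
        sum = 0;
      }
      sum += values[i];
      out[i] = sum;
    }
    return out;
  }

  public static void main(String[] args) {
    boolean[] before = {false, false, true, false, false};
    boolean[] after = {false, false, false, true, false};
    int[] segId = segmentIds(before, after);
    System.out.println(Arrays.toString(segId)); // prints [0, 0, 1, 1, 2]
    long[] sums = runningSumPerSegment(new long[] {1, 2, 3, 4, 5}, segId);
    System.out.println(Arrays.toString(sums)); // prints [1, 3, 3, 7, 5]
  }
}
```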
When reset_before or reset_after exist:
Related Issues
Resolves #4207
Check List
--signoff or -s.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.