@jiaqizho
Contributor


@jiaqizho jiaqizho changed the title ORCA: create windows hash aggregation physical operator when vectorization enabled [DNM]ORCA: create windows hash aggregation physical operator when vectorization enabled Jul 25, 2025
@my-ship-it my-ship-it self-requested a review July 25, 2025 08:19
@jiaqizho jiaqizho force-pushed the orca-support-hash-windowagg branch 3 times, most recently from 4714e80 to 3c9d812 Compare July 29, 2025 03:08
@zhangyue-hashdata
Contributor

Would it be better to add Assert(!node->isWindowHashAgg); in ExecInitWindowAgg()?

@zhangyue-hashdata
Contributor

Is it better to add code below in explain.c?

WindowAgg *wagg = castNode(WindowAgg, plan);
pname = sname = wagg->isWindowHashAgg ? "WindowHashAgg" : "WindowAgg";

@jiaqizho jiaqizho force-pushed the orca-support-hash-windowagg branch from 3c9d812 to 8374618 Compare July 30, 2025 07:28
@jiaqizho
Contributor Author

jiaqizho commented Aug 6, 2025

Is it better to add code below in explain.c?

WindowAgg *wagg = castNode(WindowAgg, plan);
pname = sname = wagg->isWindowHashAgg ? "WindowHashAgg" : "WindowAgg";

No need, because the row executor will never receive a WindowHashAgg node. We only need to add that logic in vectorization/***/explain.c

@jiaqizho jiaqizho force-pushed the orca-support-hash-windowagg branch from 8374618 to 4856e06 Compare August 6, 2025 06:55
@jiaqizho jiaqizho changed the title [DNM]ORCA: create windows hash aggregation physical operator when vectorization enabled ORCA: create windows hash aggregation physical operator when vectorization enabled Aug 6, 2025
@jiaqizho jiaqizho force-pushed the orca-support-hash-windowagg branch from 4856e06 to 3ad1cd4 Compare August 12, 2025 08:32
Contributor

@my-ship-it my-ship-it left a comment

LGTM in general, except for one comment.

@jiaqizho jiaqizho force-pushed the orca-support-hash-windowagg branch from 3ad1cd4 to 2ac1f69 Compare August 14, 2025 07:27
@jiaqizho jiaqizho force-pushed the orca-support-hash-windowagg branch from 2ac1f69 to 1b1675c Compare August 15, 2025 01:47
…xector

In this PR, ORCA now supports generating `WindowHashAgg` plans, which are already
implemented in the vectorization executor. However, the CBDB row executor
currently has no implementation of the `WindowHashAgg` operator. To prevent ORCA
from generating this operator for the row executor, I've added a struct named
`OptimizerOptions` that controls whether the plan targets the row executor or the
vectorization executor. (By the way, ORCA may later generate plans specifically
for the vectorization executor.)
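
The gating idea can be sketched in C as below. This is only an illustration: the commit names the `OptimizerOptions` struct but not its fields, so the `create_vectorization_plan` field, the `WindowOperatorKind` enum, and `choose_window_operator()` are hypothetical names, not the actual ORCA code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical layout: the commit introduces OptimizerOptions, but the
 * field name below is an assumption made for this sketch.
 */
typedef struct OptimizerOptions
{
	bool		create_vectorization_plan;	/* true: plan targets the vectorization executor */
} OptimizerOptions;

/* Hypothetical tag for the two physical window operators. */
typedef enum WindowOperatorKind
{
	WINDOW_OP_SORT_BASED,		/* plain WindowAgg */
	WINDOW_OP_HASH_BASED		/* WindowHashAgg */
} WindowOperatorKind;

/*
 * Pick a physical window operator that the target executor can run.
 * The row executor has no WindowHashAgg implementation, so it always
 * gets the sort-based operator.
 */
static WindowOperatorKind
choose_window_operator(const OptimizerOptions *opts)
{
	assert(opts != NULL);

	return opts->create_vectorization_plan
		? WINDOW_OP_HASH_BASED
		: WINDOW_OP_SORT_BASED;
}
```

With this shape, a caller planning for the row executor passes options with the flag cleared and can never receive the hash-based operator.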

The `WindowAgg` operator implementation in the vectorization executor works as follows:

1. First, sort the input rows by the `ORDER BY` keys
2. Then partition the rows by the `PARTITION BY` keys
3. Finally, compute the window function

Since step 1 requires a global sort, it cannot be parallelized in the vectorization executor.
This results in poor performance of the `WindowAgg` operator.

By contrast, `WindowHashAgg` employs a more efficient approach:

1. First hashes input data into buckets based on `PARTITION BY` keys
2. Then sorts data within each bucket according to `ORDER BY` keys
3. Finally computes window functions on the sorted bucket data
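
The three steps above can be sketched in plain C as follows. This is a minimal illustration, not the vectorization executor's code: the `Row` type, the fixed `NUM_BUCKETS`, the modulo hash, and the choice of `row_number()` as the window function are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 4			/* assumed fixed bucket count for the sketch */

typedef struct Row
{
	int			pkey;			/* PARTITION BY key */
	int			okey;			/* ORDER BY key */
	int			row_number;		/* result of row_number() */
} Row;

/* Assumed toy hash: modulo on the partition key. */
static unsigned
bucket_of(int pkey)
{
	return (unsigned) pkey % NUM_BUCKETS;
}

/*
 * Compare by partition key first, so partitions that share a bucket stay
 * contiguous, then by the ORDER BY key.
 */
static int
compare_rows(const void *a, const void *b)
{
	const Row  *ra = a;
	const Row  *rb = b;

	if (ra->pkey != rb->pkey)
		return (ra->pkey > rb->pkey) - (ra->pkey < rb->pkey);
	return (ra->okey > rb->okey) - (ra->okey < rb->okey);
}

/*
 * Reorders rows in place (grouped by bucket) and fills row_number.
 * Returns 0 on success, -1 if memory allocation fails.
 */
static int
window_hash_row_number(Row *rows, size_t nrows)
{
	size_t		starts[NUM_BUCKETS + 1] = {0};
	size_t		next[NUM_BUCKETS];
	Row		   *scratch;

	if (nrows == 0)
		return 0;
	scratch = malloc(nrows * sizeof(Row));
	if (scratch == NULL)
		return -1;

	/* Step 1: hash rows into buckets (count, prefix-sum, scatter). */
	for (size_t i = 0; i < nrows; i++)
		starts[bucket_of(rows[i].pkey) + 1]++;
	for (size_t b = 0; b < NUM_BUCKETS; b++)
		starts[b + 1] += starts[b];
	for (size_t b = 0; b < NUM_BUCKETS; b++)
		next[b] = starts[b];
	for (size_t i = 0; i < nrows; i++)
		scratch[next[bucket_of(rows[i].pkey)]++] = rows[i];

	for (size_t b = 0; b < NUM_BUCKETS; b++)
	{
		Row		   *bucket = scratch + starts[b];
		size_t		len = starts[b + 1] - starts[b];

		/* Step 2: sort only this bucket; no global sort is needed. */
		qsort(bucket, len, sizeof(Row), compare_rows);

		/* Step 3: compute the window function on the sorted bucket. */
		for (size_t i = 0; i < len; i++)
			bucket[i].row_number =
				(i > 0 && bucket[i].pkey == bucket[i - 1].pkey)
				? bucket[i - 1].row_number + 1
				: 1;
	}

	memcpy(rows, scratch, nrows * sizeof(Row));
	free(scratch);
	return 0;
}
```

Each bucket is sorted and processed without reference to any other bucket, which is presumably what makes this approach easier to parallelize than a single global sort.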

For the row engine, `WindowHashAgg` operators will not be generated.

This commit also introduces a new GUC named `optimizer_force_window_hash_agg`
that forces ORCA to generate plans with `WindowHashAgg`. (Do not use this GUC
except when debugging ORCA.)

Co-Author-By: zhangyue <[email protected]>
@jiaqizho jiaqizho force-pushed the orca-support-hash-windowagg branch from 1b1675c to e9a0f5f Compare August 15, 2025 03:52
@jiaqizho jiaqizho merged commit a840049 into apache:main Aug 15, 2025
27 checks passed