Fix row estimation for parallel subquery paths. #1284
Previously, we attempted to disable window functions inside CASE WHEN expressions. This issue was uncovered when we fixed the subquery row count estimation.
Fixed at commit: Correct parallel window function in CASE WHEN.
In CBDB, a path's row estimate is determined by its subpath's rows and the number of cluster segments. However, for a parallel subquery scan path, each worker processes fewer rows (the estimate should also be divided by parallel_workers). This commit fixes that issue.

The correction not only makes parallel subquery estimates more accurate, but also lets the entire plan be parallelized as much as possible, particularly for subqueries inside complex queries.

Authored-by: Zhang Mingli [email protected]
Previously, we attempted to disable window functions inside CASE WHEN
expressions due to concerns about unstable parallel results. However,
this was a misunderstanding: in the upper plan, all expressions coming
from the subquery are Var columns, not the original expressions.
This issue was uncovered when we fixed the subquery row count
estimation, which changed the costs in the upper plan.
EXPLAIN (COSTS OFF)
SELECT empno, depname, salary, bonus, depadj,
       MIN(bonus) OVER (ORDER BY empno),
       MAX(depadj) OVER ()
FROM (
    SELECT *,
           CASE WHEN enroll_date < '2008-01-01'
                THEN 2008 - extract(YEAR FROM enroll_date)
           END * 500 AS bonus,
           CASE WHEN AVG(salary) OVER (PARTITION BY depname) < salary
                THEN 200
           END AS depadj
    FROM empsalary
) s;
                                 QUERY PLAN
--------------------------------------------------------------------------
 WindowAgg
   ->  WindowAgg
         Order By: s.empno
         ->  Gather Motion 6:1  (slice1; segments: 6)
               Merge Key: s.empno
               ->  Sort
                     Sort Key: s.empno
                     ->  Subquery Scan on s
                           ->  WindowAgg
                                 Partition By: empsalary.depname
                                 ->  Sort
                                       Sort Key: empsalary.depname
                                       ->  Redistribute Motion 6:6  (slice2; segments: 6)
                                             Hash Key: empsalary.depname
                                             Hash Module: 3
                                             ->  Parallel Seq Scan on empsalary
 Optimizer: Postgres query optimizer
(17 rows)
Authored-by: Zhang Mingli [email protected]
my-ship-it left a comment
LGTM
In CBDB, row estimation is determined by the relation's rows and the number of cluster segments.
However, for a parallel subquery scan path, each worker processes fewer rows (the estimate should be divided by parallel_workers).
For example, a non-parallel subquery scan is estimated as:
Subquery Scan on "Expr_SUBQUERY" (cost=130.09..137.59 rows=333 width=36)
Previously, a parallel Subquery Scan reported the same number of rows, even though its cost was lower.
This commit fixes that issue.
The correction not only makes parallel subquery estimation more accurate, but also enables the entire plan to be as parallel as possible, particularly for subqueries in complex queries.
Authored-by: Zhang Mingli [email protected]
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel