Skip to content

Conversation

@avamingli
Copy link
Contributor

Fix issue #1301

For partition tables, cdb_pull_up_eclass will call get_eclass_for_sort_expr to find an existing equivalence class it is a member of. It will fail and build a new single-member EquivalenceClass for it when there is RelabelType. That will cause cdbpath_match_preds_to_both_distkeys return false even the locus are compatible because the RestrictInfo's left_ec/right_ec and rel's distribution key's dk_eclasses are two different pointers.
And unnecessary Motions will be added in that case.

CREATE TABLE t1 (id varchar(32), date date) DISTRIBUTED BY (id) PARTITION BY RANGE (date)
(START (date '2016-01-01') INCLUSIVE END (date '2016-01-02') EXCLUSIVE EVERY (INTERVAL '1 day')); CREATE TABLE t2 (id varchar(32)) DISTRIBUTED BY (id); EXPLAIN(COSTS OFF) SELECT COUNT(*) FROM t1 JOIN t2 USING(id);
                            QUERY PLAN
------------------------------------------------------------------
 Aggregate
   ->  Gather Motion 3:1  (slice1; segments: 3)
         ->  Hash Join
               Hash Cond: ((t1.id)::text = (t2.id)::text)
               ->  Redistribute Motion 3:3  (slice2; segments: 3)
                     Hash Key: t1.id
                     ->  Seq Scan on t1_1_prt_1 t1
               ->  Hash
                     ->  Seq Scan on t2
 Optimizer: Postgres query optimizer
(10 rows)

cdbpullup_findEclassInTargetList() will ignore the presence or absence of a RelabelType node atop either expr in determining whether an EC member expr matches a targetlist expr. But get_eclass_for_sort_expr() didn't handle additional RelabelType and a new EC may be generated. We should not disturb get_eclass_for_sort_expr() as it's used at many places and UPSTREAM didn't add RelabelType to EC for that case.
If find a sub_distkeyexpr from cdbpullup_findEclassInTargetList and we know that there is RelabelType stripped inside, we don't need to try to fetch or generate new EC in get_eclass_for_sort_expr(). So, add options for cdb_pull_up_eclass to ignore RelabelType when needed.

Authored-by: Zhang Mingli [email protected]

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


Copy link
Contributor

@my-ship-it my-ship-it left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…Type in cdb_pull_up_eclass.

Fix issue apache#1301

For partition tables, cdb_pull_up_eclass will call get_eclass_for_sort_expr
to find an existing equivalence class it is a member of.
It will fail and build a new single-member EquivalenceClass for it when there is RelabelType.
That will cause cdbpath_match_preds_to_both_distkeys return false even the locus are compatible
because the RestrictInfo's left_ec/right_ec and rel's distribution key's dk_eclasses are two
different pointers.
And unnecessary Motions will be added in that case.

CREATE TABLE t1 (id varchar(32), date date) DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
(START (date '2016-01-01') INCLUSIVE END (date '2016-01-02') EXCLUSIVE EVERY (INTERVAL '1 day'));
CREATE TABLE t2 (id varchar(32)) DISTRIBUTED BY (id);
EXPLAIN(COSTS OFF) SELECT COUNT(*) FROM t1 JOIN t2 USING(id);
                            QUERY PLAN
------------------------------------------------------------------
 Aggregate
   ->  Gather Motion 3:1  (slice1; segments: 3)
         ->  Hash Join
               Hash Cond: ((t1.id)::text = (t2.id)::text)
               ->  Redistribute Motion 3:3  (slice2; segments: 3)
                     Hash Key: t1.id
                     ->  Seq Scan on t1_1_prt_1 t1
               ->  Hash
                     ->  Seq Scan on t2
 Optimizer: Postgres query optimizer
(10 rows)

cdbpullup_findEclassInTargetList() will ignore the presence or absence of a RelabelType node atop
either expr in determining whether an EC member expr matches a targetlist expr.
But get_eclass_for_sort_expr() didn't handle additional RelabelType and a new EC may be generated.
We should not disturb get_eclass_for_sort_expr() as it's used at many places and UPSTREAM didn't
add RelabelType to EC for that case.
If find a sub_distkeyexpr from cdbpullup_findEclassInTargetList and we know that there is RelabelType
stripped inside, we don't need to try to fetch or generate new EC in get_eclass_for_sort_expr().
So, add options for cdb_pull_up_eclass to ignore RelabelType when needed.

Authored-by: Zhang Mingli [email protected]
@avamingli avamingli merged commit 91c28e9 into apache:main Aug 13, 2025
27 checks passed
@avamingli avamingli deleted the fix_join_motion branch August 13, 2025 06:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants