@hawkingrei
Member

@hawkingrei hawkingrei commented Dec 10, 2025

What problem does this PR solve?

Issue Number: close #64329

Problem Summary:

What changed and how does it work?

This PR adds two rewrite patterns that convert a LEFT JOIN ... WHERE ... IS NULL query, which looks for rows with no match, into an anti semi join (the plan-level equivalent of a NOT EXISTS subquery).

Scenario 1: IS NULL on the Join Condition Column

  • In this scenario, the IS NULL filter is applied directly to the join key from the inner table.
Table Schema
CREATE TABLE Table_A (
    id INT,
    name VARCHAR(50)
);

CREATE TABLE Table_B (
    id INT, -- Join Key
    info VARCHAR(50)
);
Plan
-- The optimizer rewrites the LEFT JOIN ... WHERE ... IS NULL pattern into an efficient
-- Anti Semi Join to retrieve rows from Table A that have no match in Table B.
SELECT 
    Table_A.id, 
    Table_A.name
FROM 
    Table_A
LEFT JOIN 
    Table_B ON Table_A.id = Table_B.id
WHERE 
    Table_B.id IS NULL;
+--------------------------+-----------+---------------+--------------------------------------------------------------------------------------+
| id                       | task      | access object | operator info                                                                        |
+--------------------------+-----------+---------------+--------------------------------------------------------------------------------------+
| Projection               | root      |               | test.table_a.id, test.table_a.name                                                   |
| └─Selection              | root      |               | isnull(test.table_b.id)                                                              |
|   └─HashJoin             | root      |               | left outer join, left side:TableReader, equal:[eq(test.table_a.id, test.table_b.id)] |
|     ├─TableReader(Build) | root      |               | data:Selection                                                                       |
|     │ └─Selection        | cop[tikv] |               | not(isnull(test.table_b.id))                                                         |
|     │   └─TableFullScan  | cop[tikv] | table:B       | keep order:false, stats:pseudo                                                       |
|     └─TableReader(Probe) | root      |               | data:TableFullScan                                                                   |
|       └─TableFullScan    | cop[tikv] | table:A       | keep order:false, stats:pseudo                                                       |
+--------------------------+-----------+---------------+--------------------------------------------------------------------------------------+

=>

+----------------------+-----------+---------------+-------------------------------------------------------------------------------------+
| id                   | task      | access object | operator info                                                                       |
+----------------------+-----------+---------------+-------------------------------------------------------------------------------------+
| HashJoin             | root      |               | anti semi join, left side:TableReader, equal:[eq(test.table_a.id, test.table_b.id)] |
| ├─TableReader(Build) | root      |               | data:Selection                                                                      |
| │ └─Selection        | cop[tikv] |               | not(isnull(test.table_b.id))                                                        |
| │   └─TableFullScan  | cop[tikv] | table:B       | keep order:false, stats:pseudo                                                      |
| └─TableReader(Probe) | root      |               | data:TableFullScan                                                                  |
|   └─TableFullScan    | cop[tikv] | table:A       | keep order:false, stats:pseudo                                                      |
+----------------------+-----------+---------------+-------------------------------------------------------------------------------------+
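
The equivalence behind Scenario 1 can be checked outside TiDB. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than TiDB, so it illustrates only the query semantics, not the planner change itself. The sample rows, including the NULL ids, are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite database; the schema mirrors the PR description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_A (id INT, name VARCHAR(50));
CREATE TABLE Table_B (id INT, info VARCHAR(50));
-- Hypothetical sample rows, including NULL join keys on both sides.
INSERT INTO Table_A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3'), (NULL, 'a_null');
INSERT INTO Table_B VALUES (1, 'b1'), (1, 'b1_dup'), (NULL, 'b_null');
""")

# Original form: LEFT JOIN plus IS NULL on the inner table's join key.
left_join_form = conn.execute("""
SELECT Table_A.id, Table_A.name
FROM Table_A
LEFT JOIN Table_B ON Table_A.id = Table_B.id
WHERE Table_B.id IS NULL
ORDER BY Table_A.name
""").fetchall()

# Anti-join form: keep rows of Table_A that have no match in Table_B.
not_exists_form = conn.execute("""
SELECT Table_A.id, Table_A.name
FROM Table_A
WHERE NOT EXISTS (SELECT 1 FROM Table_B WHERE Table_B.id = Table_A.id)
ORDER BY Table_A.name
""").fetchall()

conn.close()
print(left_join_form == not_exists_form)
```

Both forms return the same rows here because a matched row always carries a non-NULL Table_B.id, so the IS NULL filter can only keep rows that the outer join NULL-extended.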

Scenario 2: IS NULL on a Non-Join NOT NULL Column

If a column of the inner table is declared NOT NULL in the schema but is filtered with IS NULL after the join, the filter can only be satisfied by rows that the outer join NULL-extended, that is, rows for which the join found no match. The join can therefore be converted into an ANTI SEMI JOIN even when the column is not one of the join keys. The only requirement is that the condition keeps exactly the NULL rows generated by the null-supplying side.

Table Schema:
CREATE TABLE Table_A (
    id INT,
    name VARCHAR(50)
);

CREATE TABLE Table_B (
    id INT,
    status VARCHAR(20) NOT NULL -- Non-join column with NOT NULL constraint
);
Plan
-- Even though 'status' is not a join key, its NOT NULL constraint 
-- ensures that 'B.status IS NULL' only occurs when no match is found.
SELECT A.* FROM Table_A A 
LEFT JOIN Table_B B ON A.id = B.id 
WHERE B.status IS NULL;
+--------------------------+-----------+---------------+--------------------------------------------------------------------------------------+
| id                       | task      | access object | operator info                                                                        |
+--------------------------+-----------+---------------+--------------------------------------------------------------------------------------+
| Projection               | root      |               | test.table_a.id, test.table_a.name                                                   |
| └─Selection              | root      |               | isnull(test.table_b.status)                                                          |
|   └─HashJoin             | root      |               | left outer join, left side:TableReader, equal:[eq(test.table_a.id, test.table_b.id)] |
|     ├─TableReader(Build) | root      |               | data:Selection                                                                       |
|     │ └─Selection        | cop[tikv] |               | not(isnull(test.table_b.id))                                                         |
|     │   └─TableFullScan  | cop[tikv] | table:B       | keep order:false, stats:pseudo                                                       |
|     └─TableReader(Probe) | root      |               | data:TableFullScan                                                                   |
|       └─TableFullScan    | cop[tikv] | table:A       | keep order:false, stats:pseudo                                                       |
+--------------------------+-----------+---------------+--------------------------------------------------------------------------------------+

=>

+----------------------+-----------+---------------+-------------------------------------------------------------------------------------+
| id                   | task      | access object | operator info                                                                       |
+----------------------+-----------+---------------+-------------------------------------------------------------------------------------+
| HashJoin             | root      |               | anti semi join, left side:TableReader, equal:[eq(test.table_a.id, test.table_b.id)] |
| ├─TableReader(Build) | root      |               | data:Selection                                                                      |
| │ └─Selection        | cop[tikv] |               | not(isnull(test.table_b.id))                                                        |
| │   └─TableFullScan  | cop[tikv] | table:B       | keep order:false, stats:pseudo                                                      |
| └─TableReader(Probe) | root      |               | data:TableFullScan                                                                  |
|   └─TableFullScan    | cop[tikv] | table:A       | keep order:false, stats:pseudo                                                      |
+----------------------+-----------+---------------+-------------------------------------------------------------------------------------+
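
The role of the NOT NULL constraint in Scenario 2 can be sketched the same way. This example again uses Python's sqlite3 module instead of TiDB, and the helper `run_both` and the sample rows are hypothetical names and data introduced only for illustration. It runs the LEFT JOIN form and a NOT EXISTS form twice: once with `status` declared NOT NULL, and once with `status` nullable and a matched row whose status is NULL.

```python
import sqlite3

LEFT_JOIN_SQL = """
SELECT A.id, A.name FROM Table_A A
LEFT JOIN Table_B B ON A.id = B.id
WHERE B.status IS NULL
ORDER BY A.id
"""

NOT_EXISTS_SQL = """
SELECT A.id, A.name FROM Table_A A
WHERE NOT EXISTS (SELECT 1 FROM Table_B B WHERE B.id = A.id)
ORDER BY A.id
"""


def run_both(status_column_def, b_rows):
    """Hypothetical helper: build the tables, then run both query forms."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Table_A (id INT, name VARCHAR(50))")
        conn.execute(f"CREATE TABLE Table_B (id INT, status {status_column_def})")
        conn.executemany("INSERT INTO Table_A VALUES (?, ?)", [(1, "a1"), (2, "a2")])
        conn.executemany("INSERT INTO Table_B VALUES (?, ?)", b_rows)
        left = conn.execute(LEFT_JOIN_SQL).fetchall()
        anti = conn.execute(NOT_EXISTS_SQL).fetchall()
        return left, anti
    finally:
        conn.close()


# status is NOT NULL: a NULL status can only come from NULL-extension.
strict_left, strict_anti = run_both("VARCHAR(20) NOT NULL", [(1, "active")])

# status is nullable: a matched row may itself carry a NULL status.
loose_left, loose_anti = run_both("VARCHAR(20)", [(1, None)])

print(strict_left == strict_anti)  # the two forms agree
print(loose_left == loose_anti)    # the two forms disagree
```

With the nullable column, the LEFT JOIN form also returns the matched row whose status happens to be NULL, so the two forms diverge. That is why the rewrite depends on the NOT NULL constraint.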

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

https://github.com/PingCAP-QE/planreplayertest/pull/31

  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/needs-tests-checked labels Dec 10, 2025
@hawkingrei hawkingrei changed the title from "planner: support left outer join into anti semi join" to "[WIP] planner: support left outer join into anti semi join" on Dec 10, 2025
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 10, 2025
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 10, 2025
@hawkingrei
Member Author

/retest

@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 10, 2025
@codecov

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 84.50704% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.9126%. Comparing base (4f1b985) to head (1be2f85).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #64959        +/-   ##
================================================
+ Coverage   77.8439%   78.9126%   +1.0686%     
================================================
  Files          1983       1912        -71     
  Lines        542787     527358     -15429     
================================================
- Hits         422527     416152      -6375     
+ Misses       118601     110316      -8285     
+ Partials       1659        890       -769     
Flag | Coverage Δ
integration | 44.2898% <67.2985%> (-3.8982%) ⬇️
tiprow_ft | ?
unit | 76.8161% <79.3427%> (+0.3365%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components | Coverage Δ
dumpling | 56.7974% <ø> (ø)
parser | ∅ <ø> (∅)
br | 48.7744% <ø> (-12.2722%) ⬇️

@hawkingrei hawkingrei changed the title from "[WIP] planner: support left outer join into anti semi join" to "[WIP] planner: support left outer join into anti semi join | tidb-test=pr/2649" on Dec 11, 2025
@hawkingrei
Member Author

/retest

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 11, 2025
@hawkingrei
Member Author

/retest

@hawkingrei hawkingrei force-pushed the 64329 branch 2 times, most recently from ae46b9c to fc4d779 on December 12, 2025 11:34
@hawkingrei
Member Author

/retest

Copilot AI review requested due to automatic review settings January 19, 2026 17:38

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings January 19, 2026 18:13

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

@hawkingrei
Member Author

/retest

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Copilot AI review requested due to automatic review settings January 20, 2026 06:41

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

@ti-chi-bot

ti-chi-bot bot commented Jan 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AilinKid, guo-shaoge, winoros

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hawkingrei
Member Author

/retest

@hawkingrei
Member Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 20, 2026
@hawkingrei
Member Author

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 20, 2026
@hawkingrei
Member Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 20, 2026
@hawkingrei
Member Author

/retest

@YangKeao
Member

/retest

@ti-chi-bot ti-chi-bot bot merged commit 36a4308 into pingcap:master Jan 20, 2026
36 of 37 checks passed
@hawkingrei hawkingrei deleted the 64329 branch January 23, 2026 08:27
Labels

approved component/statistics lgtm release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

"xxxx IS NULL" prevent the optimizer from choosing the best join order and IndexJoin

6 participants