feat: implement null-safe equal operator (<=>) #23599
Open: gouhongshen wants to merge 10 commits into matrixorigin:main from gouhongshen:feat/null_safe_equal
Commits (all authored by gouhongshen):
7502f1e feat: implement null-safe equal operator (<=>)
6d7b5cc style: fix gofmt formatting issues
1f492e8 test: add unit tests for NULL_SAFE_EQUAL operator
1729d12 test: expand TestNullSafeEqualFn to cover more types
f6a6c4b test: expand TestNullSafeEqualFn to cover more types (Date, Time, Tim…
1273892 test: full type coverage for TestNullSafeEqualFn
257e091 test: achieve 100% type coverage for nullSafeEqualFn and update test …
ffcd200 refactor: optimize opBinaryBytesBytesToFixedNullSafe for performance …
fbb693d Merge branch 'main' into feat/null_safe_equal
b853ac0 Merge branch 'main' into feat/null_safe_equal
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # MatrixOne NULL-Safe Equal Operator (`<=>`) Implementation Design Document | ||
|
|
||
| ## 1. Requirement Analysis | ||
|
|
||
| **Feature Request:** [Issue #23009](https://github.com/matrixorigin/matrixone/issues/23009) | ||
|
|
||
| **Core Requirement:** | ||
| Implement the MySQL-compatible NULL-Safe Equal operator `<=>`. This operator compares two expressions and returns `1` (TRUE) if the two values are equal or both are NULL, and `0` (FALSE) otherwise. Unlike `=`, it never returns NULL. | ||
|
|
||
| **Behavior Comparison:** | ||
| | Expression A | Expression B | A = B | A <=> B | | ||
| | :--- | :--- | :--- | :--- | | ||
| | `1` | `1` | `1` | `1` | | ||
| | `1` | `0` | `0` | `0` | | ||
| | `1` | `NULL` | `NULL` | `0` | | ||
| | `NULL` | `NULL` | `NULL` | `1` | | ||
|
|
||
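The table above can be restated as a small Go sketch. It is illustrative only and is not code from this PR: nullable values are modeled as `*int64` (a nil pointer stands for NULL), and the names `sqlEqual`, `nullSafeEqual`, and `ptr` are hypothetical.

```go
package main

// sqlEqual models `a = b` under three-valued logic.
// A nil result stands for NULL.
func sqlEqual(a, b *int64) *bool {
	if a == nil || b == nil {
		return nil // NULL on either side propagates to the result
	}
	r := *a == *b
	return &r
}

// nullSafeEqual models `a <=> b`. The result is a plain bool
// because the operator never yields NULL.
func nullSafeEqual(a, b *int64) bool {
	if a == nil || b == nil {
		// True only when both sides are NULL.
		return a == nil && b == nil
	}
	return *a == *b
}

// ptr is a small helper for building non-NULL values.
func ptr(v int64) *int64 { return &v }
```

Calling `nullSafeEqual(ptr(1), nil)` and `sqlEqual(ptr(1), nil)` side by side reproduces the third row of the table: the first gives a definite false, the second gives a nil (NULL) result.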
| --- | ||
|
|
||
| ## 2. Implementation Design | ||
|
|
||
| ### 2.1 Architectural Hierarchy | ||
|
|
||
| The implementation covers both the Plan layer and the Function Execution layer of the SQL engine: | ||
| 1. **Plan Layer:** Responsible for AST parsing, Tuple expansion, operator binding, and property deduction. | ||
| 2. **Function Layer:** Provides vectorized implementation of the NULL-Safe comparison logic. | ||
|
|
||
| ### 2.2 Detailed Design and Changes | ||
|
|
||
| #### 2.2.1 Plan Layer Support | ||
| * **Tuple Expansion:** In `pkg/sql/plan/base_binder.go`, logic has been implemented to handle Tuple expansion for `tree.NULL_SAFE_EQUAL`. | ||
| * Semantics: `(a, b) <=> (c, d)` is transformed into equivalent logic `(a <=> c) AND (b <=> d)`. | ||
| * Alignment: This behavior is compatible with MySQL and aligns with MO's existing handling logic for the `EQUAL` (`=`) operator. | ||
| * **Optimizer Properties (NotNullable):** | ||
| * In `pkg/sql/plan/function/list_operator.go`, `<=>` is marked with `plan.Function_PRODUCE_NO_NULL`. | ||
| * Effect: The optimizer can deduce that the result of this expression is never NULL, enabling more effective `NOT` pushdown, equivalence deduction, and `IS NULL` constant folding. | ||
| * **Limitations (Hash Join & Zonemap):** | ||
| * **Join:** `<=>` is NOT treated as an equi-join condition (`IsEqualFunc` is not modified), because the underlying Hash Join operator may not currently handle NULL key matching correctly. Enabling it prematurely would result in NULL rows being dropped. For now, `<=>` executes via Nested Loop Join or Cross Product to ensure correctness. | ||
| * **Zonemap:** `Function_ZONEMAPPABLE` is NOT marked. The default Zonemap evaluation logic is based on Min/Max and ignores NULL value statistics, which could lead to blocks containing NULL values being incorrectly filtered out. | ||
|
|
||
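As a rough illustration of the Tuple expansion described in 2.2.1, the sketch below rewrites a tuple comparison into a conjunction of element-wise `<=>` comparisons. It works on plain strings rather than on MatrixOne's plan expressions, and the function name `expandTupleNullSafeEqual` and its error message are hypothetical, not taken from `base_binder.go`.

```go
package main

import (
	"errors"
	"strings"
)

// expandTupleNullSafeEqual rewrites (l1, l2, ...) <=> (r1, r2, ...)
// into (l1 <=> r1) AND (l2 <=> r2) AND ...
// Operands are plain strings here; a real binder would build expression nodes.
func expandTupleNullSafeEqual(left, right []string) (string, error) {
	if len(left) == 0 || len(left) != len(right) {
		return "", errors.New("tuple operands must have the same, non-zero arity")
	}
	parts := make([]string, len(left))
	for i := range left {
		parts[i] = "(" + left[i] + " <=> " + right[i] + ")"
	}
	return strings.Join(parts, " AND "), nil
}
```

Because every element-wise comparison yields a non-NULL boolean, the conjunction is also never NULL, which is consistent with marking the operator as producing no NULLs.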
| #### 2.2.2 Execution Layer (Function) Implementation | ||
| * **Non-Strict Mode:** `<=>` does not use the `plan.Function_STRICT` flag because it needs to handle NULL inputs rather than propagating them directly. | ||
| * **Vectorized Implementation:** A new `nullSafeEqualFn` has been added in `pkg/sql/plan/function/func_compare.go`. | ||
| * **Generic Handling:** Uses `opBinaryFixedFixedToFixedNullSafe` for fixed-length types and `opBinaryBytesBytesToFixedNullSafe` for variable-length types. | ||
| * **NULL Handling Logic:** | ||
| * `NULL <=> NULL` -> `1` (True) | ||
| * `Value <=> NULL` -> `0` (False) | ||
| * `Value <=> Value` -> Normal equality comparison logic. | ||
| * **Multi-Type Support:** Core comparison logic covers all major data types supported by MatrixOne, including `BOOL`, `INT`, `FLOAT`, `DECIMAL`, `CHAR/VARCHAR`, `JSON`, `DATE/TIME/TIMESTAMP`, `UUID`, and `ARRAY`. | ||
| * **Result Reset:** Explicitly resets the NULL mask of the result vector to ensure the output is always valid (non-NULL). | ||
|
|
||
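The NULL handling rules in 2.2.2 can be sketched as a row-by-row loop over two columns with null masks. This is a simplified model under stated assumptions, not the PR's `opBinaryFixedFixedToFixedNullSafe`: the `column` type and `nullSafeEqualVec` function are hypothetical stand-ins for MatrixOne's vector types, and constant vectors and selection lists are ignored.

```go
package main

// column is a hypothetical stand-in for a vector: values plus a parallel
// null mask (nulls[i] == true means row i is NULL).
type column[T comparable] struct {
	values []T
	nulls  []bool
}

// nullSafeEqualVec compares two equal-length columns row by row.
// The result carries no null mask because every output row is valid.
func nullSafeEqualVec[T comparable](a, b column[T]) []bool {
	out := make([]bool, len(a.values))
	for i := range a.values {
		an, bn := a.nulls[i], b.nulls[i]
		switch {
		case an && bn:
			out[i] = true // NULL <=> NULL
		case an || bn:
			out[i] = false // Value <=> NULL
		default:
			out[i] = a.values[i] == b.values[i] // ordinary equality
		}
	}
	return out
}
```

Returning a plain `[]bool` mirrors the "Result Reset" point: the output has no NULL rows, so no mask needs to travel with it.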
| #### 2.2.3 Metadata Registration | ||
| * Defined `NULL_SAFE_EQUAL` (406) in `pkg/sql/plan/function/function_id.go` and updated tests. | ||
| * Exported relevant variables in `pkg/sql/plan/function/init.go` for external reference. | ||
|
|
||
| --- | ||
|
|
||
| ## 3. Testing Plan | ||
|
|
||
| ### 3.1 Coverage Scenarios | ||
| A new BVT test case `test/distributed/cases/function/func_null_safe_equal.sql` has been added, comprehensively covering the following scenarios: | ||
| 1. **Basic Scalar Comparison:** Covering all NULL combinations (`1<=>1`, `1<=>NULL`, `NULL<=>NULL`, etc.). | ||
| 2. **Table Data Comparison:** Behavior of `<=>` with Numeric, String, and Boolean types in tables. | ||
| 3. **Complex Type Comparison:** | ||
| * **JSON:** Supports NULL-safe comparison and Join between JSON columns. | ||
| * **Decimal:** Validates alignment and NULL comparison across different Precision/Scale. | ||
| * **Date/Time/Timestamp:** Validates matching of Date/Time types under various NULL conditions. | ||
| 4. **JOIN Association:** Verifies that when `<=>` is used as a Join condition, NULLs can correctly match NULLs (verifying correctness over performance). | ||
| 5. **Implicit Conversion:** Validates `1.0 <=> 1` and mixed scenarios with numbers and strings. | ||
| 6. **Tuple Support:** Verifies the expansion logic for `(a, b) <=> (c, d)` and scenarios involving complex types like JSON. | ||
| 7. **Optimizer Property Verification:** Verifies that `(val <=> NULL) IS NULL` is correctly optimized to `0` (False). | ||
| 8. **Constant Folding:** Verifies that `WHERE NULL <=> NULL` correctly returns rows. | ||
|
|
||
| ### 3.2 Verification Results | ||
| All test cases have been run and verified against a live MatrixOne environment, with results completely consistent with MySQL behavior. | ||
|
|
||
| --- | ||
|
|
||
| ## 4. Performance Implications and Limitations | ||
|
|
||
| ### 4.1 Why Hash Join is Not Supported | ||
| Currently, MatrixOne's Hash Join operator (based on `HashMap`) ignores NULL keys by default during the build and probe phases (adhering to standard SQL `=` semantics). | ||
| Forcing the enablement of the Hash Join path for `<=>` would cause rows with `NULL <=> NULL` to be directly discarded during the probe phase due to "key is NULL", resulting in incorrect results. | ||
| Therefore, the current implementation falls back to Nested Loop Join or Cross Product + Filter to ensure result correctness. | ||
|
|
||
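The failure mode described in 4.1 can be reproduced with a toy model. The sketch below is not MatrixOne's join implementation: `hashJoinSkippingNulls` and `nestedLoopNullSafe` are hypothetical functions that only count matching row pairs, with a nil pointer standing for a NULL key.

```go
package main

// key is a nullable join key; a nil pointer stands for NULL.
type key = *int64

// hashJoinSkippingNulls mimics a hash join that ignores NULL keys in both
// the build and probe phases, which is safe for `=` but not for `<=>`.
func hashJoinSkippingNulls(build, probe []key) int {
	table := make(map[int64]int)
	for _, b := range build {
		if b != nil {
			table[*b]++
		}
	}
	matches := 0
	for _, p := range probe {
		if p != nil {
			matches += table[*p]
		}
	}
	return matches
}

// nestedLoopNullSafe evaluates `l <=> r` for every pair of rows.
func nestedLoopNullSafe(left, right []key) int {
	matches := 0
	for _, l := range left {
		for _, r := range right {
			bothNull := l == nil && r == nil
			bothEqual := l != nil && r != nil && *l == *r
			if bothNull || bothEqual {
				matches++
			}
		}
	}
	return matches
}

// newKey builds a non-NULL key.
func newKey(v int64) key { return &v }
```

With one NULL row on the build side and two on the probe side, the nested loop counts the two NULL-to-NULL pairs while the hash-based version silently drops them, which is the incorrect result the fallback avoids.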
| ### 4.2 Why Zonemap is Not Supported | ||
| The default Zonemap evaluation logic is primarily based on `[Min, Max]` range coverage. A query like `col <=> NULL` instead requires checking whether `NullCount > 0` in the Block. | ||
| The generic evaluator has not yet implemented this specialized logic for `<=>`. Reusing the generic logic directly could mishandle the comparison between `Min/Max` and NULL, causing Blocks that contain NULL data to be filtered out incorrectly. | ||
| Therefore, the `Function_ZONEMAPPABLE` marker is not set, which forces a full scan (or a fallback to other indexes) and prevents rows from being missed. | ||
|
|
||
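A NULL-aware block filter of the kind 4.2 calls for might look like the sketch below. It is a hypothetical design, not existing MatrixOne code: the `zoneMap` struct, its fields, and `mayMatchNullSafeEqual` are assumptions made for illustration.

```go
package main

// zoneMap is a hypothetical per-block summary.
type zoneMap struct {
	min, max  int64 // range of the non-NULL values; unused if every row is NULL
	nullCount int   // number of NULL rows in the block
	rowCount  int   // total number of rows in the block
}

// mayMatchNullSafeEqual reports whether a block could contain a row
// satisfying `col <=> target`. A nil target stands for NULL.
// A false result means the block can be skipped safely.
func mayMatchNullSafeEqual(zm zoneMap, target *int64) bool {
	if target == nil {
		// col <=> NULL matches only NULL rows, so Min/Max are irrelevant.
		return zm.nullCount > 0
	}
	if zm.nullCount == zm.rowCount {
		// The block holds only NULLs; a non-NULL target cannot match.
		return false
	}
	return zm.min <= *target && *target <= zm.max
}
```

The key difference from a pure `[Min, Max]` check is the first branch: for a NULL target the decision depends on `nullCount` alone.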
| --- | ||
|
|
||
| ## 5. List of Modified Files | ||
|
|
||
| | File Path | Description of Responsibility | | ||
| | :--- | :--- | | ||
| | `pkg/sql/plan/function/function_id.go` | Define ID and register operator name. | | ||
| | `pkg/sql/plan/function/function_id_test.go` | Update ID test mapping. | | ||
| | `pkg/sql/plan/function/init.go` | Initialize global reference variables. | | ||
| | `pkg/sql/plan/function/list_operator.go` | Register function overloads and mark `PRODUCE_NO_NULL`. | | ||
| | `pkg/sql/plan/base_binder.go` | Implement Tuple expansion logic. | | ||
| | `pkg/sql/plan/function/func_compare.go` | Implement core comparison logic. | | ||
| | `test/distributed/cases/function/func_null_safe_equal.sql` | Test case set. | | ||
| | `test/distributed/cases/function/func_null_safe_equal.result` | Expected execution results. | | ||
|
|
||
| --- | ||
|
|
||
| ## 6. Future Optimization Directions (Open Issues) | ||
|
|
||
| 1. **Support Hash Join:** Refactor the Hash Join operator (`HashMap` and Join Probe logic) to support and correctly handle equality matching for NULL keys. | ||
| 2. **Support Zonemap:** Implement specialized Zonemap evaluation logic for `<=>` to utilize `NullCount` information in ZoneMap for coarse-grained filtering. |