111 changes: 111 additions & 0 deletions docs/rfcs/20260121_null_safe_equal_operator.md
@@ -0,0 +1,111 @@
# MatrixOne NULL-Safe Equal Operator (`<=>`) Implementation Design Document

## 1. Requirement Analysis

**Feature Request:** [Issue #23009](https://github.com/matrixorigin/matrixone/issues/23009)

**Core Requirement:**
Implement the MySQL-compatible NULL-safe equal operator `<=>`. The operator compares two expressions and returns `1` (TRUE) if both are non-NULL and equal, or if both are NULL; otherwise it returns `0` (FALSE). Unlike `=`, it never returns NULL.

**Behavior Comparison:**
| Expression A | Expression B | A = B | A <=> B |
| :--- | :--- | :--- | :--- |
| `1` | `1` | `1` | `1` |
| `1` | `0` | `0` | `0` |
| `1` | `NULL` | `NULL` | `0` |
| `NULL` | `NULL` | `NULL` | `1` |
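
The table can be reproduced with a minimal Go sketch. The `NullableInt` and `NullableBool` types and both function names are hypothetical and exist only for illustration; they are not MatrixOne types.

```go
package main

import "fmt"

// NullableInt is a hypothetical stand-in for a SQL integer that may be NULL.
type NullableInt struct {
	Value int64
	Valid bool // false means SQL NULL
}

// NullableBool models a three-valued SQL boolean (TRUE, FALSE, or NULL).
type NullableBool struct {
	Value bool
	Valid bool // false means SQL NULL
}

// Equal follows `=` semantics: any NULL operand makes the result NULL.
func Equal(a, b NullableInt) NullableBool {
	if !a.Valid || !b.Valid {
		return NullableBool{}
	}
	return NullableBool{Value: a.Value == b.Value, Valid: true}
}

// NullSafeEqual follows `<=>` semantics: the result is never NULL.
func NullSafeEqual(a, b NullableInt) bool {
	if !a.Valid || !b.Valid {
		// TRUE only when both sides are NULL.
		return !a.Valid && !b.Valid
	}
	return a.Value == b.Value
}

func main() {
	one := NullableInt{Value: 1, Valid: true}
	null := NullableInt{}

	// For `=`, print whether the result is non-NULL; for `<=>`, print the result.
	fmt.Println(Equal(one, null).Valid, NullSafeEqual(one, null))   // false false
	fmt.Println(Equal(null, null).Valid, NullSafeEqual(null, null)) // false true
}
```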

---

## 2. Implementation Design

### 2.1 Architectural Hierarchy

The implementation covers both the Plan layer and the Function Execution layer of the SQL engine:
1. **Plan Layer:** Responsible for AST parsing, Tuple expansion, operator binding, and property deduction.
2. **Function Layer:** Provides vectorized implementation of the NULL-Safe comparison logic.

### 2.2 Detailed Design and Changes

#### 2.2.1 Plan Layer Support
* **Tuple Expansion:** In `pkg/sql/plan/base_binder.go`, logic has been implemented to handle Tuple expansion for `tree.NULL_SAFE_EQUAL`.
    * Semantics: `(a, b) <=> (c, d)` is transformed into equivalent logic `(a <=> c) AND (b <=> d)`.
    * Alignment: This behavior is compatible with MySQL and aligns with MO's existing handling logic for the `EQUAL` (`=`) operator.
* **Optimizer Properties (NotNullable):**
    * In `pkg/sql/plan/function/list_operator.go`, `<=>` is marked with `plan.Function_PRODUCE_NO_NULL`.
    * Effect: The optimizer can deduce that the result of this expression is never NULL, enabling more effective `NOT` pushdown, equivalence deduction, and `IS NULL` constant folding.
* **Limitations (Hash Join & Zonemap):**
    * **Join:** `<=>` is NOT treated as an equi-join condition (`IsEqualFunc` is not modified). This is because the underlying Hash Join operator currently may not correctly handle NULL key matching. Enabling it prematurely would result in NULL rows being dropped. Currently, `<=>` executes via Nested Loop Join or Cross Product to ensure correctness.
    * **Zonemap:** `Function_ZONEMAPPABLE` is NOT marked. The default Zonemap evaluation logic is based on Min/Max and ignores NULL value statistics, which could lead to blocks containing NULL values being incorrectly filtered out.
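
The tuple expansion described above can be sketched in isolation. This is a simplified illustration that works on expression strings rather than on MatrixOne's `plan.Expr` nodes; `expandTuple` is a hypothetical helper, not a function from the codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// expandTuple rewrites (l1, ..., ln) <=> (r1, ..., rn) into a left-deep
// chain of element-wise comparisons joined by AND.
// Expressions are plain strings here; a real binder would build plan nodes.
func expandTuple(left, right []string) (string, error) {
	if len(left) != len(right) {
		return "", fmt.Errorf("two tuples have different length(%d,%d)", len(left), len(right))
	}
	if len(left) == 0 {
		return "", errors.New("empty tuple")
	}
	// Seed the chain with the first pair, then AND in the remaining pairs.
	expr := fmt.Sprintf("(%s <=> %s)", left[0], right[0])
	for i := 1; i < len(left); i++ {
		expr = fmt.Sprintf("(%s AND (%s <=> %s))", expr, left[i], right[i])
	}
	return expr, nil
}

func main() {
	expr, err := expandTuple([]string{"a", "b"}, []string{"c", "d"})
	if err != nil {
		panic(err)
	}
	fmt.Println(expr) // ((a <=> c) AND (b <=> d))
}
```

Because every `<=>` term is non-NULL, the resulting AND chain is also non-NULL, so the expanded form keeps the never-NULL property of the operator.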

#### 2.2.2 Execution Layer (Function) Implementation
* **Non-Strict Mode:** `<=>` does not use the `plan.Function_STRICT` flag because it needs to handle NULL inputs rather than propagating them directly.
* **Vectorized Implementation:** A new `nullSafeEqualFn` has been added in `pkg/sql/plan/function/func_compare.go`.
    * **Generic Handling:** Uses `opBinaryFixedFixedToFixedNullSafe` for fixed-length types and `opBinaryBytesBytesToFixedNullSafe` for variable-length types.
    * **NULL Handling Logic:**
        * `NULL <=> NULL` -> `1` (True)
        * `Value <=> NULL` -> `0` (False)
        * `Value <=> Value` -> Normal equality comparison logic.
* **Multi-Type Support:** Core comparison logic covers all major data types supported by MatrixOne, including `BOOL`, `INT`, `FLOAT`, `DECIMAL`, `CHAR/VARCHAR`, `JSON`, `DATE/TIME/TIMESTAMP`, `UUID`, and `ARRAY`.
* **Result Reset:** Explicitly resets the NULL mask of the result vector to ensure the output is always valid (non-NULL).

#### 2.2.3 Metadata Registration
* Defined `NULL_SAFE_EQUAL` (406) in `pkg/sql/plan/function/function_id.go` and updated tests.
* Exported relevant variables in `pkg/sql/plan/function/init.go` for external reference.

---

## 3. Testing Plan

### 3.1 Coverage Scenarios
A new BVT test case `test/distributed/cases/function/func_null_safe_equal.sql` has been added, comprehensively covering the following scenarios:
1. **Basic Scalar Comparison:** Covering all NULL combinations (`1<=>1`, `1<=>NULL`, `NULL<=>NULL`, etc.).
2. **Table Data Comparison:** Behavior of `<=>` with Numeric, String, and Boolean types in tables.
3. **Complex Type Comparison:**
* **JSON:** Supports NULL-safe comparison and Join between JSON columns.
* **Decimal:** Validates alignment and NULL comparison across different Precision/Scale.
* **Date/Time/Timestamp:** Validates matching of Date/Time types under various NULL conditions.
4. **JOIN Conditions:** Verifies that when `<=>` is used as a Join condition, NULLs correctly match NULLs (this checks correctness, not performance).
5. **Implicit Conversion:** Validates `1.0 <=> 1` and mixed scenarios with numbers and strings.
6. **Tuple Support:** Verifies the expansion logic for `(a, b) <=> (c, d)` and scenarios involving complex types like JSON.
7. **Optimizer Property Verification:** Verifies that `(val <=> NULL) IS NULL` is correctly optimized to `0` (False).
8. **Constant Folding:** Verifies that `WHERE NULL <=> NULL` correctly returns rows.
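
Scenario 7 relies on the never-NULL property of `<=>`. A minimal sketch of the idea follows, assuming a hypothetical `expr` node that carries a `produceNoNull` flag; the real planner works on `plan.Expr` and its own function flags, which are not modeled here.

```go
package main

import "fmt"

// expr is a hypothetical, simplified plan node.
type expr struct {
	name          string
	produceNoNull bool // true when the expression can never evaluate to NULL
}

// foldIsNull tries to evaluate `e IS NULL` at plan time.
// It returns (value, true) when the answer is known without running the query.
func foldIsNull(e expr) (value bool, folded bool) {
	if e.produceNoNull {
		// An expression that never yields NULL makes IS NULL always FALSE.
		return false, true
	}
	return false, false
}

func main() {
	nullSafe := expr{name: "val <=> NULL", produceNoNull: true}
	plain := expr{name: "val = NULL", produceNoNull: false}

	fmt.Println(foldIsNull(nullSafe)) // false true
	fmt.Println(foldIsNull(plain))    // false false
}
```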

### 3.2 Verification Results
All test cases have been run and verified against a live MatrixOne environment, with results completely consistent with MySQL behavior.

---

## 4. Performance Implications and Limitations

### 4.1 Why Hash Join is Not Supported
Currently, MatrixOne's Hash Join operator (based on `HashMap`) ignores NULL keys by default during the build and probe phases, following standard SQL `=` semantics.
Routing `<=>` through the Hash Join path would therefore discard rows that should match on `NULL <=> NULL` during the probe phase, because their key is NULL, and would produce incorrect results.
For this reason, the current implementation falls back to Nested Loop Join or Cross Product + Filter to ensure correct results.
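
The difference can be shown with a toy example. The sketch below assumes a hash join that skips NULL keys in both phases, as described above; the `key` type and both join functions are hypothetical and far simpler than MatrixOne's operators.

```go
package main

import "fmt"

// key is a hypothetical nullable join key.
type key struct {
	v    int64
	null bool
}

// hashJoinSkipNull models a hash join that ignores NULL keys in both the
// build and the probe phase. It returns (probeIndex, buildIndex) pairs.
func hashJoinSkipNull(build, probe []key) [][2]int {
	table := make(map[int64][]int)
	for i, k := range build {
		if k.null {
			continue // NULL keys never enter the hash table
		}
		table[k.v] = append(table[k.v], i)
	}
	var out [][2]int
	for j, k := range probe {
		if k.null {
			continue // NULL keys are never probed
		}
		for _, i := range table[k.v] {
			out = append(out, [2]int{j, i})
		}
	}
	return out
}

// nestedLoopNullSafe evaluates <=> on every pair, so NULL matches NULL.
func nestedLoopNullSafe(build, probe []key) [][2]int {
	var out [][2]int
	for j, p := range probe {
		for i, b := range build {
			if (p.null && b.null) || (!p.null && !b.null && p.v == b.v) {
				out = append(out, [2]int{j, i})
			}
		}
	}
	return out
}

func main() {
	build := []key{{v: 1}, {null: true}}
	probe := []key{{null: true}, {v: 1}}

	fmt.Println(hashJoinSkipNull(build, probe))   // [[1 0]]
	fmt.Println(nestedLoopNullSafe(build, probe)) // [[0 1] [1 0]]
}
```

The hash-based variant loses the NULL-to-NULL pair, which is the row loss the fallback avoids at the cost of quadratic comparisons.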

### 4.2 Why Zonemap is Not Supported
The default Zonemap evaluation logic is primarily based on `[Min, Max]` range coverage. A query such as `col <=> NULL` instead needs to check whether `NullCount > 0` in the Block.
The generic evaluator does not yet implement this specialized logic for `<=>`. Reusing the generic logic directly could mishandle the comparison between `Min/Max` and NULL and incorrectly filter out Blocks that contain NULL data.
Therefore, `Function_ZONEMAPPABLE` is not set for `<=>`, which forces a full scan (or a fallback to other indexes) and prevents matching rows from being skipped.
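
A sketch of what a NULL-aware block check could look like is given below. The `zoneMap` struct and both functions are assumptions made for illustration; they do not reflect MatrixOne's actual ZoneMap layout or evaluator API.

```go
package main

import "fmt"

// zoneMap is a hypothetical per-block summary.
type zoneMap struct {
	min, max  int64 // range of the non-NULL values in the block
	nullCount int   // number of NULL rows in the block
}

// mayMatchEqual is the usual range check for `col = v` with a non-NULL v.
func mayMatchEqual(zm zoneMap, v int64) bool {
	return zm.min <= v && v <= zm.max
}

// mayMatchNullSafeEqual decides whether a block can contain rows satisfying
// `col <=> v`. A nil v stands for the NULL literal.
func mayMatchNullSafeEqual(zm zoneMap, v *int64) bool {
	if v == nil {
		// Only NULL rows can match, so min/max are irrelevant.
		return zm.nullCount > 0
	}
	// A non-NULL literal can only match non-NULL rows.
	return mayMatchEqual(zm, *v)
}

func main() {
	block := zoneMap{min: 10, max: 20, nullCount: 3}
	five := int64(5)

	fmt.Println(mayMatchNullSafeEqual(block, nil))   // true
	fmt.Println(mayMatchNullSafeEqual(block, &five)) // false
}
```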

---

## 5. List of Modified Files

| File Path | Description of Responsibility |
| :--- | :--- |
| `pkg/sql/plan/function/function_id.go` | Define ID and register operator name. |
| `pkg/sql/plan/function/function_id_test.go` | Update ID test mapping. |
| `pkg/sql/plan/function/init.go` | Initialize global reference variables. |
| `pkg/sql/plan/function/list_operator.go` | Register function overloads and mark `PRODUCE_NO_NULL`. |
| `pkg/sql/plan/base_binder.go` | Implement Tuple expansion logic. |
| `pkg/sql/plan/function/func_compare.go` | Implement core comparison logic. |
| `test/distributed/cases/function/func_null_safe_equal.sql` | Test case set. |
| `test/distributed/cases/function/func_null_safe_equal.result` | Expected execution results. |

---

## 6. Future Optimization Directions (Open Issues)

1. **Support Hash Join:** Refactor the Hash Join operator (`HashMap` and Join Probe logic) to support and correctly handle equality matching for NULL keys.
2. **Support Zonemap:** Implement specialized Zonemap evaluation logic for `<=>` to utilize `NullCount` information in ZoneMap for coarse-grained filtering.
31 changes: 31 additions & 0 deletions pkg/sql/plan/base_binder.go
@@ -628,6 +628,37 @@ func (b *baseBinder) bindComparisonExpr(astExpr *tree.ComparisonExpr, depth int3
			}
		}

	case tree.NULL_SAFE_EQUAL:
		op = "<=>"
		switch leftexpr := astExpr.Left.(type) {
		case *tree.Tuple:
			switch rightexpr := astExpr.Right.(type) {
			case *tree.Tuple:
				if len(leftexpr.Exprs) == len(rightexpr.Exprs) {
					var expr1, expr2 *plan.Expr
					var err error
					for i := 1; i < len(leftexpr.Exprs); i++ {
						if i == 1 {
							expr1, err = b.bindFuncExprImplByAstExpr(op, []tree.Expr{leftexpr.Exprs[0], rightexpr.Exprs[0]}, depth)
							if err != nil {
								return nil, err
							}
						}
						expr2, err = b.bindFuncExprImplByAstExpr(op, []tree.Expr{leftexpr.Exprs[i], rightexpr.Exprs[i]}, depth)
						if err != nil {
							return nil, err
						}
						expr1, err = BindFuncExprImplByPlanExpr(b.GetContext(), "and", []*plan.Expr{expr1, expr2})
						if err != nil {
							return nil, err
						}
					}
					return expr1, nil
				} else {
					return nil, moerr.NewInvalidInputf(b.GetContext(), "two tuples have different length(%v,%v)", len(leftexpr.Exprs), len(rightexpr.Exprs))
				}
			}
		}
	case tree.LESS_THAN:
		op = "<"
		switch leftexpr := astExpr.Left.(type) {
187 changes: 187 additions & 0 deletions pkg/sql/plan/function/func_compare.go
@@ -79,6 +79,193 @@ func equalAndNotEqualOperatorSupports(typ1, typ2 types.Type) bool {
	return true
}

func opBinaryFixedFixedToFixedNullSafe[T types.FixedSizeTExceptStrType](
	parameters []*vector.Vector,
	result vector.FunctionResultWrapper,
	_ *process.Process,
	length int,
	cmpFn func(v1, v2 T) bool,
	selectList *FunctionSelectList,
) error {
	result.UseOptFunctionParamFrame(2)
	rs := vector.MustFunctionResult[bool](result)
	p1 := vector.OptGetParamFromWrapper[T](rs, 0, parameters[0])
	p2 := vector.OptGetParamFromWrapper[T](rs, 1, parameters[1])
	rsVec := rs.GetResultVector()
	rss := vector.MustFixedColNoTypeCheck[bool](rsVec)

	// Result of <=> is never NULL
	rsVec.GetNulls().Reset()

	for i := uint64(0); i < uint64(length); i++ {
		v1, null1 := p1.GetValue(i)
		v2, null2 := p2.GetValue(i)

		if null1 && null2 {
			rss[i] = true
		} else if null1 || null2 {
			rss[i] = false
		} else {
			rss[i] = cmpFn(v1, v2)
		}
	}
	return nil
}

func opBinaryBytesBytesToFixedNullSafe(
	parameters []*vector.Vector,
	result vector.FunctionResultWrapper,
	_ *process.Process,
	length int,
	cmpFn func(v1, v2 []byte) bool,
	selectList *FunctionSelectList,
) error {
	p1 := vector.GenerateFunctionStrParameter(parameters[0])
	p2 := vector.GenerateFunctionStrParameter(parameters[1])
	rs := vector.MustFunctionResult[bool](result)
	rsVec := rs.GetResultVector()
	rss := vector.MustFixedColNoTypeCheck[bool](rsVec)

	// Result of <=> is never NULL
	rsVec.GetNulls().Reset()

	for i := uint64(0); i < uint64(length); i++ {
		v1, null1 := p1.GetStrValue(i)
		v2, null2 := p2.GetStrValue(i)

		if null1 && null2 {
			rss[i] = true
		} else if null1 || null2 {
			rss[i] = false
		} else {
			rss[i] = cmpFn(v1, v2)
		}
	}
	return nil
}

func nullSafeEqualFn(parameters []*vector.Vector, result vector.FunctionResultWrapper, proc *process.Process, length int, selectList *FunctionSelectList) error {
	paramType := parameters[0].GetType()
	rs := vector.MustFunctionResult[bool](result)

	switch paramType.Oid {
	case types.T_bool:
		return opBinaryFixedFixedToFixedNullSafe[bool](parameters, rs, proc, length, func(a, b bool) bool {
			return a == b
		}, selectList)
	case types.T_bit:
		return opBinaryFixedFixedToFixedNullSafe[uint64](parameters, rs, proc, length, func(a, b uint64) bool {
			return a == b
		}, selectList)
	case types.T_int8:
		return opBinaryFixedFixedToFixedNullSafe[int8](parameters, rs, proc, length, func(a, b int8) bool {
			return a == b
		}, selectList)
	case types.T_int16:
		return opBinaryFixedFixedToFixedNullSafe[int16](parameters, rs, proc, length, func(a, b int16) bool {
			return a == b
		}, selectList)
	case types.T_int32:
		return opBinaryFixedFixedToFixedNullSafe[int32](parameters, rs, proc, length, func(a, b int32) bool {
			return a == b
		}, selectList)
	case types.T_int64:
		return opBinaryFixedFixedToFixedNullSafe[int64](parameters, rs, proc, length, func(a, b int64) bool {
			return a == b
		}, selectList)
	case types.T_uint8:
		return opBinaryFixedFixedToFixedNullSafe[uint8](parameters, rs, proc, length, func(a, b uint8) bool {
			return a == b
		}, selectList)
	case types.T_uint16:
		return opBinaryFixedFixedToFixedNullSafe[uint16](parameters, rs, proc, length, func(a, b uint16) bool {
			return a == b
		}, selectList)
	case types.T_uint32:
		return opBinaryFixedFixedToFixedNullSafe[uint32](parameters, rs, proc, length, func(a, b uint32) bool {
			return a == b
		}, selectList)
	case types.T_uint64:
		return opBinaryFixedFixedToFixedNullSafe[uint64](parameters, rs, proc, length, func(a, b uint64) bool {
			return a == b
		}, selectList)
	case types.T_uuid:
		return opBinaryFixedFixedToFixedNullSafe[types.Uuid](parameters, rs, proc, length, func(a, b types.Uuid) bool {
			return a == b
		}, selectList)
	case types.T_float32:
		scale := paramType.Scale
		if scale > 0 {
			pow := math.Pow10(int(scale))
			return opBinaryFixedFixedToFixedNullSafe[float32](parameters, rs, proc, length, func(a, b float32) bool {
				a = float32(math.Round(float64(a)*pow) / pow)
				b = float32(math.Round(float64(b)*pow) / pow)
				return a == b
			}, selectList)
		}
		return opBinaryFixedFixedToFixedNullSafe[float32](parameters, rs, proc, length, func(a, b float32) bool {
			return a == b
		}, selectList)
	case types.T_float64:
		return opBinaryFixedFixedToFixedNullSafe[float64](parameters, rs, proc, length, func(a, b float64) bool {
			return a == b
		}, selectList)
	case types.T_char, types.T_varchar, types.T_blob, types.T_json, types.T_text, types.T_binary, types.T_varbinary, types.T_datalink:
		return opBinaryBytesBytesToFixedNullSafe(parameters, rs, proc, length, func(a, b []byte) bool {
			return bytes.Equal(a, b)
		}, selectList)
	case types.T_array_float32:
		return opBinaryBytesBytesToFixedNullSafe(parameters, rs, proc, length, func(v1, v2 []byte) bool {
			_v1 := types.BytesToArray[float32](v1)
			_v2 := types.BytesToArray[float32](v2)
			return types.ArrayCompare[float32](_v1, _v2) == 0
		}, selectList)
	case types.T_array_float64:
		return opBinaryBytesBytesToFixedNullSafe(parameters, rs, proc, length, func(v1, v2 []byte) bool {
			_v1 := types.BytesToArray[float64](v1)
			_v2 := types.BytesToArray[float64](v2)
			return types.ArrayCompare[float64](_v1, _v2) == 0
		}, selectList)
	case types.T_date:
		return opBinaryFixedFixedToFixedNullSafe[types.Date](parameters, rs, proc, length, func(a, b types.Date) bool {
			return a == b
		}, selectList)
	case types.T_datetime:
		return opBinaryFixedFixedToFixedNullSafe[types.Datetime](parameters, rs, proc, length, func(a, b types.Datetime) bool {
			return a == b
		}, selectList)
	case types.T_time:
		return opBinaryFixedFixedToFixedNullSafe[types.Time](parameters, rs, proc, length, func(a, b types.Time) bool {
			return a == b
		}, selectList)
	case types.T_timestamp:
		return opBinaryFixedFixedToFixedNullSafe[types.Timestamp](parameters, rs, proc, length, func(a, b types.Timestamp) bool {
			return a == b
		}, selectList)
	case types.T_decimal64:
		return opBinaryFixedFixedToFixedNullSafe[types.Decimal64](parameters, rs, proc, length, func(a, b types.Decimal64) bool {
			return a == b
		}, selectList)
	case types.T_decimal128:
		return opBinaryFixedFixedToFixedNullSafe[types.Decimal128](parameters, rs, proc, length, func(a, b types.Decimal128) bool {
			return a == b
		}, selectList)
	case types.T_Rowid:
		return opBinaryFixedFixedToFixedNullSafe[types.Rowid](parameters, rs, proc, length, func(a, b types.Rowid) bool {
			return a.EQ(&b)
		}, selectList)
	case types.T_enum:
		return opBinaryFixedFixedToFixedNullSafe[types.Enum](parameters, rs, proc, length, func(a, b types.Enum) bool {
			return a == b
		}, selectList)
	case types.T_year:
		return opBinaryFixedFixedToFixedNullSafe[types.MoYear](parameters, rs, proc, length, func(a, b types.MoYear) bool {
			return a == b
		}, selectList)
	}
	panic("unreached code")
}

// should convert to c.Numeric next.
func equalFn(parameters []*vector.Vector, result vector.FunctionResultWrapper, proc *process.Process, length int, selectList *FunctionSelectList) error {
	paramType := parameters[0].GetType()