Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 41 additions & 4 deletions datafusion/common/src/scalar/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -568,6 +568,10 @@ impl PartialEq for ScalarValue {
impl PartialOrd for ScalarValue {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
use ScalarValue::*;

if self.is_null() || other.is_null() {
return None;
}
// This purposely doesn't have a catch-all "(_, _)" so that
// any newly added enum variant will require editing this list
// or else face a compile error
Expand Down Expand Up @@ -723,7 +727,7 @@ impl PartialOrd for ScalarValue {
if k1 == k2 { v1.partial_cmp(v2) } else { None }
}
(Dictionary(_, _), _) => None,
(Null, Null) => Some(Ordering::Equal),
// Null is handled by the early return above, but we need this for exhaustiveness
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do something like

(Null, Null) | (Null, _) | (_, Null) => unreachable!("Nulls are already handled before entering this matching arm

to be more defensive

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After investigating this further, I aligned the behavior with existing ScalarValue::partial_cmp semantics:

ScalarValue::partial_cmp continues to return None for any NULL comparison (including NULL vs NULL), consistent with SQL tri-state logic and existing DataFusion tests.

The failure in vector_ord was due to an incorrect test expectation assuming vectors containing NULLs were orderable.

I’ve updated the test to explicitly assert that vector comparison returns None when NULLs are encountered, rather than expecting < to succeed.

Implementing PostgreSQL-style composite ordering (NULL == NULL inside arrays/structs) would require a separate, higher-level comparison strategy and a conscious semantic change. I’ve kept this PR scoped to fixing the inconsistency without altering scalar comparison behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to implement this pg-compatible array comparison in a separate PR, I'll file a ticket afterwards.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming

Agreed on keeping PostgreSQL-style composite ordering as a separate follow-up PR.
I’ll keep an eye on CI and update if anything comes up.

(Null, _) => None,
}
}
Expand Down Expand Up @@ -5760,10 +5764,9 @@ mod tests {
.unwrap(),
Ordering::Less
);
assert_eq!(
assert!(
ScalarValue::try_cmp(&ScalarValue::Int32(None), &ScalarValue::Int32(Some(2)))
.unwrap(),
Ordering::Less
.is_err()
);
assert_starts_with(
ScalarValue::try_cmp(
Expand Down Expand Up @@ -9348,4 +9351,38 @@ mod tests {
]
);
}
#[test]
fn scalar_partial_ordering_nulls() {
use ScalarValue::*;

assert_eq!(
Int32(Some(3)).partial_cmp(&Int32(None)),
None
);

assert_eq!(
Int32(None).partial_cmp(&Int32(Some(3))),
None
);

assert_eq!(
Int32(None).partial_cmp(&Int32(None)),
None
);

assert_eq!(
Null.partial_cmp(&Int32(Some(3))),
None
);

assert_eq!(
Int32(Some(3)).partial_cmp(&Null),
None
);

assert_eq!(
Null.partial_cmp(&Null),
None
);
}
}
12 changes: 8 additions & 4 deletions datafusion/common/src/utils/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1026,22 +1026,26 @@ mod tests {
ScalarValue::Int32(Some(2)),
Null,
ScalarValue::Int32(Some(0)),
] < vec![
]
.partial_cmp(&vec![
ScalarValue::Int32(Some(2)),
Null,
ScalarValue::Int32(Some(1)),
]
])
.is_none()
);
assert!(
vec![
ScalarValue::Int32(Some(2)),
ScalarValue::Int32(None),
ScalarValue::Int32(Some(0)),
] < vec![
]
.partial_cmp(&vec![
ScalarValue::Int32(Some(2)),
ScalarValue::Int32(None),
ScalarValue::Int32(Some(1)),
]
])
.is_none()
);
}

Expand Down
Loading