Skip to content

Support isDistinctFrom and isNotDistinctFrom operators #3357

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Aug 14, 2025

Conversation

pengpeng-lu
Copy link
Contributor

@pengpeng-lu pengpeng-lu commented May 16, 2025

This resolves #3504.

}
} else if (comparand == null) {
return 1;
return toComparable(fieldValue).compareTo(toComparable(comparand));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original line 237~244 is wrong? If any of the left or right is null, the comparison result should be unknown? Also in the original code, we call: if(value == null) return null before we call this compare method, so the block of if(fieldValue == null) was never reached.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method currently implements full collation for use in contexts where NULL is to be considered just another value. For example, that is how index entries sort with NULLS FIRST. It is still reasonable to use such a function in contexts that implement tri-valued logic by guarding calls, as many callers do today.

It is equally reasonable to have this method only implement comparison between non-null values and to expect all callers to deal with nulls, either sorting them first or producing unknown, as is appropriate to their context. And both contexts are appropriate in the larger system, which must deal sometimes with the "query" view that has unknown and other times with the "storage" view that does not. It might be nice to have a clearer separation so that it is unambiguous which view applies, perhaps by using entirely different currency and not just bare Object.

What this change does is something in between, which doesn't seem right. Specifically, if this does not handle null at all, then it ought not to have @Nullable. The signature should also be changed. I am surprised that the IDE / style checker(s) doesn't flag this for a possible NPE. But otherwise, it's okay provided all callers now check both sides.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I changed the method to compare between non-null values.

@@ -707,28 +713,44 @@ public static Type invertComparisonType(@Nonnull final Comparisons.Type type) {
@Nullable
@SpotBugsSuppressWarnings("NP_BOOLEAN_RETURN_NULL")
public static Boolean evalComparison(@Nonnull Type type, @Nullable Object value, @Nullable Object comparand) {
if (value == null) {
return null;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I mentioned in another comment, because of this is called first, we'll never reach the if(fieldValue == null) block in method compare. I removed this block here, and added to cases below.

Copy link
Contributor Author

@pengpeng-lu pengpeng-lu May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now, if type = TEXT_CONTAINS_ALL, or anything that is not in the switch block, and value == null, this method returns null. Is this the expected behavior? I think we wanted to throw exception in this case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, throwing for unknown type before checking either side does seem better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 after removing the if (value == null) return null block, unknown type throws exception regardless of input value being null.

@pengpeng-lu pengpeng-lu added the enhancement New feature or request label May 16, 2025
@@ -417,7 +417,7 @@ protected void testParameterComparison(String name, String field, Object val1, C
final Bindings bindings = Bindings.newBuilder()
.set("fooParam", val2)
.build();
if (val1 != null && val2 != null && (type == Comparisons.Type.IN && !(val2 instanceof List) || type.name().startsWith("TEXT_"))) {
if (val1 != null && val2 != null && (type == Comparisons.Type.IN && !(val2 instanceof List)) || type.name().startsWith("TEXT_")) {
Copy link
Contributor Author

@pengpeng-lu pengpeng-lu May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was a bug. Why it worked before: when type = TEXT_CONTAINS_ALL, it gets into assertEquals and eventually, Comparisons.evalComparison, and because value == null, it returns null.

@pengpeng-lu pengpeng-lu requested review from hatyo and removed request for arnaud-lacurie May 30, 2025 15:41
Copy link
Contributor

@hatyo hatyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like there should be some changes added to ExpressionVisitor.java that I do not see in this PR, but I am not sure.

@pengpeng-lu
Copy link
Contributor Author

It feels like there should be some changes added to ExpressionVisitor.java that I do not see in this PR, but I am not sure.

is there any specific queries that you are concerned about? I think this is a feature on the "operator" level, not "expression" level.

@pengpeng-lu pengpeng-lu requested a review from hatyo June 3, 2025 00:10
@arnaud-lacurie arnaud-lacurie requested a review from MMcM June 3, 2025 08:13
@pengpeng-lu pengpeng-lu requested a review from hatyo July 24, 2025 20:27
Copy link
Contributor

@normen662 normen662 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked it very much on the surface. Please asked Mike to give you a review on the Comparisons part as that's shared between the planners and runtimes.

}
} else if (comparand == null) {
return 1;
return toComparable(fieldValue).compareTo(toComparable(comparand));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method currently implements full collation for use in contexts where NULL is to be considered just another value. For example, that is how index entries sort with NULLS FIRST. It is still reasonable to use such a function in contexts that implement tri-valued logic by guarding calls, as many callers do today.

It is equally reasonable to have this method only implement comparison between non-null values and to expect all callers to deal with nulls, either sorting them first or producing unknown, as is appropriate to their context. And both contexts are appropriate in the larger system, which must deal sometimes with the "query" view that has unknown and other times with the "storage" view that does not. It might be nice to have a clearer separation so that it is unambiguous which view applies, perhaps by using entirely different currency and not just bare Object.

What this change does is something in between, which doesn't seem right. Specifically, if this does not handle null at all, then it ought not to have @Nullable. The signature should also be changed. I am surprised that the IDE / style checker(s) doesn't flag this for a possible NPE. But otherwise, it's okay provided all callers now check both sides.

private static Boolean compareEquals(Object value, Object comparand) {
if (value == null || comparand == null) {
return null;
private static boolean compareNotDistinctFrom(Object value, Object comparand) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have @Nullable explicitly to be clear what the expectations are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -285,6 +285,9 @@ private static Boolean compareStartsWith(@Nullable Object value, @Nullable Objec
@Nullable
@SpotBugsSuppressWarnings("NP_BOOLEAN_RETURN_NULL")
private static Boolean compareLike(@Nullable Object value, @Nullable Object pattern) {
if (value == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this want to similarly handle LIKE ? with a NULL binding, rather than getting Illegal pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that in sql, where x like a returns null when x or a is null, so I think we should follow that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the if(value == null) block from here though, because LikeOperatorValue.likeOperation already handles nulls operations.

@@ -707,28 +713,44 @@ public static Type invertComparisonType(@Nonnull final Comparisons.Type type) {
@Nullable
@SpotBugsSuppressWarnings("NP_BOOLEAN_RETURN_NULL")
public static Boolean evalComparison(@Nonnull Type type, @Nullable Object value, @Nullable Object comparand) {
if (value == null) {
return null;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, throwing for unknown type before checking either side does seem better.

} else {
return toComparable(fieldValue).compareTo(toComparable(comparand));
return toClassWithRealEquals(value).equals(toClassWithRealEquals(comparand));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make toClassWithRealEquals also @Nonnull and remove the null check in it, to further ensure that all callers are similarly consistent with the rest of this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pengpeng-lu pengpeng-lu requested review from MMcM and normen662 July 29, 2025 19:05
* The {@code isDistinctFrom} function.
*/
@AutoService(BuiltInFunction.class)
public static class IsDistinctFromFn extends BuiltInFunction<Value> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add unit tests for this function, as well as NotDistinctFromFn? You can extend the unit tests in BooleanValueTest and add tests for these functions with variations of argument types so you can test the underlying physical operators.

Comment on lines -1078 to +1085
return null;
if (type == Comparisons.Type.IS_DISTINCT_FROM) {
return arg1 != null || arg2 != null;
} else if (type == Comparisons.Type.NOT_DISTINCT_FROM) {
return arg1 == null && arg2 == null;
} else {
return null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is interesting that we have this early treatment of null values here instead of delegating it to the actual underlying physical operator.

I opened an issue to fix this, as well as introducing NULL type code handling (instead of the deprecated UNKNOWN).
#3526

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I'm not sure about this as well, but putting it in the operators will result in a lot of duplicated code. Fixing it in a separate PR sounds good.

Copy link
Contributor

@hatyo hatyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -894,6 +929,119 @@ private enum BinaryPhysicalOperator {
GT_IDN(Comparisons.Type.GREATER_THAN, Type.TypeCode.UUID, Type.TypeCode.NULL, (l, r) -> null),
GTE_NID(Comparisons.Type.GREATER_THAN_OR_EQUALS, Type.TypeCode.NULL, Type.TypeCode.UUID, (l, r) -> null),
GTE_IDN(Comparisons.Type.GREATER_THAN_OR_EQUALS, Type.TypeCode.UUID, Type.TypeCode.NULL, (l, r) -> null),

IS_DISTINCT_FROM_BU(Comparisons.Type.IS_DISTINCT_FROM, Type.TypeCode.BOOLEAN, Type.TypeCode.NULL, (l, r) -> Objects.nonNull(l)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you adjust the naming for these?
I think you're using the letter U from when those were unknown, but looking at the rest of the file, you'd want to use N for nulls.
For example IS_DISTINCT_FROM_BU -> IS_DISTINCT_FROM_BN

@pengpeng-lu pengpeng-lu requested review from arnaud-lacurie and normen662 and removed request for normen662 August 5, 2025 19:48
@pengpeng-lu pengpeng-lu merged commit e1c387d into FoundationDB:main Aug 14, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement is [not] distinct from predicate
5 participants