@sdf-jkl
Contributor

@sdf-jkl sdf-jkl commented Nov 17, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Adds a new rule, udf_preimage, to the expr_simplifier library.

This rule performs the optimization for the following operators:

  • Equal
  • NotEqual
  • Greater
  • GreaterEqual
  • Less
  • LessEqual
  • IsDistinctFrom
  • IsNotDistinctFrom

The rule currently supports the date_part function when the literal 'year' is passed as the interval parameter.
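
As an illustration of the idea, here is a minimal standalone sketch, not the code in this PR. It assumes dates can be written as ISO strings and uses hypothetical names (Op, year_preimage, rewrite). The Eq and NotEq arms follow the interval form quoted in the review comments; the arms for the ordering operators are an assumption about how a half-open interval [lower, upper) would be used.

```rust
/// Comparison operators covered by this sketch (a subset of the list above).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Op {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

/// Half-open interval [lower, upper) of dates whose year equals `year`.
fn year_preimage(year: i32) -> (String, String) {
    let next = year + 1;
    (format!("{year:04}-01-01"), format!("{next:04}-01-01"))
}

/// Rewrite `date_part('year', col) <op> year` into a predicate on `col` itself.
fn rewrite(col: &str, op: Op, year: i32) -> String {
    let (lower, upper) = year_preimage(year);
    match op {
        Op::Eq => format!("{col} >= '{lower}' AND {col} < '{upper}'"),
        Op::NotEq => format!("{col} < '{lower}' OR {col} >= '{upper}'"),
        Op::Lt => format!("{col} < '{lower}'"),
        Op::LtEq => format!("{col} < '{upper}'"),
        Op::Gt => format!("{col} >= '{upper}'"),
        Op::GtEq => format!("{col} >= '{lower}'"),
    }
}

fn main() {
    // date_part('year', ts) = 2024 becomes a range check on ts.
    println!("{}", rewrite("ts", Op::Eq, 2024));
}
```

Because the rewritten predicate compares the column directly, it can be checked against column statistics without evaluating date_part per row.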

Are these changes tested?

Tested for all comparison operators above and all possible datatypes in unit tests and sqllogictests.

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Nov 17, 2025
@sdf-jkl
Contributor Author

sdf-jkl commented Nov 17, 2025

@alamb This works as is right now, but I like the idea of adding a new method ScalarUDFImpl::simplify_predicate() as a unified API for all scalar functions. We can then move this code into the DatePartFunc implementation of ScalarUDFImpl and create a simplification rule for scalar functions.

The rule will check whether a scalar function is present in the predicate expression and call the corresponding ScalarUDFImpl::simplify_predicate() to simplify the expression.
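
A rough sketch of what such a hook could look like, using only the standard library. Everything here is hypothetical: the ScalarFn trait is a toy stand-in and not DataFusion's ScalarUDFImpl, and the Predicate struct, the registry, and the method signature are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Toy predicate of the form `func(unit, column) = literal`.
struct Predicate {
    func: String,
    unit: String,
    column: String,
    literal: i32,
}

/// Toy stand-in for a scalar function trait (NOT DataFusion's ScalarUDFImpl).
trait ScalarFn {
    /// Hypothetical hook: return a rewritten predicate, or None to leave it as is.
    fn simplify_predicate(&self, _p: &Predicate) -> Option<String> {
        None
    }
}

struct DatePart;
struct Upper;

impl ScalarFn for DatePart {
    fn simplify_predicate(&self, p: &Predicate) -> Option<String> {
        if p.unit != "year" {
            return None; // only 'year' is handled in this sketch
        }
        let (c, lo, hi) = (&p.column, p.literal, p.literal + 1);
        Some(format!("{c} >= '{lo:04}-01-01' AND {c} < '{hi:04}-01-01'"))
    }
}

/// Uses the default implementation, so it never rewrites anything.
impl ScalarFn for Upper {}

fn registry() -> HashMap<&'static str, Box<dyn ScalarFn>> {
    let mut m: HashMap<&'static str, Box<dyn ScalarFn>> = HashMap::new();
    m.insert("date_part", Box::new(DatePart));
    m.insert("upper", Box::new(Upper));
    m
}

/// The generic rule: look up the function and delegate to its hook.
fn simplify(reg: &HashMap<&'static str, Box<dyn ScalarFn>>, p: &Predicate) -> Option<String> {
    reg.get(p.func.as_str())?.simplify_predicate(p)
}

fn pred(func: &str, unit: &str, literal: i32) -> Predicate {
    Predicate {
        func: func.to_string(),
        unit: unit.to_string(),
        column: "ts".to_string(),
        literal,
    }
}

fn main() {
    let reg = registry();
    println!("{:?}", simplify(&reg, &pred("date_part", "year", 2024)));
    println!("{:?}", simplify(&reg, &pred("upper", "", 0)));
}
```

With a default implementation that returns None, functions that have no useful rewrite need no changes, and the rule itself stays independent of any particular function.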

@2010YOUY01 2010YOUY01 self-requested a review November 18, 2025 04:49
Contributor

@alamb alamb left a comment

Thanks @sdf-jkl -- this is looking quite cool

Contributor

@2010YOUY01 2010YOUY01 left a comment

Thank you, this is great 👍🏼

The implementation idea looks good to me. This should be good to go after the end-to-end tests are added and the test coverage is double-checked.

@github-actions github-actions bot added logical-expr Logical plan and expressions functions Changes to functions implementation labels Nov 20, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 21, 2025
@sdf-jkl

This comment was marked as outdated.

@alamb
Contributor

alamb commented Dec 9, 2025

BTW if you want my active notes (aka I tried a few things to refine the API) you can see what I was trying in this commit: alamb@49057e9

Contributor Author

@sdf-jkl sdf-jkl left a comment

@alamb thanks for your review.

I've addressed your comments, and this is ready for another look. We should be very close to closing this.

Comment on lines +244 to +245
pub fn column_expr(&self, args: &[Expr]) -> Option<Expr> {
self.inner.column_expr(args)
Contributor Author

Added a little helper method to ScalarUDFImpl to extract the inner columnar Expr

}
}

fn date_to_scalar(date: NaiveDate, target_type: &DataType) -> Option<ScalarValue> {
Contributor Author

Cleaned this up a little

Comment on lines +1978 to +1979
if get_preimage(&left, &right, info)?.0.is_some()
&& get_preimage(&left, &right, info)?.1.is_some() =>
Contributor Author

Looking forward to if let being stabilized in match guards, so get_preimage doesn't have to be called twice here.
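
On toolchains where if let guards are not available, one possible workaround is to call the helper once and destructure its result, assuming the check can be hoisted out of the guard. The sketch below is standalone and hypothetical: this get_preimage is a toy stand-in whose signature and return type differ from the one in the PR.

```rust
/// Hypothetical stand-in for the PR's `get_preimage`; the real signature differs.
/// Returns an optional [lower, upper) interval and an optional column name.
fn get_preimage(
    year: Option<i32>,
    column: Option<&str>,
) -> Result<(Option<(i32, i32)>, Option<String>), String> {
    Ok((year.map(|y| (y, y + 1)), column.map(str::to_string)))
}

fn rewrite(year: Option<i32>, column: Option<&str>) -> Result<String, String> {
    // Call the helper once, then match on both parts of the result
    // instead of checking `.0.is_some()` and `.1.is_some()` in a guard.
    match get_preimage(year, column)? {
        (Some((lower, upper)), Some(col)) => Ok(format!("{col} in [{lower}, {upper})")),
        // Either part is missing: leave the expression untouched.
        _ => Ok("unchanged".to_string()),
    }
}

fn main() {
    println!("{:?}", rewrite(Some(2024), Some("ts")));
    println!("{:?}", rewrite(None, Some("ts")));
}
```

Whether this fits the real code depends on how the surrounding match on the expression is structured; inside a larger match, the fallback arm would need to hand the original expression back to the remaining rules.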

};
Ok((
func.preimage(args, right_expr, info)?,
func.column_expr(args),
Contributor Author

Had to add it here because func and args are already extracted at this point.

/// [ClickHouse Paper]: https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf
/// [preimage]: https://en.wikipedia.org/wiki/Image_(mathematics)#Inverse_image
///
pub(super) fn rewrite_with_preimage(
Contributor Author

Added a few more comments.

@github-actions github-actions bot added the core Core DataFusion crate label Dec 15, 2025
@sdf-jkl
Contributor Author

sdf-jkl commented Dec 15, 2025

Hey @alamb please check when you're available!

@sdf-jkl sdf-jkl requested a review from alamb December 15, 2025 18:05
@alamb
Contributor

alamb commented Dec 16, 2025

Thanks -- I'll try and find some time over the next day or two

@sdf-jkl
Contributor Author

sdf-jkl commented Dec 19, 2025

@alamb 👀

Contributor Author

@sdf-jkl sdf-jkl left a comment

Updated IsDistinctFrom and IsNotDistinctFrom null handling logic and updated tests.

//
// <expr> is not distinct from x ==> (<expr> is NULL and x is NULL) or ((<expr> >= lower) and (<expr> < upper))
// but since x is always not NULL => (<expr> >= lower) and (<expr> < upper)
Operator::Eq | Operator::IsNotDistinctFrom => and(
Contributor Author

IsNotDistinctFrom acts like Eq but handles NULL differently.
For IsNotDistinctFrom:
NULL IS NOT DISTINCT FROM NULL => True,
while for Eq:
NULL = NULL => NULL (so the row is filtered out, as with False).

The preimage optimization requires an interval, so the rhs can't be NULL. We can safely remove the NULL handling part.
This makes IsNotDistinctFrom behave the same as Eq.

),
// <expr> is distinct from x ==> (<expr> < lower) or (<expr> >= upper) or (<expr> is NULL and x is not NULL) or (<expr> is not NULL and x is NULL)
// but given that x is always not NULL => (<expr> < lower) or (<expr> >= upper) or (<expr> is NULL)
Operator::IsDistinctFrom => Expr::BinaryExpr(BinaryExpr {
Contributor Author

Same as above: for IsDistinctFrom the rhs can't be NULL.
We can simplify the NULL handling part, (<expr> is NULL and x is not NULL) or (<expr> is not NULL and x is NULL), to (<expr> is NULL).
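
The NULL reasoning for both operators can be checked with a small standalone model of SQL three-valued logic. This is a sketch under assumptions, not the PR's code: Option models NULL, and a toy bucket function (integer division by 10) stands in for date_part, so the preimage of x is [10x, 10x + 10). All names are hypothetical.

```rust
/// SQL boolean: Some(true), Some(false), or None for NULL.
type Tri = Option<bool>;

fn and3(a: Tri, b: Tri) -> Tri {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

fn or3(a: Tri, b: Tri) -> Tri {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

fn lt(a: Option<i32>, b: i32) -> Tri {
    a.map(|v| v < b)
}

fn ge(a: Option<i32>, b: i32) -> Tri {
    a.map(|v| v >= b)
}

/// Toy stand-in for date_part('year', col): buckets of width 10.
fn bucket(col: Option<i32>) -> Option<i32> {
    col.map(|v| v.div_euclid(10))
}

/// `bucket(col) IS DISTINCT FROM x` with x non-NULL; never returns NULL.
fn distinct_original(col: Option<i32>, x: i32) -> bool {
    bucket(col) != Some(x)
}

/// Rewritten form: (col < lower) OR (col >= upper) OR (col IS NULL).
fn distinct_rewritten(col: Option<i32>, x: i32) -> Tri {
    let (lower, upper) = (x * 10, x * 10 + 10);
    or3(or3(lt(col, lower), ge(col, upper)), Some(col.is_none()))
}

/// `bucket(col) IS NOT DISTINCT FROM x` with x non-NULL; never returns NULL.
fn not_distinct_original(col: Option<i32>, x: i32) -> bool {
    bucket(col) == Some(x)
}

/// Rewritten form: (col >= lower) AND (col < upper).
fn not_distinct_rewritten(col: Option<i32>, x: i32) -> Tri {
    let (lower, upper) = (x * 10, x * 10 + 10);
    and3(ge(col, lower), lt(col, upper))
}

fn main() {
    for col in [None, Some(-5), Some(19), Some(20), Some(29), Some(30)] {
        // IS DISTINCT FROM: the rewrite matches exactly, including for NULL.
        assert_eq!(distinct_rewritten(col, 2), Some(distinct_original(col, 2)));
        // IS NOT DISTINCT FROM: the rewrite yields NULL for a NULL column,
        // which a filter treats the same as false.
        let kept = not_distinct_rewritten(col, 2).unwrap_or(false);
        assert_eq!(kept, not_distinct_original(col, 2));
    }
    println!("rewrites agree on all sampled inputs");
}
```

In this model the IsDistinctFrom rewrite is an exact match, while the IsNotDistinctFrom rewrite differs only in returning NULL instead of False for a NULL input, which makes no difference when the predicate is used as a filter.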

@sdf-jkl
Contributor Author

sdf-jkl commented Dec 29, 2025

Hey @alamb, I've made some changes, and this is ready for another look. Please take a look when you have time, thanks!

@alamb
Contributor

alamb commented Jan 9, 2026

Sorry for the delay here @sdf-jkl -- I am trying to find time to review this, but I have been out and other higher-priority things came up. I hope to return to this next week.

@alamb
Contributor

alamb commented Jan 9, 2026

(it is hard for me to find time to review 1000 line PRs, unfortunately)

@sdf-jkl
Contributor Author

sdf-jkl commented Jan 9, 2026

@alamb no worries. Maybe I restructure this into some smaller PRs, so it will be easier to review?

@alamb
Contributor

alamb commented Jan 9, 2026

@alamb no worries. Maybe I restructure this into some smaller PRs, so it will be easier to review?

That would certainly help me 🙏

@sdf-jkl
Contributor Author

sdf-jkl commented Jan 10, 2026

@alamb I split it in two:

Hope this will be helpful

Labels

core Core DataFusion crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Support "pre-image" for pruning predicate evaluation

3 participants