Support "pre-image" for pruning predicate evaluation #18789
@alamb This works as is right now, but I like the idea of adding a new method. The rule will check whether a scalar function is present in the predicate expression and match the corresponding
alamb
left a comment
Thanks @sdf-jkl -- this is looking quite cool
2010YOUY01
left a comment
Thank you, this is great 👍🏼
The implementation idea looks good to me. This should be good to go once the end-to-end tests are added and the test coverage is double-checked.
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
datafusion/optimizer/src/simplify_expressions/unwrap_date_part.rs
datafusion/optimizer/src/simplify_expressions/unwrap_date_part.rs
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
BTW if you want my active notes (aka I tried a few things to refine the API) you can see what I was trying in this commit: alamb@49057e9
…sion into pre-image-support
sdf-jkl
left a comment
@alamb thanks for your review.
I've addressed your comments and am ready for another go. We should be very close to closing this.
    pub fn column_expr(&self, args: &[Expr]) -> Option<Expr> {
        self.inner.column_expr(args)
Added a little helper method to ScalarUDFImpl to extract the inner columnar Expr
        }
    }

    fn date_to_scalar(date: NaiveDate, target_type: &DataType) -> Option<ScalarValue> {
Cleaned this up a little
    if get_preimage(&left, &right, info)?.0.is_some()
        && get_preimage(&left, &right, info)?.1.is_some() =>
Looking forward to `if let` being stabilized in match guards, so the guard no longer has to call get_preimage twice.
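Where the surrounding match arms allow it, one way to avoid the repeated call without `if let` guards is to compute the pair once and destructure it in a single pattern. This is only a minimal sketch: `lookup` is a hypothetical stand-in for `get_preimage`, and the types are simplified placeholders rather than the PR's actual ones.

```rust
/// Hypothetical stand-in for `get_preimage`: returns an optional interval
/// and an optional column name, or an error.
fn lookup(key: i32) -> Result<(Option<(i32, i32)>, Option<&'static str>), String> {
    match key {
        k if k < 0 => Err(format!("negative key {k}")),
        0 => Ok((None, None)),
        k => Ok((Some((k * 10, k * 10 + 10)), Some("col"))),
    }
}

/// Calls `lookup` once and destructures both parts in a single pattern,
/// instead of repeating the call inside a match guard.
fn rewrite(key: i32) -> Result<String, String> {
    match lookup(key)? {
        (Some((lower, upper)), Some(col)) => {
            Ok(format!("{col} >= {lower} AND {col} < {upper}"))
        }
        // Either part is missing: leave the expression unchanged.
        _ => Ok(format!("f(col) = {key}")),
    }
}
```

Whether this shape fits the simplifier depends on the other arms of the real match, which may need the guard form to fall through to later patterns.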
    };
    Ok((
        func.preimage(args, right_expr, info)?,
        func.column_expr(args),
Had to add it here, because we already extract the func, args here.
    /// [ClickHouse Paper]: https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf
    /// [preimage]: https://en.wikipedia.org/wiki/Image_(mathematics)#Inverse_image
    ///
    pub(super) fn rewrite_with_preimage(
Added a few more comments
Hey @alamb please check when you're available!
Thanks -- I'll try and find some time over the next day or two
@alamb 👀
sdf-jkl
left a comment
Updated IsDistinctFrom and IsNotDistinctFrom null handling logic and updated tests.
    //
    // <expr> is not distinct from x ==> (<expr> is NULL and x is NULL) or ((<expr> >= lower) and (<expr> < upper))
    // but since x is always not NULL => (<expr> >= lower) and (<expr> < upper)
    Operator::Eq | Operator::IsNotDistinctFrom => and(
IsNotDistinctFrom acts like Eq, but handles NULL differently.
For IsNotDistinctFrom:
NULL IS NOT DISTINCT FROM NULL => True,
while for Eq:
NULL = NULL => NULL (so a filter drops the row).
The preimage optimization needs an interval to rewrite against, and that interval is only available when the rhs is a non-NULL literal. With a non-NULL rhs, the NULL handling part can be removed safely.
This makes IsNotDistinctFrom behave the same as Eq in this rewrite.
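The NULL argument above can be checked with a small model of SQL three-valued logic. This is a sketch under assumptions: `Option<i32>` stands in for a nullable value, `None` for NULL, and all function names are made up for illustration rather than taken from DataFusion.

```rust
/// SQL `a = b` under three-valued logic: NULL if either side is NULL.
fn sql_eq(a: Option<i32>, b: Option<i32>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// SQL `a IS NOT DISTINCT FROM b`: never NULL; two NULLs compare as equal.
fn is_not_distinct_from(a: Option<i32>, b: Option<i32>) -> bool {
    a == b
}

/// A filter keeps a row only when the predicate is exactly TRUE.
fn passes_filter(p: Option<bool>) -> bool {
    p == Some(true)
}

/// Checks that, for a non-NULL right-hand side, both operators keep the same rows.
fn agree_for_non_null_rhs(values: &[Option<i32>], rhs: i32) -> bool {
    values.iter().all(|&v| {
        passes_filter(sql_eq(v, Some(rhs))) == is_not_distinct_from(v, Some(rhs))
    })
}
```

With a NULL left-hand side the two operators still return different values (NULL versus false), but both cause a filter to drop the row, which is the sense in which they coincide here.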
    ),
    // <expr> is distinct from x ==> (<expr> < lower) or (<expr> >= upper) or (<expr> is NULL and x is not NULL) or (<expr> is not NULL and x is NULL)
    // but given that x is always not NULL => (<expr> < lower) or (<expr> >= upper) or (<expr> is NULL)
    Operator::IsDistinctFrom => Expr::BinaryExpr(BinaryExpr {
Same as above, for IsDistinctFrom rhs can't be Null.
We can simplify the Null handling part: (<expr> is NULL and x is not NULL) or (<expr> is not NULL and x is NULL) to (<expr> is NULL).
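The simplification can be sanity-checked against the full definition of IS DISTINCT FROM. The sketch below uses a toy function, `bucket`, as a hypothetical stand-in for a UDF with a known preimage; none of these names come from the PR.

```rust
/// Hypothetical stand-in for a UDF such as `date_part`: maps a value to its bucket.
fn bucket(x: i64) -> i64 {
    x.div_euclid(10)
}

/// Preimage of `bucket(x) = k` as a half-open interval `[lower, upper)`.
fn preimage(k: i64) -> (i64, i64) {
    (k * 10, k * 10 + 10)
}

/// `bucket(x) IS DISTINCT FROM k` with full NULL handling on both sides.
fn distinct_original(x: Option<i64>, k: Option<i64>) -> bool {
    x.map(bucket) != k
}

/// Simplified rewrite, valid only because `k` is known to be non-NULL:
/// `(x < lower) OR (x >= upper) OR (x IS NULL)`.
fn distinct_rewritten(x: Option<i64>, k: i64) -> bool {
    let (lower, upper) = preimage(k);
    match x {
        None => true,
        Some(v) => v < lower || v >= upper,
    }
}
```

Comparing `distinct_original(x, Some(k))` with `distinct_rewritten(x, k)` over a range of inputs, including `None`, is one way to confirm that the shorter form loses nothing when the rhs is non-NULL. It assumes, as this toy does, that the function itself returns NULL only for NULL input.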
Hey @alamb, I've made some changes and am ready for another go. Please take a look when you have time, thanks!
Sorry for the delay here @sdf-jkl -- I am trying to find time to review this but I have been out and other things came up that were higher priority. I hope to return to this next week
(it is hard for me to find time to review 1000 line PRs, unfortunately)
@alamb no worries. Maybe I could restructure this into some smaller PRs, so it will be easier to review?
That would certainly help me 🙏
@alamb I split it in two:
Hope this will be helpful
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Adding a new rule to the expr_simplifier library - udf_preimage
This rule performs the optimization for the following operators:
The rule currently supports optimization for the date_part function, given that the 'year' literal is passed as the interval parameter.
Are these changes tested?
Tested for all comparison operators above and all possible datatypes in unit tests and sqllogictests.
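To illustrate the kind of rewrite the date_part 'year' case relies on, here is a minimal self-contained sketch. It models dates as `(year, month, day)` tuples instead of Arrow date types, and the `Cmp` enum and function names are hypothetical. The rewrites shown are the ones that follow mathematically from the year preimage; the exact operator set handled by the rule is the one listed in the PR.

```rust
/// A calendar date as (year, month, day); tuples compare lexicographically,
/// which matches chronological order.
type Date = (i32, u32, u32);

/// Stand-in for `date_part('year', d)`.
fn year_of(d: Date) -> i32 {
    d.0
}

/// Preimage of `year_of(d) = y`: the half-open range `[Jan 1 of y, Jan 1 of y + 1)`.
fn year_preimage(y: i32) -> (Date, Date) {
    ((y, 1, 1), (y + 1, 1, 1))
}

/// Comparison operators, named hypothetically for this sketch.
#[derive(Clone, Copy, Debug)]
enum Cmp {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

/// Evaluates `year_of(d) <op> y` directly, calling the function per row.
fn eval_original(op: Cmp, d: Date, y: i32) -> bool {
    let v = year_of(d);
    match op {
        Cmp::Eq => v == y,
        Cmp::NotEq => v != y,
        Cmp::Lt => v < y,
        Cmp::LtEq => v <= y,
        Cmp::Gt => v > y,
        Cmp::GtEq => v >= y,
    }
}

/// Evaluates the rewritten predicate, which compares `d` itself against the
/// preimage bounds and never calls `year_of`.
fn eval_rewritten(op: Cmp, d: Date, y: i32) -> bool {
    let (lower, upper) = year_preimage(y);
    match op {
        Cmp::Eq => d >= lower && d < upper,
        Cmp::NotEq => d < lower || d >= upper,
        Cmp::Lt => d < lower,
        Cmp::LtEq => d < upper,
        Cmp::Gt => d >= upper,
        Cmp::GtEq => d >= lower,
    }
}
```

Because the rewritten form compares the bare column against constants, it is the shape that min/max statistics can be evaluated against for pruning, which is the motivation given in the PR title.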
Are there any user-facing changes?