Spark date part #19823
base: main
Conversation
need to merge #19821 first
```rust
}
_ => {
    return internal_err!(
        "First argument of `DATE_PART` must be non-null scalar Utf8"
```
Same as DF `date_part`: `part` is a literal.
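The literal-only restriction fits how Spark handles the field name: `extract`/`date_part` accepts many aliases that are resolved at plan time, so the first argument has to be a known string before execution. A minimal stdlib-only sketch of that normalization step (the function name and the alias subset shown here are illustrative, not the PR's actual code; the full alias list is linked in the PR description):

```rust
/// Normalize a Spark-style date_part field name to a canonical form.
/// Only a small, illustrative subset of Spark's aliases is shown here.
fn normalize_part(part: &str) -> Option<&'static str> {
    match part.to_ascii_lowercase().as_str() {
        "year" | "years" | "yr" | "yrs" | "y" => Some("year"),
        "month" | "months" | "mon" | "mons" => Some("month"),
        "day" | "days" | "d" => Some("day"),
        "dayofweek" | "dow" => Some("dow"),
        // Unknown field names would surface as the internal error above.
        _ => None,
    }
}
```

Because the lookup happens on a plan-time literal, a non-literal first argument has no string to resolve, which is exactly why the `internal_err!` branch rejects it.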
```rust
use datafusion_expr::planner::{ExprPlanner, PlannerResult};

#[derive(Default, Debug)]
pub struct SparkFunctionPlanner;
```
If we're including this planner now, I feel we should update the lib docs with an example of using it:
https://github.com/apache/datafusion/blob/main/datafusion/spark/src/lib.rs
Yes, I can do that. I also think it would be nice to provide a way to register the expr planner and the UDFs at the same time, with something like
```rust
pub fn with_default_features(mut self) -> Self {
```
we could add a `with_spark_features`? Could track that in a separate issue/PR.
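The idea above is a chained builder method that registers the Spark UDFs and the expression planner in one call. A stdlib-only sketch of the shape being proposed (the struct and field types here are stand-ins, not DataFusion's actual `SessionStateBuilder` API):

```rust
// Stand-in builder; the real one would hold ScalarUDFs and ExprPlanners.
#[derive(Default, Debug)]
struct SessionBuilder {
    udfs: Vec<String>,          // stand-in for registered ScalarUDFs
    expr_planners: Vec<String>, // stand-in for registered ExprPlanners
}

impl SessionBuilder {
    /// Hypothetical `with_spark_features`: register the Spark UDFs and the
    /// Spark expression planner together, so callers get both in one call.
    fn with_spark_features(mut self) -> Self {
        self.udfs.push("spark_date_part".to_string());
        self.expr_planners.push("SparkFunctionPlanner".to_string());
        self
    }
}
```

The benefit is symmetry with `with_default_features`: a caller opts into the whole Spark-compatibility surface at once instead of wiring the planner and the UDFs separately.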
```rust
    internal_err!("spark date_part should have been simplified to standard date_part")
}

fn simplify(
```
I like that we're using simplify here 👍
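For readers unfamiliar with the pattern: `simplify` lets the UDF rewrite itself into another expression during optimization, so the Spark wrapper is replaced by the standard `date_part` before execution (which is why the `invoke` path can just return the internal error above). A toy stdlib-only illustration of the rewrite idea (this enum is hypothetical, not DataFusion's `Expr`):

```rust
// Toy expression tree; the real PR operates on DataFusion's `Expr`.
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Call { name: String, args: Vec<Expr> },
    Literal(String),
}

/// Rewrite `spark_date_part(part, date)` into the standard
/// `date_part(part, date)`, mirroring the PR's delegate-via-simplify approach.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Call { name, args } if name == "spark_date_part" => Expr::Call {
            name: "date_part".to_string(),
            args,
        },
        other => other,
    }
}
```

After this rewrite runs, the plan only contains the built-in function, so the Spark wrapper never needs its own execution kernel.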
```rust
let date_part_expr = Expr::ScalarFunction(ScalarFunction::new_udf(
    datafusion_functions::datetime::date_part(),
    vec![part_expr, date_expr],
));
```
One concern: will the nullability of the output field match here?
Ah, you're right. Should we update:

```rust
fn return_field_from_args(&self, args: ReturnFieldArgs) -> Result<FieldRef> {
```
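The concern matters because a `simplify` rewrite must not change the plan's schema: the wrapper's declared output field has to agree with what the inner `date_part` would report, including nullability. A stdlib-only sketch of the usual propagation rule (the `Field` struct here is a minimal stand-in for an Arrow field, and the rule shown is the common convention, not necessarily the PR's exact choice):

```rust
// Minimal stand-in for an Arrow Field; only nullability matters here.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// A common rule for scalar functions: the output is nullable if any input is.
/// `return_field_from_args` would compute something equivalent so the wrapper's
/// declared schema matches the inner `date_part` after simplification.
fn return_field(inputs: &[Field]) -> Field {
    Field {
        name: "date_part".to_string(),
        nullable: inputs.iter().any(|f| f.nullable),
    }
}
```

If the wrapper instead declared a fixed nullability, the optimizer could observe a schema mismatch once `simplify` swaps in the built-in function.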
Which issue does this PR close?
Part of datafusion-spark: Spark Compatible Functions #15914

Rationale for this change
The current `date_part` function in DataFusion has a few differences from the Spark implementation.
Full list of Spark-supported aliases: https://github.com/apache/spark/blob/a03bedb6c1281c5263a42bfd20608d2ee005ab05/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L3356-L3371
What changes are included in this PR?
A new `date_part` function in the `spark` crate.
Are these changes tested?
Yes, with SLT (sqllogictest) tests.
Are there any user-facing changes?
Yes.