Conversation

@cht42 (Contributor) commented Jan 15, 2026

Which issue does this PR close?

Rationale for this change

The current `date_part` function in DataFusion has a few differences from the Spark implementation:

  • day-of-week parts are 1-indexed in Spark but 0-indexed in DataFusion (a sketch of the adjustment follows below)
  • Spark supports a few more aliases for certain parts

Full list of Spark-supported aliases: https://github.com/apache/spark/blob/a03bedb6c1281c5263a42bfd20608d2ee005ab05/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L3356-L3371
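
To make the indexing difference concrete, here is a minimal sketch (not this PR's code) of the adjustment, assuming only `datafusion_expr::{Expr, lit}`; the helper name is made up:

```rust
use datafusion_expr::{lit, Expr};

// Spark's day-of-week parts are 1-indexed while DataFusion's standard
// `date_part` is 0-indexed, so a Spark-compatible rewrite can add 1 to
// the standard result. The helper name is illustrative.
fn to_spark_day_of_week(datafusion_dow: Expr) -> Expr {
    datafusion_dow + lit(1)
}
```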

What changes are included in this PR?

A new `date_part` function in the `datafusion-spark` crate.

Are these changes tested?

Yes, with SLT tests.

Are there any user-facing changes?

Yes.

@github-actions bot added labels: sqllogictest (SQL Logic Tests (.slt)), functions (Changes to functions implementation), spark (Jan 15, 2026)
@cht42 (Contributor, Author) commented Jan 15, 2026

Need to merge #19821 first.

```rust
}
_ => {
    return internal_err!(
        "First argument of `DATE_PART` must be non-null scalar Utf8"
```

@cht42 (Contributor, Author) commented:

Same as DF `date_part`: the part is a literal.

```rust
use datafusion_expr::planner::{ExprPlanner, PlannerResult};

#[derive(Default, Debug)]
pub struct SparkFunctionPlanner;
```

A reviewer (Contributor) commented:

If we're including this planner now, I feel we should update the lib docs with an example of using it:

https://github.com/apache/datafusion/blob/main/datafusion/spark/src/lib.rs

@cht42 (Contributor, Author) replied:

Yes, I can do that. I also think it would be nice to provide a way to register the expr planner and the UDFs at the same time, with something like:

```rust
pub fn with_default_features(mut self) -> Self {
```

We could do a `with_spark_features`? Could track that in a separate issue/PR.
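
For reference, a sketch of roughly what such a `with_spark_features` helper might bundle, assuming `SessionStateBuilder::with_expr_planners` and an assumed import path for the planner added in this PR:

```rust
use std::sync::Arc;
use datafusion::execution::session_state::SessionStateBuilder;
use datafusion::prelude::SessionContext;
// Assumed import path for the planner added in this PR:
use datafusion_spark::planner::SparkFunctionPlanner;

fn main() {
    // Roughly what a `with_spark_features` helper might bundle; Spark
    // UDF registration is elided here.
    let state = SessionStateBuilder::new()
        .with_default_features()
        .with_expr_planners(vec![Arc::new(SparkFunctionPlanner::default())])
        .build();
    let _ctx = SessionContext::new_with_state(state);
}
```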

```rust
    internal_err!("spark date_part should have been simplified to standard date_part")
}

fn simplify(
```

A reviewer (Contributor) commented:

I like that we're using simplify here 👍
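
For readers following along, a minimal sketch of the shape of a simplify-based rewrite, using the `ExprSimplifyResult` API from `datafusion_expr`; the body is illustrative rather than this PR's exact code:

```rust
use datafusion_common::Result;
use datafusion_expr::expr::ScalarFunction;
use datafusion_expr::simplify::{ExprSimplifyResult, SimplifyInfo};
use datafusion_expr::Expr;

// Rewrite the Spark-flavored call into the standard DataFusion
// `date_part` UDF at simplification time, so execution never reaches
// the Spark wrapper's `invoke`. Alias normalization is elided here.
fn simplify(args: Vec<Expr>, _info: &dyn SimplifyInfo) -> Result<ExprSimplifyResult> {
    // args = [part_expr, date_expr]
    Ok(ExprSimplifyResult::Simplified(Expr::ScalarFunction(
        ScalarFunction::new_udf(datafusion_functions::datetime::date_part(), args),
    )))
}
```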

Comment on lines +126 to +129
```rust
let date_part_expr = Expr::ScalarFunction(ScalarFunction::new_udf(
    datafusion_functions::datetime::date_part(),
    vec![part_expr, date_expr],
));
```

A reviewer (Contributor) commented:

One concern: will the nullability of the output field match here?

@cht42 (Contributor, Author) replied:

Ah, you're right. Should we update

```rust
fn return_field_from_args(&self, args: ReturnFieldArgs) -> Result<FieldRef> {
```

to be nullable depending on the inputs?
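
A minimal sketch of input-dependent nullability, with `arg_fields` standing in for `ReturnFieldArgs::arg_fields`; the output name and type here are illustrative:

```rust
use std::sync::Arc;
use arrow::datatypes::{DataType, Field, FieldRef};
use datafusion_common::Result;

// Mark the output nullable whenever any input field is nullable,
// rather than hardcoding non-null.
fn return_field(arg_fields: &[FieldRef]) -> Result<FieldRef> {
    let nullable = arg_fields.iter().any(|f| f.is_nullable());
    Ok(Arc::new(Field::new("date_part", DataType::Int32, nullable)))
}
```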


Labels

functions (Changes to functions implementation) · spark · sqllogictest (SQL Logic Tests (.slt))


Development

Successfully merging this pull request may close these issues.

[datafusion-spark] Add date_part function

2 participants