Skip to content

Conversation

@Weijun-H
Copy link
Member

@Weijun-H Weijun-H commented Oct 17, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules substrait Changes to the substrait crate proto Related to proto crate functions Changes to functions implementation labels Oct 17, 2025
@Weijun-H Weijun-H force-pushed the 18060-fieldref-in-cast branch from 1598f94 to 2a06d44 Compare October 17, 2025 11:59
@github-actions github-actions bot added the core Core DataFusion crate label Oct 17, 2025
@Weijun-H Weijun-H changed the title WIP feat: Replace DataType with FieldRef in Cast WIP feat: Replace DataType with FieldRef in Cast Oct 17, 2025
@Weijun-H Weijun-H added the api change Changes the API exposed to users of the crate label Oct 17, 2025
@Weijun-H Weijun-H force-pushed the 18060-fieldref-in-cast branch from 40f0012 to bd29587 Compare October 17, 2025 12:22
@github-actions github-actions bot added the datasource Changes to the datasource crate label Oct 17, 2025
@paleolimbot
Copy link
Member

Apologies...I missed this PR! Feel free to lift anything useful from #18136 (I'm just as happy to review this instead).

@Weijun-H Weijun-H force-pushed the 18060-fieldref-in-cast branch from 6563998 to 7b32f01 Compare October 27, 2025 09:57
@paleolimbot
Copy link
Member

(I did clean up #18136 because I had some time on Friday but also feel free to keep going here and I'm happy to review!)

@Weijun-H Weijun-H marked this pull request as ready for review October 27, 2025 16:44
@Weijun-H
Copy link
Member Author

(I did clean up #18136 because I had some time on Friday but also feel free to keep going here and I'm happy to review!)

Hi @paleolimbot , this pr is ready for review

@Weijun-H Weijun-H changed the title WIP feat: Replace DataType with FieldRef in Cast feat: Replace DataType with FieldRef in Cast Oct 27, 2025
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I have a few high-level concerns but am happy to iterate. Another option is to pursue #18136 and implement the nullability handling here (which I didn't implement) as a follow-on!

  • The DataTypeExt::into_nullable_field_ref() I think will help eliminate the need for the Arc::new(Field::new("", ...)) changes.
  • I think that the cast() and Cast::new() signatures should probably remain unchanged to minimize disruption in this PR and downstream
  • Considering the nullability of the cast field in the logical expression is problematic (highighted by the CI failures here) because all existing casts simply propagate the input expression nullability. I think this is what most people mean by a cast and so we should keep that behaviour. If we were to express a cast with explicit nullability we would have to do something like Option<bool> (where None means propagate the input, true means explicitly cast to nullable, and false means explicitly cast to non-nullable). All of those would require integration with the physical expression implementing the cast to ensure output schema consistency (which is the reason for the CI failure I believe)

Expr::Cast(Cast { expr, data_type }) => {
write!(f, "CAST({expr} AS {data_type})")
Expr::Cast(Cast { expr, field }) => {
write!(f, "CAST({expr} AS {})", field.data_type())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's now a format_type_and_metadata() helper in datafusion_common that will ensure that casts with metadata are displayed. That display is a bit verbose at the moment when metadata exists but it makes it quite a bit easier to track down bugs where the metadata was dropped.

let expr = Expr::Cast(Cast {
expr: Box::new(Expr::Literal(ScalarValue::Float32(Some(1.23)), None)),
data_type: DataType::Utf8,
field: Arc::new(Field::new("cast", DataType::Utf8, false)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
field: Arc::new(Field::new("cast", DataType::Utf8, false)),
field: DataType::Utf8.into_nullable_field_ref(),

@alamb kindly added a helper for this (there are many places in this PR that would benefit!)

An aside is that I think we should ensure that these fields are always unnamed for consistency (and that the names of such a field are never used).

Comment on lines +329 to +332
/// Types that can be used to describe the result of a cast expression.
pub trait IntoCastField {
fn into_cast_field(self, expr: &Expr) -> FieldRef;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are somewhat similar to the FieldExt traits that Andrew added to datafusion_common/src/datatypes.rs. I think DataFusion would benefit if it had a way to say "get me a FieldRef that is actually a data type", but I am also not sure this PR is the right place to do that.

Comment on lines -330 to +381
pub fn cast(expr: Expr, data_type: DataType) -> Expr {
Expr::Cast(Cast::new(Box::new(expr), data_type))
pub fn cast<F>(expr: Expr, field: F) -> Expr
where
F: IntoCastField,
{
let field = field.into_cast_field(&expr);
Expr::Cast(Cast::new(Box::new(expr), field))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that a number of references to cast() were updated as cast(Arc::new(Field::new(...)))...I wonder if those can be reverted (in theory IntoCastField should handle the conversion?)

Comment on lines +584 to +588
let (_, input_field) = expr.to_field(schema)?;
let mut combined_metadata = FieldMetadata::from(input_field.metadata());
combined_metadata.extend(FieldMetadata::from(field.metadata()));
let field = combined_metadata.add_to_field(field.as_ref().clone());
Ok(Arc::new(field))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to combine metadata here...this would mean that a Cast for an extension type would basically be a "reinterpret cast" not a "logical cast", which is what I believe the Cast is trying to represent in a logical plan. The Alias can currently add metadata in this way which I think is a bit fishy but is at least not attempting to represent a logical operation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think many of these changes should be reverted (I don't think this file is generated?)

Comment on lines -595 to +543
datafusion_common.ArrowType arrow_type = 2;
datafusion_common.Field field = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also possible to add metadata + nullability here to keep the protobuf backward compatible (not a requirement but is relatively easy to do)

Comment on lines -803 to +804
pub fn new(expr: Box<Expr>, data_type: DataType) -> Self {
Self { expr, data_type }
pub fn new(expr: Box<Expr>, field: FieldRef) -> Self {
Self { expr, field }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change (and the corresponding change for TryCast::new() seem to have had caused significant disruption...it is probably worth keeping the original signature and adding a new function specifically for casts with metadata and/or nullability to minimize the disruption in DataFusion and downstream.

@github-actions
Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Dec 30, 2025
@github-actions github-actions bot closed this Jan 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate core Core DataFusion crate datasource Changes to the datasource crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates proto Related to proto crate sql SQL Planner Stale PR has not had any activity for some time substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LogicalPlan Casts can't express a cast to an extension type

2 participants