@paleolimbot
Member

@paleolimbot paleolimbot commented Oct 17, 2025

Which issue does this PR close?

I am sorry that I missed the previous PR implementing this ( #18120 ) and I'm also happy to review that one instead of updating this!

Rationale for this change

Other systems that interact with the logical plan (e.g., SQL, Substrait) can express types that fall outside the Arrow DataType enum, such as extension types, which a bare DataType cannot represent.

What changes are included in this PR?

For the Cast and TryCast structs, the destination data type was changed from a DataType to a FieldRef.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. Any code using Cast { .. } to create an expression will need to use Cast::new() instead (or pass on field metadata if it has it). Existing matches will need to be updated for the data_type -> field member rename.
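For anyone migrating, here is a minimal sketch of what the rename means for construction and matching. It uses stand-in types (DataType, Field, FieldRef, Expr, Cast are simplified local definitions, not the real arrow or datafusion items), and it assumes Cast::new() wraps a plain type in an unnamed, nullable field with no metadata:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-ins for arrow's DataType / Field / FieldRef so the sketch compiles alone.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

pub type FieldRef = Arc<Field>;

// Stand-in for the logical Cast after this PR: the destination is a FieldRef.
#[derive(Clone, Debug, PartialEq)]
pub struct Cast {
    pub expr: Box<Expr>,
    pub field: FieldRef, // previously `data_type: DataType`
}

#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Column(String),
    Cast(Cast),
}

impl Cast {
    /// Assumed behaviour for this sketch: wrap a plain type in an unnamed,
    /// nullable field without metadata.
    pub fn new(expr: Box<Expr>, data_type: DataType) -> Self {
        let field = Field {
            name: String::new(),
            data_type,
            nullable: true,
            metadata: HashMap::new(),
        };
        Self { expr, field: Arc::new(field) }
    }
}

/// Existing matches switch from the `data_type` member to `field`.
pub fn cast_target(expr: &Expr) -> Option<&DataType> {
    match expr {
        Expr::Cast(Cast { field, .. }) => Some(&field.data_type),
        _ => None,
    }
}
```

Code that only ever casts to plain Arrow types should be able to keep calling the constructor; only code that built or destructured the struct by member name needs to change.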

@github-actions github-actions bot added the sql, logical-expr, physical-expr, core, substrait, proto, and functions labels Oct 17, 2025
Comment on lines 598 to 603
f.as_ref()
.clone()
.with_data_type(data_type.data_type().clone())
.with_metadata(f.metadata().clone())
// TODO: should nullability be overridden here or derived from the
// input expression?
Member Author

I am not sure if this type of cast should be able to express nullability or not.

Member Author

Currently this PR does not consider the nullability of the cast field, because the destination physical expression won't consider it either (and thus the return fields would be out of sync).
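A rough sketch of that rule, with stand-in types rather than the real arrow Field: the returned field keeps the input's name and nullability and takes only the data type from the cast target. Where the metadata comes from is an assumption in this sketch (taken from the cast target here); the function name cast_return_field is hypothetical.

```rust
use std::collections::HashMap;

// Stand-ins for arrow's DataType / Field.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical helper: derive the field returned by a cast expression.
pub fn cast_return_field(input: &Field, target: &Field) -> Field {
    Field {
        name: input.name.clone(),
        data_type: target.data_type.clone(),
        // `target.nullable` is deliberately ignored: the physical cast does not
        // consider it, so using it here would put the two layers out of sync.
        nullable: input.nullable,
        // Assumption for this sketch: metadata follows the cast target.
        metadata: target.metadata.clone(),
    }
}
```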

Comment on lines 294 to 295
data_type.clone(),
// TODO: this drops extension metadata associated with the cast
data_type.data_type().clone(),
Member Author

I don't actually need physical expressions to be able to cast things. My vague plan is to use a logical plan transformation, or perhaps an optimizer rule, to replace casts to extension types with a ScalarUDF call. This should possibly error if there is mismatched metadata between the input and destination (i.e., a physical cast would only ever represent a storage cast, which is usually OK).

Member Author

I updated this to error at the logical expr -> physical expr stage if there is metadata on the cast field. I think this is better than dropping it (and won't break any existing code because before this PR such a cast could not exist).
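The check can be sketched roughly as follows. This is not the planner code from the PR: the types are stand-ins, physical_cast_type is a hypothetical name, and the error is a plain String instead of a DataFusion error.

```rust
use std::collections::HashMap;

// Stand-ins for arrow's DataType / Field.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int64,
    FixedSizeBinary(i32),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical helper: return the storage type to hand to the physical cast,
/// or an error if the logical cast field carries any metadata.
pub fn physical_cast_type(field: &Field) -> Result<DataType, String> {
    if !field.metadata.is_empty() {
        // Erroring is preferred over silently dropping the metadata.
        return Err(format!(
            "cast to a field with metadata is not supported by the physical planner: {:?}",
            field.metadata
        ));
    }
    Ok(field.data_type.clone())
}
```

A plain cast target passes through unchanged, while a target tagged with, for example, the ARROW:extension:name key is rejected before a physical expression is built.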

@github-actions github-actions bot added the optimizer label Oct 25, 2025
@paleolimbot paleolimbot marked this pull request as ready for review October 27, 2025 14:26
@paleolimbot
Member Author

@alamb If updating DataTypes to FieldRefs is still what we're doing in the logical plan, this PR is ready for review!

@github-actions github-actions bot added the documentation label Dec 8, 2025
@adriangb adriangb self-requested a review December 10, 2025 16:49
@adriangb
Contributor

This makes sense in general! I only got partially into the review before realizing it conflicts with #19097. Maybe that one is the issue to focus on?

@paleolimbot
Member Author

Thank you for reviewing!

I didn't see that PR but it looks like it focuses on the physical cast. This PR doesn't really have anything to do with the physical cast (in order to actually execute a cast to an extension type we need a registry of how exactly that should be done, which we don't have yet), although I'm sure there's some merge conflict between them.

@adriangb
Contributor

Oh right two different layers! But they're both essentially adding Field to the cast operators, which is interesting / why I got confused.

I will try to take another look here tomorrow.

Contributor

@adriangb adriangb left a comment

Overall this looks good to me.

Is there an example of actually customizing casting, or an issue tracking getting us there? For example, in Postgres you can do select gen_random_uuid() = 'b2bb3fea-2598-449e-ac64-8ad1754df02d'; As far as I can tell that needs more work, but this is a step towards it.

)?);
let data_type = cast.arrow_type.as_ref().required("arrow_type")?;
Ok(Expr::Cast(Cast::new(expr, data_type)))
let field = Field::new("", data_type, cast.nullable.unwrap_or(true));
Contributor

It feels like there should be an UnnamedField or DataTypeWithNullabilityAndMetadata. I saw similar patterns in the literal expressions.
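To make the suggestion concrete, one possible shape for such a type is sketched below. UnnamedField does not exist in arrow or DataFusion; it and the stand-in DataType and Field definitions are purely illustrative.

```rust
use std::collections::HashMap;

// Stand-ins for arrow's DataType / Field.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical: a data type plus nullability and metadata, with no name.
#[derive(Clone, Debug, PartialEq)]
pub struct UnnamedField {
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

impl UnnamedField {
    pub fn new(data_type: DataType, nullable: bool) -> Self {
        Self { data_type, nullable, metadata: HashMap::new() }
    }

    /// Attach a name only at the point where a full field is required.
    pub fn into_field(self, name: &str) -> Field {
        Field {
            name: name.to_string(),
            data_type: self.data_type,
            nullable: self.nullable,
            metadata: self.metadata,
        }
    }
}
```

The benefit would be that call sites stop inventing an empty string as a placeholder name; the cost is one more type to convert to and from.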

expressions::cast(
create_physical_expr(expr, input_dfschema, execution_props)?,
input_schema,
field.data_type().clone(),
Contributor

This does end up tying into #19097: I think they'd work well together, we'd just want to pass the field directly here.

Comment on lines -815 to +824
- pub data_type: DataType,
+ pub field: FieldRef,
Contributor

Should we keep data_type as a deprecated field that we populate from field.data_type() for a couple of releases?

Member Author

We could also implement this by adding a metadata: FieldMetadata field rather than switching the data_type to a FieldRef. We mostly just switched DataType to FieldRef in other places so that's what I did here.

Comment on lines 637 to 643
// This currently propagates the nullability of the input
// expression as the resulting physical expression does
// not currently consider the nullability specified here
f.as_ref()
.clone()
.with_data_type(field.data_type().clone())
.with_metadata(f.metadata().clone())
Contributor

I didn't fully understand this comment / what's going on here. I'm sure it's my fault but could you try rewording the comment, maybe giving f a more meaningful name and replacing _ with _unused_variable_that_has_a_useful_name?

@paleolimbot
Member Author

Is there an example of actually customizing casting, or an issue tracking getting us there?

There's an issue to create a registry for dyn Extension-style things and a POC PR, without much discussion so far, driving a DataFusion-based solution ( #18223 ). I added a comment there on how we might go about that using the registry. This PR allows DataFusion-based projects to implement a workaround: for example, in SedonaDB I'm planning to transform the logical plan so that casts to an extension type are rewritten as a call to a scalar function, sd_cast(). Casts show up in SQL -> LogicalPlan internals and SQL types are now FieldRefs too, so my personal next step is to see whether we now have enough to add a UUID type in SQL.
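A minimal sketch of that kind of workaround, using a toy expression tree rather than DataFusion's Expr or its tree-rewriting APIs. The function name rewrite_extension_casts is hypothetical, and treating the presence of the ARROW:extension:name metadata key as the marker of an extension type is an assumption. How sd_cast() learns the destination type is left out here.

```rust
use std::collections::HashMap;

// Toy stand-ins: the data type is just a string in this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub data_type: String,
    pub metadata: HashMap<String, String>,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Column(String),
    Cast { expr: Box<Expr>, field: Field },
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Recursively replace casts whose target field is an extension type with a
/// call to a scalar function; plain casts are kept as they are.
pub fn rewrite_extension_casts(expr: Expr) -> Expr {
    match expr {
        Expr::Cast { expr, field } => {
            // Rewrite children first so nested casts are handled too.
            let inner = rewrite_extension_casts(*expr);
            if field.metadata.contains_key("ARROW:extension:name") {
                Expr::ScalarFunction {
                    name: "sd_cast".to_string(),
                    args: vec![inner],
                }
            } else {
                Expr::Cast { expr: Box::new(inner), field }
            }
        }
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args.into_iter().map(rewrite_extension_casts).collect(),
        },
        other => other,
    }
}
```

Because the rewrite happens on the logical plan, the physical planner never sees a cast whose field carries metadata.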

@alamb alamb added the api change label Dec 17, 2025
@alamb alamb changed the title Allow logical expressions to express a cast to an extension type Add Field to Expr::cast -- allow logical expressions to express a cast to an extension type Dec 17, 2025
@alamb alamb changed the title Add Field to Expr::cast -- allow logical expressions to express a cast to an extension type Add Field to Expr::Cast -- allow logical expressions to express a cast to an extension type Dec 17, 2025
@alamb
Contributor

alamb commented Jan 6, 2026

Is this waiting for anything else before we merge it? We have now created branch-52, so merging to main means it will be included in 53.0.0 (or we can backport to branch-52).

@paleolimbot
Member Author

I'm personally happy with this. It seems like there is no opposition to the main change (DataType -> FieldRef in Cast and TryCast). It might be nice to get all the Expr breaking changes out of the way in 52 (I believe Variable was also updated similarly), although I am not in a personal rush. The next steps are to fill in the SQL half (SQL types are already FieldRefs, but no built-in types have extension types yet) and fill in the extension half (a cast registry or something).

Labels

api change, core, documentation, functions, logical-expr, optimizer, physical-expr, proto, sql, substrait

Development

Successfully merging this pull request may close these issues.

LogicalPlan Casts can't express a cast to an extension type

3 participants