paleolimbot
Member

Which issue does this PR close?

Rationale for this change

Most logical plan expressions now propagate metadata; however, parameters with extension types or other field metadata cannot participate in placeholder/parameter binding.

What changes are included in this PR?

The DataType in the Placeholder struct was replaced with a FieldRef.

Are these changes tested?

They will be! (Work in progress)

Are there any user-facing changes?

Yes, one new function was added to extract the placeholder fields from a plan.

This is a breaking change for code that specifically interacts with the Placeholder struct (but I think matches on the logical Expr are unchanged).
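A rough sketch of what "extracting the placeholder fields from a plan" could look like. It uses simplified, std-only stand-ins rather than the real DataFusion/arrow types: `Field`, `FieldRef`, `Expr`, and the `placeholder_fields` helper below are all hypothetical, and the actual new function's name and signature may differ. It walks an expression tree and records, per placeholder id, the field (including its metadata) if one is known.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

/// Minimal stand-in for arrow's `Field`; the type is a plain string here.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

pub type FieldRef = Arc<Field>;

/// Minimal stand-in for the logical `Expr` tree: only what the walk needs.
#[derive(Clone, Debug)]
pub enum Expr {
    Placeholder { id: String, field: Option<FieldRef> },
    Literal(i64),
    BinaryExpr(Box<Expr>, Box<Expr>),
}

/// Hypothetical helper: record the field (if known) for each placeholder id.
pub fn placeholder_fields(expr: &Expr, out: &mut HashMap<String, Option<FieldRef>>) {
    match expr {
        Expr::Placeholder { id, field } => {
            let entry = out.entry(id.clone()).or_insert(None);
            // If an id appears several times, keep the first known field.
            if entry.is_none() {
                *entry = field.clone();
            }
        }
        Expr::Literal(_) => {}
        Expr::BinaryExpr(left, right) => {
            placeholder_fields(left, out);
            placeholder_fields(right, out);
        }
    }
}
```

Keeping `Option<FieldRef>` in the map preserves the distinction between a placeholder whose type (and metadata) was inferred and one that is still unknown.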

github-actions bot added the sql, logical-expr, proto, physical-expr, optimizer, core, and common labels Oct 8, 2025
github-actions bot added the sqllogictest label Oct 10, 2025
Member Author

paleolimbot left a comment

@timsaucer @alamb I'm happy to add tests for all these components, but I wanted to make sure this is vaguely headed in the right direction before I do so!

Comment on lines -1373 to +1145
- pub data_type: Option<DataType>,
+ pub field: Option<FieldRef>,
Member Author

This is the main change. We could change this to a less severely breaking option (e.g., just adding metadata: FieldMetadata to the struct). I started with the most breaking version to identify its use in as many places as possible.
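For comparison, the less severely breaking option mentioned above might look roughly like the sketch below. The types are simplified, std-only stand-ins (a string in place of arrow's `DataType`, a map-backed `FieldMetadata`), and the `new`/`with_metadata` names are hypothetical. Note that adding a public field still breaks struct literals and exhaustive destructuring of `Placeholder`, so this reduces the breakage rather than eliminating it.

```rust
use std::collections::BTreeMap;

/// Stand-in for `FieldMetadata`: a sorted map of string key/value pairs.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct FieldMetadata {
    inner: BTreeMap<String, String>,
}

impl FieldMetadata {
    pub fn new(inner: BTreeMap<String, String>) -> Self {
        Self { inner }
    }

    pub fn is_empty(&self) -> bool {
        self.inner.is_empty()
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.inner.get(key).map(String::as_str)
    }
}

/// Hypothetical less-breaking variant: `data_type` stays, `metadata` sits next to it.
/// (The type is a string here instead of arrow's `DataType`.)
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Placeholder {
    pub id: String,
    pub data_type: Option<String>,
    pub metadata: FieldMetadata,
}

impl Placeholder {
    /// Takes only an id and a type, so type-only call sites keep compiling;
    /// metadata defaults to empty.
    pub fn new(id: impl Into<String>, data_type: Option<String>) -> Self {
        Self { id: id.into(), data_type, metadata: FieldMetadata::default() }
    }

    /// Opt-in for callers that do carry metadata (e.g. an extension type).
    pub fn with_metadata(mut self, metadata: FieldMetadata) -> Self {
        self.metadata = metadata;
        self
    }
}
```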

Comment on lines -2015 to +2016
- pub(crate) data_types: Vec<DataType>,
+ pub(crate) data_types: Vec<FieldRef>,
Member Author

This is another change that had some impact.

Comment on lines 256 to +259
pub struct PlannerContext {
/// Data types for numbered parameters ($1, $2, etc), if supplied
/// in `PREPARE` statement
- prepare_param_data_types: Arc<Vec<DataType>>,
+ prepare_param_data_types: Arc<Vec<FieldRef>>,
Member Author

I chose to also update the SQL planner here while I was at it; I could probably isolate these changes into a separate PR.

Comment on lines -590 to +592
- pub(crate) fn convert_data_type(&self, sql_type: &SQLDataType) -> Result<DataType> {
+ pub(crate) fn convert_data_type(&self, sql_type: &SQLDataType) -> Result<FieldRef> {
Member Author

I think this change would also enable supporting UUIDs and other SQL types that map to extension types.
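To illustrate why returning a field helps here: a type-only conversion has nowhere to record that a SQL `UUID` is an extension type, whereas a field can carry the extension name in its metadata. The sketch below is hypothetical (stand-in `DataType`/`Field` types and a `convert_sql_type` function keyed on a type-name string rather than the sqlparser AST). It assumes the Arrow canonical `arrow.uuid` extension type, whose storage type is `FixedSizeBinary(16)`.

```rust
use std::collections::BTreeMap;

/// Tiny stand-in for arrow's `DataType`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum DataType {
    Int32,
    Utf8,
    FixedSizeBinary(i32),
}

/// Tiny stand-in for arrow's `Field` (name omitted).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Field {
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

fn plain(data_type: DataType) -> Field {
    Field { data_type, nullable: true, metadata: BTreeMap::new() }
}

/// Hypothetical SQL-type-name -> Field conversion.
pub fn convert_sql_type(sql_type: &str) -> Result<Field, String> {
    match sql_type.to_ascii_uppercase().as_str() {
        "INT" | "INTEGER" => Ok(plain(DataType::Int32)),
        "VARCHAR" | "TEXT" => Ok(plain(DataType::Utf8)),
        // No dedicated Arrow DataType for UUID: use the canonical `arrow.uuid`
        // extension, i.e. FixedSizeBinary(16) storage plus an extension-name key.
        "UUID" => {
            let mut field = plain(DataType::FixedSizeBinary(16));
            field.metadata.insert(
                "ARROW:extension:name".to_string(),
                "arrow.uuid".to_string(),
            );
            Ok(field)
        }
        other => Err(format!("unsupported SQL type: {other}")),
    }
}
```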

temporary,
name,
- return_type,
+ return_type: return_type.map(|f| f.data_type().clone()),
Member Author

One of the places where metadata is still dropped that I didn't update (DdlStatement::CreateFunction args or return type).

Comment on lines +989 to +992
Ok(Expr::Cast(Cast::new(
Box::new(expr),
dt.data_type().clone(),
)))
Member Author

Another place where metadata is still dropped that I didn't update (casts).

Comment on lines +1525 to +1528
// This check is possibly too strict (it requires nullability and field
// metadata to align perfectly, rather than computing true type equality
// when field metadata is representing an extension type)
if prev != field {
Member Author

This highlights something we'll have to fix: how to compute type equality (e.g., is a shredded and unshredded variant the same type?)
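One way to make the question concrete is to separate the strict check from a looser, extension-aware one. The sketch below uses std-only stand-ins and a hypothetical extension name (`example.variant`). The policy it encodes (ignore nullability, compare extension names rather than storage types) is an assumption for illustration, not a settled answer; in particular it ignores `ARROW:extension:metadata`, which would be wrong for extension types whose parameters live there.

```rust
use std::collections::BTreeMap;

const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Stand-in for arrow's `Field`; the storage type is a plain string.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Field {
    pub data_type: String,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

/// What `prev != field` checks today: storage type, nullability and every
/// metadata entry must all match.
pub fn strict_eq(a: &Field, b: &Field) -> bool {
    a == b
}

/// One possible looser policy (an assumption, not a settled rule):
/// - nullability is ignored;
/// - two extension-typed fields match if their extension names match, even if
///   their storage differs (e.g. a shredded vs. unshredded encoding);
/// - two plain fields match if their storage types match;
/// - an extension-typed field never matches a plain one.
pub fn logical_type_eq(a: &Field, b: &Field) -> bool {
    match (a.metadata.get(EXTENSION_NAME_KEY), b.metadata.get(EXTENSION_NAME_KEY)) {
        (Some(name_a), Some(name_b)) => name_a == name_b,
        (None, None) => a.data_type == b.data_type,
        _ => false,
    }
}
```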

plan,
@r#"
- Prepare: "my_plan" [Int32]
+ Prepare: "my_plan" [Field { name: "", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]
Member Author

This highlights something else we'll have to solve: how to print types. Printing a whole Field is not particularly helpful in this context. (If this change is vaguely in the right direction, I'll revert the changes in this file and, for now, update the Debug or DisplayAs implementation, wherever these strings are coming from.)
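As a sketch of the printing side, a small `Display` wrapper could print a field as its bare data type unless it carries an extension name, and mention nullability only when it is non-default. The `CompactField` wrapper and `format_params` helper are hypothetical, and `Field` is a std-only stand-in with the data type stored as a string.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Stand-in for arrow's `Field`; the storage type is a plain string.
#[derive(Clone, Debug)]
pub struct Field {
    pub data_type: String,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

/// Hypothetical wrapper giving a field a compact, plan-friendly rendering.
pub struct CompactField<'a>(pub &'a Field);

impl fmt::Display for CompactField<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let field = self.0;
        match field.metadata.get("ARROW:extension:name") {
            // Extension types: show the extension name, then the storage type.
            Some(name) => write!(f, "{}<{}>", name, field.data_type)?,
            // Plain types: just the data type.
            None => write!(f, "{}", field.data_type)?,
        }
        // Only call out nullability when it departs from the default.
        if !field.nullable {
            write!(f, " NOT NULL")?;
        }
        Ok(())
    }
}

/// Render a parameter list the way the `Prepare` line does, e.g. `[Int32]`.
pub fn format_params(fields: &[Field]) -> String {
    let parts: Vec<String> = fields.iter().map(|fl| CompactField(fl).to_string()).collect();
    format!("[{}]", parts.join(", "))
}
```

With this format, nullable fields without metadata render exactly as before (e.g. `[Int32]`), so plan snapshots like the one above would stay unchanged for ordinary types.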

Contributor

This is also something @friendlymatthew is likely to run into shortly, as he is working on variant support too.

Comment on lines +54 to +55
#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
pub struct FieldMetadata {
Member Author

I just moved this from the expr crate so I could use it in ParamValues.

Contributor

alamb left a comment

Thank you @paleolimbot -- I think this is definitely the right direction. I had some small API suggestions, but overall it looks good.

The largest open question in my mind is what you have highlighted for customizing behavior for different extension types (e.g. comparing two fields for "equality" and printing them, and casting them, etc.)

@findepi brought up the same thing many months ago when discussing adding new types in

One idea is to create a TypeRegistry similar to a FunctionRegistry and some sort of ExtensionType trait that encapsulates these behaviors.

The challenge would then be to thread the registry to all the places that need it, though that is likely largely an API design/plumbing exercise.

If you think that is an idea worth exploring
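A minimal sketch of the `TypeRegistry`/`ExtensionType` idea, assuming extension types are identified by the `ARROW:extension:name` metadata key. Everything here is hypothetical and std-only (it is not an existing DataFusion API, and `Field` is a stand-in); it only shows the shape: per-type equality and display behavior looked up by extension name, with strict equality as the fallback for unregistered types.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Stand-in for arrow's `Field`; the storage type is a plain string.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Field {
    pub data_type: String,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

/// Hypothetical trait bundling the behaviors an extension type may customize.
pub trait ExtensionType: Send + Sync {
    /// Value stored under `ARROW:extension:name`.
    fn name(&self) -> &str;
    /// Whether two fields carrying this extension are the same logical type.
    fn type_eq(&self, a: &Field, b: &Field) -> bool;
    /// How the type should be printed in plans.
    fn display(&self, field: &Field) -> String;
}

/// Hypothetical registry keyed by extension name, in the spirit of FunctionRegistry.
#[derive(Default)]
pub struct TypeRegistry {
    types: HashMap<String, Arc<dyn ExtensionType>>,
}

impl TypeRegistry {
    pub fn register(&mut self, ext: Arc<dyn ExtensionType>) {
        self.types.insert(ext.name().to_string(), ext);
    }

    fn lookup(&self, field: &Field) -> Option<&Arc<dyn ExtensionType>> {
        field
            .metadata
            .get(EXTENSION_NAME_KEY)
            .and_then(|name| self.types.get(name))
    }

    /// Delegate to the extension type when both sides carry the same registered
    /// extension name; otherwise fall back to strict field equality.
    pub fn type_eq(&self, a: &Field, b: &Field) -> bool {
        if let Some(ext) = self.lookup(a) {
            if a.metadata.get(EXTENSION_NAME_KEY) == b.metadata.get(EXTENSION_NAME_KEY) {
                return ext.type_eq(a, b);
            }
        }
        a == b
    }

    pub fn display(&self, field: &Field) -> String {
        match self.lookup(field) {
            Some(ext) => ext.display(field),
            None => field.data_type.clone(),
        }
    }
}

/// Example registration: UUIDs compare by storage type only (nullability ignored).
pub struct UuidType;

impl ExtensionType for UuidType {
    fn name(&self) -> &str {
        "arrow.uuid"
    }

    fn type_eq(&self, a: &Field, b: &Field) -> bool {
        a.data_type == b.data_type
    }

    fn display(&self, _field: &Field) -> String {
        "UUID".to_string()
    }
}
```

The plumbing question remains: call sites such as parameter verification or plan formatting would need some handle to a registry instance for this to help.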

impl ParamValues {
/// Verify parameter list length and type
- pub fn verify(&self, expect: &[DataType]) -> Result<()> {
+ pub fn verify(&self, expect: &[FieldRef]) -> Result<()> {
Contributor

One thing that would help people upgrade is to add a new function and deprecate this one, perhaps as suggested in https://datafusion.apache.org/contributor-guide/api-health.html#api-health-policy:

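    use std::sync::Arc;
    use arrow::datatypes::{DataType, Field, FieldRef};

    impl ParamValues {
        /// Verify parameter list length and type
        #[deprecated(note = "use `ParamValues::verify_fields` instead")]
        pub fn verify(&self, expect: &[DataType]) -> Result<()> {
            // make dummy Fields: unnamed, nullable, and without metadata
            let expect: Vec<FieldRef> = expect
                .iter()
                .map(|dt| Arc::new(Field::new("", dt.clone(), true)))
                .collect();
            self.verify_fields(&expect)
        }

        // new function that has the new signature
        pub fn verify_fields(&self, expect: &[FieldRef]) -> Result<()> {
            ...
        }
    }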

if let Some(expected_metadata) = maybe_metadata {
// Probably too strict of a comparison (this is an example of where
Contributor

Yeah, I agree that straight-up comparing strings is probably not ideal.

If we wanted to introduce type equality, I think the bigger question is how to thread it through (you would need some way to register your types/methods for checking equality, and to ensure that they somehow end up here 🤔).

Development

Successfully merging this pull request may close these issues.

Logical plan placeholders can't contain metadata

2 participants