Enable placeholders with extension types #17986
@timsaucer @alamb I'm happy to add tests for all these components but wanted to make sure this is vaguely headed in the right direction before I do so!
- pub data_type: Option<DataType>,
+ pub field: Option<FieldRef>,
This is the main change. We can change this to a less severely breaking option (e.g., just add metadata: FieldMetadata to the struct). I started with the most breaking version to identify its use in as many places as possible.
- pub(crate) data_types: Vec<DataType>,
+ pub(crate) data_types: Vec<FieldRef>,
This is another change that had some impact
  pub struct PlannerContext {
      /// Data types for numbered parameters ($1, $2, etc), if supplied
      /// in `PREPARE` statement
-     prepare_param_data_types: Arc<Vec<DataType>>,
+     prepare_param_data_types: Arc<Vec<FieldRef>>,
I chose to also update the SQL planner here while I was at it. I could probably isolate these changes into a different PR.
- pub(crate) fn convert_data_type(&self, sql_type: &SQLDataType) -> Result<DataType> {
+ pub(crate) fn convert_data_type(&self, sql_type: &SQLDataType) -> Result<FieldRef> {
I think this change would also enable supporting UUIDs and other SQL types that map to extension types
  temporary,
  name,
- return_type,
+ return_type: return_type.map(|f| f.data_type().clone()),
One of the places where metadata is dropped that I didn't update (DdlStatement::CreateFunction args or return type)
Ok(Expr::Cast(Cast::new(
    Box::new(expr),
    dt.data_type().clone(),
)))
Another place where metadata is dropped that I didn't update (casts)
// This check is possibly too strict (requires nullability and field
// metadata align perfectly, rather than compute true type equality
// when field metadata is representing an extension type)
if prev != field {
This highlights something we'll have to fix: how to compute type equality (e.g., is a shredded and unshredded variant the same type?)
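To make the strict-versus-loose distinction concrete, here is a minimal sketch of a comparison that looks only at the storage type and the extension name. `SketchField` and `same_logical_type` are hypothetical stand-ins invented for illustration, not arrow or DataFusion APIs; the only real detail assumed is that Arrow carries the extension name under the `ARROW:extension:name` metadata key. It does not answer the shredded-versus-unshredded question, where the storage types themselves differ.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an Arrow field (hypothetical, not arrow's `Field`).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchField {
    pub name: String,
    /// Stand-in for `DataType`, kept as a string to stay dependency-free.
    pub storage_type: String,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

/// Metadata key under which Arrow stores the extension type name.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Looser comparison than `==`: same storage type and same extension name,
/// ignoring the field name, nullability, and any unrelated metadata.
pub fn same_logical_type(a: &SketchField, b: &SketchField) -> bool {
    a.storage_type == b.storage_type
        && a.metadata.get(EXTENSION_NAME_KEY) == b.metadata.get(EXTENSION_NAME_KEY)
}
```

With this sketch, two fields that differ only in name or nullability fail `==` but pass `same_logical_type`, which is roughly the behaviour the `prev != field` check would need.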
  plan,
  @r#"
- Prepare: "my_plan" [Int32]
+ Prepare: "my_plan" [Field { name: "", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]
This highlights something else we'll have to solve: how to print types. Printing a field is not particularly helpful in this context. (If this change is vaguely in the right direction I'll revert the changes in this file and implement the Debug or DisplayAs trait or wherever these strings are coming from for now).
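One possible shape for a more compact rendering is a small display wrapper that prints just the data type and appends metadata only when there is some. This is a sketch under assumptions: `PlainField` and `CompactType` are hypothetical names made up for illustration, and where such a wrapper would actually hook into plan printing is not settled in this thread.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Simplified stand-in for an Arrow field (hypothetical, not arrow's `Field`).
pub struct PlainField {
    pub name: String,
    /// Stand-in for `DataType`, already rendered as text.
    pub data_type: String,
    pub nullable: bool,
    /// BTreeMap keeps the printed key order deterministic for snapshots.
    pub metadata: BTreeMap<String, String>,
}

/// Hypothetical wrapper that prints a field the way a parameter type might
/// appear in a plan: the data type, plus metadata only if any is present.
pub struct CompactType<'a>(pub &'a PlainField);

impl fmt::Display for CompactType<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0.data_type)?;
        if !self.0.metadata.is_empty() {
            write!(f, " {:?}", self.0.metadata)?;
        }
        Ok(())
    }
}
```

A field with no metadata and type `Int32` renders as `Int32`, so the existing `Prepare: "my_plan" [Int32]` snapshot would not need to change in the common case.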
this is also something @friendlymatthew is likely to run into shortly as he is working on variant support too
#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
pub struct FieldMetadata {
I just moved this from the expr crate so I could use it in ParamValues
Thank you @paleolimbot -- I think this is definitely the right direction. I had some small API suggestions, but overall looks good
The largest open question in my mind is what you have highlighted for customizing behavior for different extension types (e.g. comparing two fields for "equality" and printing them, and casting them, etc.)
@findepi brought up the same thing many months ago when discussing adding new types in
One idea is to create a TypeRegistry similar to a FunctionRegistry and some sort of ExtensionType trait that encapsulates these behaviors.
The challenge would then be to thread the registry to all places that need it. Though that is likely largely an API design / plumbing exercise
If you think that is an idea worth exploring
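A rough sketch of what the registry idea could look like, to make the discussion concrete. Everything here is hypothetical: the `ExtensionType` trait, its methods, and this `TypeRegistry` are invented for illustration and do not correspond to an existing DataFusion API. The point is only that equality and printing become per-type behaviour looked up by extension name.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical trait bundling the per-extension-type behaviours discussed
/// above (equality and printing; casting is left out of this sketch).
pub trait ExtensionType: Send + Sync {
    /// Extension name, e.g. the value stored under `ARROW:extension:name`.
    fn name(&self) -> &str;
    /// Whether two serialized metadata values describe the same type.
    fn metadata_equal(&self, a: &str, b: &str) -> bool;
    /// Human-readable rendering for plan output.
    fn display(&self, metadata: &str) -> String;
}

/// Hypothetical registry keyed by extension name.
#[derive(Default)]
pub struct TypeRegistry {
    types: HashMap<String, Arc<dyn ExtensionType>>,
}

impl TypeRegistry {
    /// Registers a type, returning any previous entry with the same name.
    pub fn register(&mut self, ty: Arc<dyn ExtensionType>) -> Option<Arc<dyn ExtensionType>> {
        self.types.insert(ty.name().to_string(), ty)
    }

    /// Looks up a registered type by extension name.
    pub fn get(&self, name: &str) -> Option<Arc<dyn ExtensionType>> {
        self.types.get(name).cloned()
    }
}

/// Example implementation: a UUID-like type whose metadata carries no
/// parameters, so any two metadata values are treated as the same type.
pub struct Uuid;

impl ExtensionType for Uuid {
    fn name(&self) -> &str {
        "arrow.uuid"
    }
    fn metadata_equal(&self, _a: &str, _b: &str) -> bool {
        true
    }
    fn display(&self, _metadata: &str) -> String {
        "Uuid".to_string()
    }
}
```

The plumbing question raised above is untouched by this sketch: something would still have to hand a `TypeRegistry` to every place that currently compares or prints a field.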
  impl ParamValues {
      /// Verify parameter list length and type
-     pub fn verify(&self, expect: &[DataType]) -> Result<()> {
+     pub fn verify(&self, expect: &[FieldRef]) -> Result<()> {
One thing that would help people upgrade is to add a new function and deprecate this one, perhaps along the lines suggested in https://datafusion.apache.org/contributor-guide/api-health.html#api-health-policy
#[deprecated]
pub fn verify(&self, expect: &[DataType]) -> Result<()> {
    // make dummy Fields
    let expect = ...;
    self.verify_fields(&expect)
}

// new function that has the new signature
pub fn verify_fields(&self, expect: &[FieldRef]) -> Result<()> {
    ...
}
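For reference, a self-contained sketch of the deprecate-and-forward pattern suggested above. The types here (`SketchType`, `SketchField`, `SketchParams`) are simplified stand-ins invented so the example compiles without arrow or DataFusion; the choice to wrap each bare type in an unnamed, nullable field is an assumption about what the "dummy Fields" would look like, not something this thread settles.

```rust
use std::sync::Arc;

/// Stand-in for `DataType` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum SketchType {
    Int32,
    Utf8,
}

/// Stand-in for an Arrow field (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchField {
    pub name: String,
    pub data_type: SketchType,
    pub nullable: bool,
}

pub type SketchFieldRef = Arc<SketchField>;

/// Stand-in for `ParamValues`, holding only the types of the supplied values.
pub struct SketchParams {
    pub values: Vec<SketchType>,
}

impl SketchParams {
    /// Old signature, kept so existing callers still compile (with a warning).
    #[deprecated(note = "use `verify_fields` instead")]
    pub fn verify(&self, expect: &[SketchType]) -> Result<(), String> {
        // Assumption: wrap each bare type in an unnamed, nullable field.
        let fields: Vec<SketchFieldRef> = expect
            .iter()
            .map(|t| {
                Arc::new(SketchField {
                    name: String::new(),
                    data_type: t.clone(),
                    nullable: true,
                })
            })
            .collect();
        self.verify_fields(&fields)
    }

    /// New signature: checks parameter count, then each parameter's type.
    pub fn verify_fields(&self, expect: &[SketchFieldRef]) -> Result<(), String> {
        if self.values.len() != expect.len() {
            return Err(format!(
                "expected {} parameters, got {}",
                expect.len(),
                self.values.len()
            ));
        }
        for (i, (value, field)) in self.values.iter().zip(expect).enumerate() {
            if *value != field.data_type {
                return Err(format!(
                    "parameter {} has type {:?}, expected {:?}",
                    i + 1,
                    value,
                    field.data_type
                ));
            }
        }
        Ok(())
    }
}
```

Callers of the old `verify` keep working and see a deprecation warning pointing at `verify_fields`, which softens the breaking change.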
}

if let Some(expected_metadata) = maybe_metadata {
    // Probably too strict of a comparison (this is an example of where
Yeah, I agree straight up comparing strings is probably not ideal
If we wanted to introduce type equality, I think the bigger question is how to thread it through (you would have to have some way to register your types / methods to check equality and ensure that somehow ended up here 🤔 )
Which issue does this PR close?
Rationale for this change
Most logical plan expressions now propagate metadata; however, parameters with extension types or other field metadata cannot participate in placeholder/parameter binding.
What changes are included in this PR?
The DataType in the Placeholder struct was replaced with a FieldRef.
Are these changes tested?
They will be! (Work in progress)
Are there any user-facing changes?
Yes, one new function was added to extract the placeholder fields from a plan.
This is a breaking change for code that interacts directly with the Placeholder struct (though I think code that only matches on the logical Expr is unaffected).