Feat: Bring in support for map_filter expression #2236
base: main
Conversation
Thanks for the contribution, @codetyri0n! Could you add some end-to-end tests, and confirm whether we are currently falling back to Spark, or whether some Spark tests are disabled in the diffs?
}
}

fn compare_strings(left: &str, right: &str, op: CompareOp) -> bool {
Why do we need custom comparator logic? This seems like something Arrow kernels could handle.
Yeah, this makes sense to me as well; will do. I overlooked this and hard-coded it because I was not very familiar with Arrow's capabilities.
}

// Parse the lambda expression
if lambda_expr.contains(" >= ") {
Is this parsing sufficient? How complex can these expressions be? I'm surprised they're coming out of Spark as strings rather than already parsed into some sort of expression.
I would need to take a longer look at this before settling on an approach. My initial plan was to handle the standard operators and bring in enhancements in the next set of patches. I was also curious how we would tackle 'AND', 'OR', etc., but I think that can be handled with DataFusion? Would love to hear your thoughts and some direction on this.
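To illustrate one possible direction for the AND/OR question (a minimal sketch with entirely hypothetical names, not the PR's actual code): if the lambda arrives as a parsed expression tree rather than a raw string, compound predicates compose naturally without any substring matching on operators:

```rust
// Hypothetical predicate tree; a real implementation would mirror the
// planner's expression enum rather than define its own.
#[derive(Debug)]
enum Predicate {
    GtEq(i64),                          // value >= literal
    LtEq(i64),                          // value <= literal
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
}

fn eval(p: &Predicate, value: i64) -> bool {
    match p {
        Predicate::GtEq(lit) => value >= *lit,
        Predicate::LtEq(lit) => value <= *lit,
        Predicate::And(a, b) => eval(a, value) && eval(b, value),
        Predicate::Or(a, b) => eval(a, value) || eval(b, value),
    }
}

fn main() {
    // (v >= 2) AND (v <= 5)
    let p = Predicate::And(
        Box::new(Predicate::GtEq(2)),
        Box::new(Predicate::LtEq(5)),
    );
    let kept: Vec<i64> = (0..8).filter(|v| eval(&p, *v)).collect();
    assert_eq!(kept, vec![2, 3, 4, 5]);
}
```

Recursing over a tree like this handles arbitrarily nested boolean logic for free, whereas string matching on `" >= "` only covers one operator at the top level.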
case mapFilter: MapFilter =>
  val mapExpr = exprToProtoInternal(mapFilter.input, inputs)
  val lambdaExpr = exprToProtoInternal(mapFilter.function, inputs)
  val optExpr = scalarFunctionExprToProtoWithReturnType("map_filter", mapFilter.dataType, mapExpr, lambdaExpr)
  optExprWithInfo(optExpr, expr, mapFilter.input, mapFilter.function)
This isn't needed because the serde logic is already implemented in CometMapFilter.
    inputs: Seq[Attribute],
    binding: Boolean): Option[ExprOuterClass.Expr] = {
  val mapExpr = exprToProtoInternal(expr.argument, inputs, binding)
  val lambdaExpr = exprToProtoInternal(expr.function, inputs, binding)
Do we also need something like a CometLambdaFunction expression, which would be the equivalent of the Spark LambdaFunction expression defined here?
}
}

fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, DataFusionError> {
Would it be possible to use create_comet_physical_fun? That could avoid some boilerplate.
Which issue does this PR close?
Closes one of the requirements of #1044.
Rationale for this change
What changes are included in this PR?
How are these changes tested?
Added unit tests validating the functionality.