
Conversation

@codetyri0n (Contributor) commented Aug 26, 2025

Which issue does this PR close?

Closes one of the requirements of #1044.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Brought in unit tests validating the functionality.

@mbutrovich (Contributor) left a comment


Thanks for the contribution, @codetyri0n! Would you add some end-to-end tests please, and confirm if we're falling back currently or otherwise have some Spark tests turned off in the diffs?

}
}

fn compare_strings(left: &str, right: &str, op: CompareOp) -> bool {
Contributor

Why do we need custom comparator logic? This seems like something Arrow kernels could handle.

Contributor Author

Yeah, this makes sense to me as well; I'll switch to the Arrow kernels. I overlooked this and hard-coded the comparison because I was not very familiar with Arrow's capabilities.

}

// Parse the lambda expression
if lambda_expr.contains(" >= ") {
Contributor

Is this parsing sufficient? How complex can they be? I'm surprised they're coming out of Spark as strings and not parsed already into some sort of expression.

@codetyri0n (Contributor Author) commented Aug 26, 2025

I would need to take a longer look at this before settling on an approach. My initial plan was to handle the standard comparison operators and bring in enhancements in a follow-up set of patches. I was also curious how we would tackle 'AND', 'OR', etc., but I think that could be handled with DataFusion? Would love to hear your thoughts and some direction on this.
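As a quick sketch of why substring dispatch gets fragile once compound predicates appear (the lambda strings below are hypothetical, standing in for whatever Spark renders):

```rust
fn main() {
    // Hypothetical lambda bodies rendered to strings.
    let simple = "v >= 5";
    let compound = "v >= 5 AND k < 10";

    // Substring dispatch matches " >= " in both cases, so the compound
    // predicate would be routed to the same branch as the simple one,
    // and the "AND k < 10" half silently dropped.
    assert!(simple.contains(" >= "));
    assert!(compound.contains(" >= "));
}
```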

Comment on lines +1528 to +1532
case mapFilter: MapFilter =>
  val mapExpr = exprToProtoInternal(mapFilter.input, inputs)
  val lambdaExpr = exprToProtoInternal(mapFilter.function, inputs)
  val optExpr = scalarFunctionExprToProtoWithReturnType(
    "map_filter",
    mapFilter.dataType,
    mapExpr,
    lambdaExpr)
  optExprWithInfo(optExpr, expr, mapFilter.input, mapFilter.function)
Member

This isn't needed because the serde logic is already implemented in CometMapFilter

inputs: Seq[Attribute],
binding: Boolean): Option[ExprOuterClass.Expr] = {
val mapExpr = exprToProtoInternal(expr.argument, inputs, binding)
val lambdaExpr = exprToProtoInternal(expr.function, inputs, binding)
Contributor

Do we also need something like a CometLambdaFunction expression, which would be the equivalent of the Spark LambdaFunction expression defined here?

}
}

fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, DataFusionError> {
Contributor

Would it be possible to use create_comet_physical_fun? That could avoid some boilerplate.

4 participants