Open
Labels: enhancement, good first issue
Description
What is the problem the feature request solves?
The QueryPlanSerde.exprToProtoInternal method both serializes Spark expressions to protocol buffer format and checks that Comet supports each expression. The file containing it has grown very large and is hard to navigate, so we would like to refactor it so that the per-expression logic moves into separate classes.
As an example, here is the original approach for handling the Add expression:
    case add @ Add(left, right, _) if supportedDataType(left.dataType) =>
      createMathExpression(
        expr,
        left,
        right,
        inputs,
        binding,
        add.dataType,
        add.evalMode == EvalMode.ANSI,
        (builder, mathExpr) => builder.setAdd(mathExpr))

    case add @ Add(left, _, _) if !supportedDataType(left.dataType) =>
      withInfo(add, s"Unsupported datatype ${left.dataType}")
      None
The new approach is to move this into a separate file and class:
    object CometAdd extends CometExpressionSerde with MathBase {
      override def convert(
          expr: Expression,
          inputs: Seq[Attribute],
          binding: Boolean): Option[ExprOuterClass.Expr] = {
        val add = expr.asInstanceOf[Add]
        if (!supportedDataType(add.left.dataType)) {
          withInfo(add, s"Unsupported datatype ${add.left.dataType}")
          return None
        }
        createMathExpression(
          expr,
          add.left,
          add.right,
          inputs,
          binding,
          add.dataType,
          add.evalMode == EvalMode.ANSI,
          (builder, mathExpr) => builder.setAdd(mathExpr))
      }
    }
These classes are then referenced from QueryPlanSerde in a map:
    private val exprSerdeMap: Map[Class[_], CometExpressionSerde] = Map(
      classOf[Add] -> CometAdd,
      classOf[Subtract] -> CometSubtract,
      classOf[Multiply] -> CometMultiply,
      ...
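For anyone picking this up, here is a minimal, self-contained sketch of how a class-keyed map like this can drive dispatch. It does not use Spark or Comet: Expr, Lit, ExprSerde, AddSerde, SubtractSerde and SerdeRegistry are hypothetical stand-ins invented for illustration, and the String result stands in for ExprOuterClass.Expr. How QueryPlanSerde actually consults exprSerdeMap may differ.

```scala
// Hypothetical stand-ins for Spark expression classes (illustration only).
sealed trait Expr
final case class Lit(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class Subtract(left: Expr, right: Expr) extends Expr
final case class Upper(child: Expr) extends Expr

// Stand-in for CometExpressionSerde; a String replaces the protobuf message.
trait ExprSerde {
  def convert(expr: Expr): Option[String]
}

object AddSerde extends ExprSerde {
  def convert(expr: Expr): Option[String] = {
    // Safe only because the registry routes Add instances here.
    val add = expr.asInstanceOf[Add]
    Some(s"add(${add.left}, ${add.right})")
  }
}

object SubtractSerde extends ExprSerde {
  def convert(expr: Expr): Option[String] = {
    val sub = expr.asInstanceOf[Subtract]
    Some(s"subtract(${sub.left}, ${sub.right})")
  }
}

object SerdeRegistry {
  // Same shape as exprSerdeMap: runtime class -> serde object.
  val exprSerdeMap: Map[Class[_], ExprSerde] = Map(
    classOf[Add] -> AddSerde,
    classOf[Subtract] -> SubtractSerde)

  // Look up the serde by the expression's runtime class; None means "not registered".
  def exprToProto(expr: Expr): Option[String] =
    exprSerdeMap.get(expr.getClass).flatMap(_.convert(expr))
}
```

In this sketch, SerdeRegistry.exprToProto(Add(Lit(1), Lit(2))) returns a Some, while an unregistered expression such as Upper(Lit(1)) returns None. Note that a lookup keyed on getClass matches the exact runtime class only, so subclasses of a registered class would need their own entries.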
This approach has some benefits, such as:
- Moving away from all expressions sharing the same logic for determining which data types are supported (different expressions support different types)
- It makes it easier to write unit tests (Implement unit tests for serde logic #2020)
- Once all expressions migrate to the new pattern, it will be easier to automate generating documentation about supported expressions
- It is likely that we will find common patterns and will be able to refactor the code to reduce boilerplate
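To make the first and third benefits concrete, here is a rough sketch of what per-expression type support and generated documentation could look like once each expression has its own object. Everything here is an assumption made for illustration: DataType, DocumentedSerde, supportedTypes, AddDoc, UpperDoc and SupportDocs are hypothetical names, not part of Comet's current API, and the types listed for each expression are made up.

```scala
// Hypothetical, simplified model; none of these names are Comet's actual API.
sealed trait DataType
case object IntType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType

// Each serde declares its own supported types instead of sharing one global check.
trait DocumentedSerde {
  def name: String
  def supportedTypes: Seq[DataType]
  def supports(dt: DataType): Boolean = supportedTypes.contains(dt)
}

object AddDoc extends DocumentedSerde {
  val name = "Add"
  // Made-up list for the example.
  val supportedTypes: Seq[DataType] = Seq(IntType, DoubleType)
}

object UpperDoc extends DocumentedSerde {
  val name = "Upper"
  // Made-up list for the example.
  val supportedTypes: Seq[DataType] = Seq(StringType)
}

object SupportDocs {
  val serdes: Seq[DocumentedSerde] = Seq(AddDoc, UpperDoc)

  // Render one line per expression, sorted by name, listing its supported types.
  def render(): String =
    serdes
      .sortBy(_.name)
      .map(s => s"${s.name}: ${s.supportedTypes.mkString(", ")}")
      .mkString("\n")
}
```

Because each object is standalone, a unit test can call AddDoc.supports(StringType) directly without building a query plan, and SupportDocs.render() shows how a documentation page could be derived from the registered objects.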
Describe the potential solution
Convert the following expressions:
- Add
- Subtract
- Multiply
- Divide
- IntegralDivide
- Remainder
- ArrayAppend
- ArrayContains
- ArrayDistinct
- ArrayExcept
- ArrayInsert
- ArrayIntersect
- ArrayJoin
- ArrayMax
- ArrayRemove
- ArrayRepeat
- ArraysOverlap
- ArrayUnion
- CreateArray
- Ascii
- ConcatWs
- Chr
- InitCap
- BitwiseCount
- BitwiseGet
- BitwiseNot
- BitwiseOr
- BitwiseXor
- BitLength
- FromUnixTime
- Length
- Acos
- Cos
- Asin
- Sin
- Atan
- Tan
- Exp
- Expm1
- Sqrt
- Signum
- Md5
- ShiftLeft
- ShiftRight
- StringInstr
- StringRepeat
- StringReplace
- StringTranslate
- StringTrim
- StringTrimLeft
- StringTrimRight
- StringTrimBoth
- Upper
- Lower
- Murmur3Hash
- XxHash64
- MapKeys
- MapValues
- MapFromArrays
- GetMapValue
- GreaterThan
- GreaterThanOrEqual
- LessThan
- LessThanOrEqual
- Substring
- Like
- RLike
- StartsWith
- EndsWith
- Contains
- StringSpace
- Hour
- Minute
- DateAdd
- DateSub
- TruncDate
- TruncTimestamp
- Second
- Year
- IsNull
- IsNotNull
- IsNaN
- Atan2
- Ceil
- Floor
- Log
- Log10
- Log2
- Pow
- Round
- StringDecode
- OctetLength
- Reverse
- BitwiseAnd
- In
- InSet
- StringRPad
- Sha2
- CreateNamedStruct - chore: Refactor serde for more array and struct expressions #2257
- GetStructField - chore: Refactor serde for more array and struct expressions #2257
- GetArrayItem - chore: Refactor serde for more array and struct expressions #2257
- ElementAt - chore: Refactor serde for more array and struct expressions #2257
- GetArrayStructFields - chore: Refactor serde for more array and struct expressions #2257
- StructsToJson - chore: Refactor serde for more array and struct expressions #2257
- ArrayFilter
- Rand
- Randn
- And - chore: Refactor remaining predicate expression serde #2265
- Or - chore: Refactor remaining predicate expression serde #2265
- Not(In) - chore: Refactor remaining predicate expression serde #2265
- Not - chore: Refactor remaining predicate expression serde #2265
- EqualTo - chore: Refactor remaining predicate expression serde #2265
- Not(EqualTo) - chore: Refactor remaining predicate expression serde #2265
- EqualNullSafe - chore: Refactor remaining predicate expression serde #2265
- Not(EqualNullSafe) - chore: Refactor remaining predicate expression serde #2265
- If - chore: Refactor serde for conditional expressions #2266
- CaseWhen - chore: Refactor serde for conditional expressions #2266
- Alias
- AttributeReference
- TryCast - chore: Refactor Cast serde to avoid code duplication #2242
- Cast - chore: Refactor Cast serde to avoid code duplication #2242
- Literal
- Hex
- Unhex
- SortOrder
- PromotePrecision
- CheckOverflow
- UnaryMinus
- KnownFloatingPointNormalized
- ScalarSubquery
- UnscaledValue
- MakeDecimal
- BloomFilterMightContain
- RegExpReplace
Additional context
No response