Closed
Labels
EPIC, enhancement, good first issue
Description
What is the problem the feature request solves?
The QueryPlanSerde.exprToProtoInternal method both serializes Spark expressions to protocol buffer format and checks that Comet supports each expression. The file has grown very large and is hard to navigate, so we would like to refactor it so that the per-expression logic moves into separate classes.
As an example, here is the original approach for handling the Add expression:
case add @ Add(left, right, _) if supportedDataType(left.dataType) =>
  createMathExpression(
    expr,
    left,
    right,
    inputs,
    binding,
    add.dataType,
    add.evalMode == EvalMode.ANSI,
    (builder, mathExpr) => builder.setAdd(mathExpr))
case add @ Add(left, _, _) if !supportedDataType(left.dataType) =>
  withInfo(add, s"Unsupported datatype ${left.dataType}")
  None
The new approach is to move this into a separate file and class:
object CometAdd extends CometExpressionSerde with MathBase {
  override def convert(
      expr: Expression,
      inputs: Seq[Attribute],
      binding: Boolean): Option[ExprOuterClass.Expr] = {
    val add = expr.asInstanceOf[Add]
    if (!supportedDataType(add.left.dataType)) {
      withInfo(add, s"Unsupported datatype ${add.left.dataType}")
      return None
    }
    createMathExpression(
      expr,
      add.left,
      add.right,
      inputs,
      binding,
      add.dataType,
      add.evalMode == EvalMode.ANSI,
      (builder, mathExpr) => builder.setAdd(mathExpr))
  }
}
These classes are then referenced from QueryPlanSerde in a map:
private val exprSerdeMap: Map[Class[_], CometExpressionSerde] = Map(
  classOf[Add] -> CometAdd,
  classOf[Subtract] -> CometSubtract,
  classOf[Multiply] -> CometMultiply,
  ...
This approach has some benefits, such as:
- It moves away from all expressions sharing the same logic for determining which data types are supported (different expressions support different types)
- It makes it easier to write unit tests (see "Implement unit tests for serde logic", #2020)
- Once all expressions migrate to the new pattern, it will be easier to automate generating documentation about supported expressions
- It is likely that we will find common patterns and will be able to refactor the code to reduce boilerplate
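The map-based lookup and per-expression type checks described above can be sketched in isolation. This is a minimal illustration only: the types below (DataType, Expression, ExprSerde, AddSerde, SerdeRegistry) are hypothetical stand-ins, not Spark's or Comet's classes, and convert returns a string where the real code would build an ExprOuterClass.Expr.

```scala
// Stand-in types for illustration only; NOT Spark's or Comet's classes.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType

sealed trait Expression { def dataType: DataType }
final case class Literal(value: Any, dataType: DataType) extends Expression
final case class Add(left: Expression, right: Expression) extends Expression {
  def dataType: DataType = left.dataType
}
final case class Subtract(left: Expression, right: Expression) extends Expression {
  def dataType: DataType = left.dataType
}

// Hypothetical, simplified serde trait: a String stands in for the protobuf message.
trait ExprSerde {
  def convert(expr: Expression): Option[String]
}

object AddSerde extends ExprSerde {
  // Each serde decides for itself which data types it supports.
  private def supported(dt: DataType): Boolean = dt == IntType

  override def convert(expr: Expression): Option[String] = {
    val add = expr.asInstanceOf[Add]
    if (supported(add.left.dataType)) Some("add") else None
  }
}

object SerdeRegistry {
  // Lookup table keyed by the expression's runtime class.
  private val serdes: Map[Class[_], ExprSerde] = Map(
    classOf[Add] -> AddSerde)

  // Find the serde registered for this expression class, if any, and delegate to it.
  def toProto(expr: Expression): Option[String] =
    serdes.get(expr.getClass).flatMap(_.convert(expr))
}
```

In this sketch, an expression class with no registered serde (Subtract here) yields None, as does an Add over an unsupported type. Because each serde is a standalone object, it can be exercised directly in a unit test without going through the full dispatch method.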
Describe the potential solution
Convert the following expressions:
- Add
- Subtract
- Multiply
- Divide
- IntegralDivide
- Remainder
- ArrayAppend
- ArrayContains
- ArrayDistinct
- ArrayExcept
- ArrayInsert
- ArrayIntersect
- ArrayJoin
- ArrayMax
- ArrayRemove
- ArrayRepeat
- ArraysOverlap
- ArrayUnion
- CreateArray
- Ascii
- ConcatWs
- Chr
- InitCap
- BitwiseCount
- BitwiseGet
- BitwiseNot
- BitwiseOr
- BitwiseXor
- BitLength
- FromUnixTime
- Length
- Acos
- Cos
- Asin
- Sin
- Atan
- Tan
- Exp
- Expm1
- Sqrt
- Signum
- Md5
- ShiftLeft
- ShiftRight
- StringInstr
- StringRepeat
- StringReplace
- StringTranslate
- StringTrim
- StringTrimLeft
- StringTrimRight
- StringTrimBoth
- Upper
- Lower
- Murmur3Hash
- XxHash64
- MapKeys
- MapValues
- MapFromArrays
- GetMapValue
- GreaterThan
- GreaterThanOrEqual
- LessThan
- LessThanOrEqual
- Substring
- Like
- RLike
- StartsWith
- EndsWith
- Contains
- StringSpace
- Hour
- Minute
- DateAdd
- DateSub
- TruncDate
- TruncTimestamp
- Second
- Year
- IsNull
- IsNotNull
- IsNaN
- Atan2
- Ceil
- Floor
- Log
- Log10
- Log2
- Pow
- Round
- StringDecode
- OctetLength
- Reverse
- BitwiseAnd
- In
- InSet
- StringRPad
- Sha2
- CreateNamedStruct - chore: Refactor serde for more array and struct expressions #2257
- GetStructField - chore: Refactor serde for more array and struct expressions #2257
- GetArrayItem - chore: Refactor serde for more array and struct expressions #2257
- ElementAt - chore: Refactor serde for more array and struct expressions #2257
- GetArrayStructFields - chore: Refactor serde for more array and struct expressions #2257
- StructsToJson - chore: Refactor serde for more array and struct expressions #2257
- ArrayFilter
- ArrayExcept
- Rand
- Randn
- And - chore: Refactor remaining predicate expression serde #2265
- Or - chore: Refactor remaining predicate expression serde #2265
- Not(In) - chore: Refactor remaining predicate expression serde #2265
- Not - chore: Refactor remaining predicate expression serde #2265
- EqualTo - chore: Refactor remaining predicate expression serde #2265
- Not(EqualTo) - chore: Refactor remaining predicate expression serde #2265
- EqualNullSafe - chore: Refactor remaining predicate expression serde #2265
- Not(EqualNullSafe) - chore: Refactor remaining predicate expression serde #2265
- If - chore: Refactor serde for conditional expressions #2266
- CaseWhen - chore: Refactor serde for conditional expressions #2266
- Alias - chore: Refactor serde for named expressions alias and attributeReference #2290
- AttributeReference - chore: Refactor serde for named expressions alias and attributeReference #2290
- TryCast - chore: Refactor Cast serde to avoid code duplication #2242
- Cast - chore: Refactor Cast serde to avoid code duplication #2242
- Hex - chore: Refactor hex/unhex SerDe to avoid code duplication #2287
- Unhex - chore: Refactor hex/unhex SerDe to avoid code duplication #2287
- Literal - chore: Refactor Literal serde #2377
- SortOrder
- PromotePrecision
- CheckOverflow
- UnaryMinus
- KnownFloatingPointNormalized
- ScalarSubquery
- UnscaledValue
- MakeDecimal
- BloomFilterMightContain
- RegExpReplace
Additional context
No response