Skip to content

[EPIC] Refactor all expression serde logic out of QueryPlanSerde #2019

@andygrove

Description

@andygrove

What is the problem the feature request solves?

The QueryPlanSerde.exprToProtoInternal method contains logic for serializing Spark expressions to protocol buffer format and also contains checks that Comet supports the expression. This file has grown very large and is hard to navigate, so we would like to refactor this logic such that the per-expression logic is moved into separate classes.

As an example, here is the original approach for handling the Add expression:

      case add @ Add(left, right, _) if supportedDataType(left.dataType) =>
        createMathExpression(
          expr,
          left,
          right,
          inputs,
          binding,
          add.dataType,
          add.evalMode == EvalMode.ANSI,
          (builder, mathExpr) => builder.setAdd(mathExpr))

      case add @ Add(left, _, _) if !supportedDataType(left.dataType) =>
        withInfo(add, s"Unsupported datatype ${left.dataType}")
        None

The new approach is to move this into a separate file and class:

object CometAdd extends CometExpressionSerde with MathBase {
  override def convert(
      expr: Expression,
      inputs: Seq[Attribute],
      binding: Boolean): Option[ExprOuterClass.Expr] = {
    val add = expr.asInstanceOf[Add]
    if (!supportedDataType(add.left.dataType)) {
      withInfo(add, s"Unsupported datatype ${add.left.dataType}")
      return None
    }
    createMathExpression(
      expr,
      add.left,
      add.right,
      inputs,
      binding,
      add.dataType,
      add.evalMode == EvalMode.ANSI,
      (builder, mathExpr) => builder.setAdd(mathExpr))
  }
}

These classes are then referenced from QueryPlanSerde in a map:

  private val exprSerdeMap: Map[Class[_], CometExpressionSerde] = Map(
    classOf[Add] -> CometAdd,
    classOf[Subtract] -> CometSubtract,
    classOf[Multiply] -> CometMultiply,
    ...

This approach has some benefits, such as:

  • Moving away from all expressions sharing the same logic for determining which data types are supported (different expressions support different types)
  • It makes it easier to write unit tests (Implement unit tests for serde logic #2020)
  • Once all expressions migrate to the new pattern, it will be easier to automate generating documentation about supported expressions
  • It is likely that we will find common patterns and will be able to refactor the code to reduce boilerplate

Describe the potential solution

Convert the following expressions:

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions