[SPARK-53578][CONNECT] Simplify data type handling in LiteralValueProtoConverter
### What changes were proposed in this pull request?
This PR simplifies data type handling in the Spark Connect `LiteralValueProtoConverter` by consolidating type information into a single `data_type` field at the root level of the `Literal` message, rather than having separate type fields within nested structures.
**Key changes:**
1. **Protobuf Schema Simplification:**
   - Added a new `data_type` field (field 100) to the root `Expression.Literal` message
   - Removed redundant type fields from nested messages (`Array.data_type`, `Map.data_type`, `Struct.data_type_struct`)
2. **Array Type Handling Enhancement:**
   - Added special handling for `ByteType` arrays to convert them to `Binary` type in the protobuf representation
   - This addresses a specific edge case where byte arrays should be represented as binary data
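
The two changes above can be illustrated with a toy model. This is a minimal sketch in Python, not the converter's actual code: `DataType`, `Literal`, `to_literal`, and `_to_nested` are hypothetical stand-ins for the generated protobuf classes and the converter methods, and the exact behavior of nested elements (shown here as carrying no type of their own) is an assumption based on the description.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DataType:
    """Hypothetical, simplified stand-in for the protobuf DataType message."""
    name: str
    element_type: Optional["DataType"] = None


@dataclass
class Literal:
    """Hypothetical stand-in for Expression.Literal with a root-level data_type."""
    value: Any
    data_type: Optional[DataType] = None


def _to_nested(value: Any, data_type: DataType) -> Literal:
    # Assumption: nested literals do not repeat type information,
    # because the root data_type already describes them.
    lit = to_literal(value, data_type)
    lit.data_type = None
    return lit


def to_literal(value: Any, data_type: DataType) -> Literal:
    is_array = data_type.name == "array"
    if is_array and data_type.element_type.name == "byte":
        # Special case: an array of bytes is represented as binary data.
        # Signed bytes (-128..127) are mapped to their unsigned form.
        raw = bytes(b & 0xFF for b in value)
        return Literal(value=raw, data_type=DataType("binary"))
    if is_array:
        elements = [_to_nested(v, data_type.element_type) for v in value]
        return Literal(value=elements, data_type=data_type)
    return Literal(value=value, data_type=data_type)


ints = to_literal([1, 2, 3], DataType("array", DataType("int")))
blob = to_literal([1, -1], DataType("array", DataType("byte")))
```

In this sketch, `ints` keeps its full array type only at the root, while `blob` becomes a single binary literal instead of an array of byte literals.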
### Why are the changes needed?
The current data type handling in Spark Connect has several issues:
1. **Redundancy and Complexity:** Type information is scattered across multiple fields in nested messages, making the protobuf schema unnecessarily complex and error-prone.
2. **Limited Extensibility:** Without a root-level `data_type` field, it is difficult to attach additional type information to literals. For example, it is challenging to include detailed type metadata for types like `String` (with collation information), `YearMonthInterval`, `DayTimeInterval`, and other types that may require additional type-specific attributes.
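
As a rough illustration of the extensibility point, the sketch below shows how a single root-level type slot could carry type-specific attributes such as a string collation. The class names `StringType` and `TypedLiteral` and the helper `effective_collation` are hypothetical and do not mirror the actual protobuf schema; treating `UTF8_BINARY` as the fallback collation is also an assumption of this example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class StringType:
    """Hypothetical type descriptor carrying collation metadata."""
    collation: str = "UTF8_BINARY"


@dataclass(frozen=True)
class TypedLiteral:
    """Hypothetical literal with one optional root-level type slot."""
    value: Any
    data_type: Optional[StringType] = None


def effective_collation(lit: TypedLiteral) -> str:
    # Assumption: a literal without explicit type metadata falls back
    # to the default collation in this toy model.
    if lit.data_type is None:
        return StringType().collation
    return lit.data_type.collation


plain = TypedLiteral("abc")
collated = TypedLiteral("abc", StringType(collation="UNICODE_CI"))
```

Both literals hold the same value; only the root-level type distinguishes them, so no new per-value fields are needed to add the collation.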
### Does this PR introduce _any_ user-facing change?
**No.** This is an internal refactoring of the Spark Connect protobuf schema and converter logic.
### How was this patch tested?
`build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"`
`SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite"`
`build/sbt "connect/testOnly org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite"`
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.5.11
Closes #52342 from heyihong/SPARK-53578.
Authored-by: Yihong He <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>