Commit 1d09e5a
[SPARK-52449][CONNECT][PYTHON][ML] Make datatypes for Expression.Literal.Map/Array optional
### What changes were proposed in this pull request?
This PR optimizes the `LiteralValueProtoConverter` to reduce redundant type information in Spark Connect protocol buffers. The key changes include:
1. **Optimized type inference for arrays and maps**: Modified the conversion logic to only include type information in the first element of arrays and the first key-value pair of maps, since subsequent elements can infer their types from the first element.
2. **Added `needDataType` parameter**: Introduced a new parameter to control when type information is necessary, allowing the converter to skip redundant type information.
3. **Updated protobuf documentation**: Enhanced comments in the protobuf definitions to clarify that only the first element needs to contain type information for inference.
4. **Improved test coverage**: Added new test cases for complex nested structures including tuples and maps with array values.
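As a rough illustration of items 1 and 2, the sketch below models the idea in plain Python. It is not the actual `LiteralValueProtoConverter` or the real protobuf schema: the dictionary layout and the names `to_literal`, `infer_type`, `element_type`, and `need_data_type` are hypothetical stand-ins, and empty arrays (which would need an explicit type) are not handled.

```python
from typing import Any, Dict


def infer_type(value: Any) -> str:
    """Toy type inference for scalar values (hypothetical helper)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def to_literal(value: Any, need_data_type: bool = True) -> Dict[str, Any]:
    """Encode a value; only the first element of an array carries type info."""
    if isinstance(value, list):
        # Propagate the flag: elements after the first never need a type,
        # and neither does anything nested inside them.
        return {
            "array": [
                to_literal(v, need_data_type and i == 0)
                for i, v in enumerate(value)
            ]
        }
    literal: Dict[str, Any] = {"value": value}
    if need_data_type:
        literal["type"] = infer_type(value)
    return literal


def element_type(literal: Dict[str, Any]) -> str:
    """Recover the full type by following the first element at each level."""
    if "array" in literal:
        if not literal["array"]:
            raise ValueError("an empty array needs an explicit type in this sketch")
        return "array<" + element_type(literal["array"][0]) + ">"
    return literal["type"]


flat = to_literal([1, 2, 3])
nested = to_literal([[1, 2], [3]])
print(flat)
print(element_type(nested))
```

In this toy model the decoder never reads a type from any element other than the first, which is why the encoder can omit it everywhere else.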
### Why are the changes needed?
The current implementation includes type information for every element in arrays and every key-value pair in maps, which is redundant and increases the size of protocol buffer messages. Since Spark Connect can infer types from the first element, including type information for subsequent elements is unnecessary and wastes bandwidth and processing time.
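To give a feel for the size argument, the following sketch compares two encodings of the same 1,000-element array. JSON is used here only as a stand-in for the protobuf wire format, and the field names `long` and `dataType` as well as the helper `encode` are assumptions made for illustration, so the absolute byte counts say nothing about real Spark Connect messages.

```python
import json
from typing import List


def encode(values: List[int], type_every_element: bool) -> bytes:
    """Serialize integers, tagging either every element or only the first."""
    literals = []
    for i, v in enumerate(values):
        literal = {"long": v}
        if type_every_element or i == 0:
            literal["dataType"] = "long"  # hypothetical per-element type tag
        literals.append(literal)
    return json.dumps(literals).encode("utf-8")


values = list(range(1000))
verbose = encode(values, type_every_element=True)
compact = encode(values, type_every_element=False)

# Both encodings carry the same values; only the repeated type tags differ.
print(len(verbose), len(compact))
```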
### Does this PR introduce any user-facing change?
**No** - This PR does not introduce any user-facing changes.
The change is backward compatible, and existing Spark Connect clients will continue to work unchanged.
### How was this patch tested?
`build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"`
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.4.5
Closes #51473 from heyihong/SPARK-52449.
Authored-by: Yihong He <heyihong.cn@gmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
1 parent fb0c4b8 commit 1d09e5a
File tree
14 files changed, +422 -504 lines changed
- python/pyspark/sql/connect/proto
- sql/connect
  - client/jvm/src/test/scala/org/apache/spark/sql
  - common/src
    - main
      - protobuf/spark/connect
      - scala/org/apache/spark/sql/connect/common
    - test/resources/query-tests
      - explain-results
      - queries
  - server/src
    - main/scala/org/apache/spark/sql/connect/ml
    - test/scala/org/apache/spark/sql/connect/planner
Lines changed: 5 additions & 0 deletions
Lines changed: 19 additions & 11 deletions