[SPARK-23989][SQL] exchange should copy data before non-serialized shuffle
## What changes were proposed in this pull request?
In Spark SQL, we usually reuse a single `UnsafeRow` instance across records, so the data must be copied wherever non-serialized objects are buffered.
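To illustrate why buffering a reused instance needs a copy, here is a minimal sketch that does not use Spark. `ReusedRecord` is a hypothetical stand-in for `UnsafeRow`: an iterator hands out the same mutable object for every record, and a buffer that keeps only references ends up holding several pointers to the last value.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for UnsafeRow: one mutable holder reused for every record.
final class ReusedRecord(var value: Int) {
  // Allocates an independent snapshot of the current contents.
  def copy(): ReusedRecord = new ReusedRecord(value)
}

object RowReuseDemo {
  // Yields `n` records but always returns the same ReusedRecord instance.
  def reusingIterator(n: Int): Iterator[ReusedRecord] = {
    val shared = new ReusedRecord(0)
    Iterator.range(0, n).map { i =>
      shared.value = i
      shared
    }
  }

  // Buffers object references, like a non-serialized shuffle: every entry aliases `shared`.
  def bufferWithoutCopy(n: Int): List[Int] = {
    val buf = ArrayBuffer.empty[ReusedRecord]
    reusingIterator(n).foreach(r => buf += r)
    buf.map(_.value).toList
  }

  // Copies each record before buffering it, so each entry keeps its own value.
  def bufferWithCopy(n: Int): List[Int] = {
    val buf = ArrayBuffer.empty[ReusedRecord]
    reusingIterator(n).foreach(r => buf += r.copy())
    buf.map(_.value).toList
  }

  def main(args: Array[String]): Unit = {
    println(bufferWithoutCopy(3)) // List(2, 2, 2)
    println(bufferWithCopy(3))    // List(0, 1, 2)
  }
}
```

Serialized shuffle paths write each record's bytes out immediately, so reuse is harmless there; the copy only matters when object references are held in memory.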
The shuffle may buffer objects if it uses neither the bypass merge sort shuffle nor the unsafe (serialized) shuffle.
`ShuffleExchangeExec.needToCopyObjectsBeforeShuffle` misses the case where, if `spark.sql.shuffle.partitions` is large enough, we cannot use the unsafe shuffle and fall back to the non-serialized shuffle.
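The decision described above can be sketched as a simplified, standalone model. This is not the patch itself. `ShufflePathSketch`, `choosePath`, and the `ShufflePath` cases are hypothetical names. The sketch assumes the default `spark.shuffle.sort.bypassMergeThreshold` of 200 and a serialized-mode limit of 2^24 partitions. It also assumes what holds for a Spark SQL exchange: no map-side aggregation or key ordering, and a serializer that supports relocating serialized objects. It ignores `SparkEnv` and non-sort shuffle managers.

```scala
// Hypothetical model of which shuffle write path a sort-based shuffle would take.
sealed trait ShufflePath
case object BypassMergeSort extends ShufflePath   // writes records straight to per-partition files
case object SerializedUnsafe extends ShufflePath  // serializes each record before sorting
case object NonSerializedSort extends ShufflePath // buffers deserialized objects in memory

object ShufflePathSketch {
  // Assumed default of spark.shuffle.sort.bypassMergeThreshold.
  val DefaultBypassMergeThreshold: Int = 200
  // Assumed serialized-mode limit: partition IDs are packed into 24 bits.
  val MaxPartitionsForSerializedMode: Int = 1 << 24 // 16,777,216

  // Assumes no aggregator, no key ordering, and a relocatable serializer.
  def choosePath(
      numPartitions: Int,
      bypassMergeThreshold: Int = DefaultBypassMergeThreshold): ShufflePath =
    if (numPartitions <= bypassMergeThreshold) BypassMergeSort
    else if (numPartitions <= MaxPartitionsForSerializedMode) SerializedUnsafe
    else NonSerializedSort // the case a check that ignores the partition limit would miss

  // Only the path that buffers deserialized objects needs copies.
  def needToCopyObjectsBeforeShuffle(
      numPartitions: Int,
      bypassMergeThreshold: Int = DefaultBypassMergeThreshold): Boolean =
    choosePath(numPartitions, bypassMergeThreshold) == NonSerializedSort
}
```

Deriving the copy decision from the same path selection keeps the two from drifting apart, which is the kind of mismatch this bug describes.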
This bug is very hard to hit, since users are unlikely to set such a large number of partitions (more than 16 million, i.e. 2^24) for a Spark SQL exchange.
TODO: test
## How was this patch tested?
todo.
Author: Wenchen Fan <[email protected]>
Closes apache#21101 from cloud-fan/shuffle.