Commit 2afc713

ivoson authored and cloud-fan committed
[SPARK-54830][CORE] Enable checksum based indeterminate shuffle retry by default
### What changes were proposed in this pull request?

Enable checksum-based indeterminate shuffle retry by default.

Increase the JVM memory size to 6g for `sql` module tests, because the test case [SPARK-48037: Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data](https://github.com/apache/spark/blob/316322cbcb55ff5c1b4e479bc2aae12babdae534/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala#L2696) sets the number of shuffle partitions to `16777216`, which requires more memory for computing the order-independent shuffle checksum.

### Why are the changes needed?

The checksum-based solution detects changes in indeterminate shuffle output more accurately, so this change enables it by default to avoid query correctness issues caused by indeterminate shuffle retry.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #53574 from ivoson/SPARK-54556-followup.

Authored-by: Tengfei Huang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
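To illustrate what "order independent" means for the checksum mentioned in the commit message, here is a minimal sketch. It is not Spark's implementation: the object name, `rowHash`, and `partitionChecksum` are hypothetical, and rows are modeled as plain strings. The idea is to hash each row separately and fold the hashes with commutative operations, so that the same set of rows yields the same checksum regardless of arrival order.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.CRC32

// Illustrative only: a hypothetical order-independent checksum, not Spark's code.
object OrderIndependentChecksumSketch {

  // Hash a single row (modeled as a String) with CRC32 over its UTF-8 bytes.
  def rowHash(row: String): Long = {
    val crc = new CRC32()
    crc.update(row.getBytes(StandardCharsets.UTF_8))
    crc.getValue
  }

  // Fold row hashes with XOR and wrapping addition. Both operations are
  // commutative and associative, so the result does not depend on row order.
  def partitionChecksum(rows: Seq[String]): Long = {
    var xorAcc = 0L
    var sumAcc = 0L
    rows.foreach { row =>
      val h = rowHash(row)
      xorAcc ^= h
      sumAcc += h
    }
    // Mix the two accumulators into one value.
    xorAcc * 31 + sumAcc
  }
}
```

If an accumulator of this kind is kept for every shuffle partition, as the commit message implies, a test that configures `16777216` partitions allocates a correspondingly large amount of state, which is consistent with the heap increase to 6g.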
1 parent 8cc2e46 commit 2afc713

2 files changed, +5 -2 lines changed


project/SparkBuild.scala

Lines changed: 3 additions & 0 deletions
@@ -1322,6 +1322,9 @@ object SqlApi {
 object SQL {
   import BuildCommons.protoVersion
   lazy val settings = Seq(
+    // SPARK-54830: avoid AdaptiveQueryExecSuite OOM, since computing order independent shuffle checksum needs more
+    // memory for test case introduced by SPARK-48037 which set shuffle partition to 16777216
+    (Test / javaOptions) += "-Xmx6g",
     // Setting version for the protobuf compiler. This has to be propagated to every sub-project
     // even if the project is not using it.
     PB.protocVersion := BuildCommons.protoVersion,

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 2 additions & 2 deletions
@@ -907,15 +907,15 @@ object SQLConf {
         "retry all tasks of the consumer stages to avoid correctness issues.")
       .version("4.1.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
 
   private[spark] val SHUFFLE_CHECKSUM_MISMATCH_FULL_RETRY_ENABLED =
     buildConf("spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch")
       .doc("Whether to retry all tasks of a consumer stage when we detect checksum mismatches " +
         "with its producer stages.")
       .version("4.1.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
 
   val SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE =
     buildConf("spark.sql.adaptive.shuffle.targetPostShuffleInputSize")
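
Because both defaults flip from `false` to `true`, a job that depended on the previous behavior would have to opt out explicitly. The following is a minimal configuration sketch, assuming the flag can be set through the usual `SparkSession` builder path. It uses only the key visible in this hunk, since the key of the first flag is not shown here. The application name and the `local[*]` master are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: restore the pre-change default for the full-retry flag.
// Assumes the key is accepted as a regular SQL conf at session creation.
val spark = SparkSession.builder()
  .appName("checksum-retry-opt-out") // placeholder name
  .master("local[*]")                // placeholder master for local runs
  .config(
    "spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch",
    "false")
  .getOrCreate()
```

Whether disabling only this flag is advisable depends on how it interacts with the first flag in the hunk, which this diff does not show in full.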
