
Commit a5a3b7c

Emma-82 authored and wangyum committed
[SPARK-52724][SQL] Enhance broadcast join OOM error handling with SHUFFLE_MERGE hint support
### What changes were proposed in this pull request?

This PR enhances broadcast join OOM error handling by suggesting the shuffle sort merge join (`SHUFFLE_MERGE`) hint as a workaround for broadcast join OOM issues.

### Why are the changes needed?

To reduce support workload by enabling customers to diagnose and resolve broadcast join OOM issues independently.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test (the two resulting error messages, with and without table statistics available):

```
Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value or analyze these tables through: `ANALYZE TABLE `t1` COMPUTE STATISTICS; ANALYZE TABLE `t2` COMPUTE STATISTICS;` or apply the shuffle sort merge join hint as described in the Spark documentation: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html#join-hints.

Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value or apply the shuffle sort merge join hint as described in the Spark documentation: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html#join-hints.
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51417 from Emma-82/SPARK-52724.

Authored-by: emmazhang <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
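For context, the hint that the new error message points users toward can be applied as in the sketch below. This is illustrative only and not part of this patch: the session setup, table names, and join key are placeholders, and running it requires a Spark runtime on the classpath.

```scala
// Sketch: applying the SHUFFLE_MERGE join hint the new error message suggests.
// Table/column names are placeholders; requires a Spark runtime.
import org.apache.spark.sql.SparkSession

object ShuffleMergeHintDemo extends App {
  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  val t1 = Seq((1, "a"), (2, "b")).toDF("key", "v1")
  val t2 = Seq((1, "x"), (3, "y")).toDF("key", "v2")

  // DataFrame API form: hint one side so the planner chooses a
  // sort merge join instead of trying to broadcast it.
  val joined = t1.join(t2.hint("shuffle_merge"), "key")

  // Equivalent SQL form (per the Spark join-hints documentation):
  //   SELECT /*+ SHUFFLE_MERGE(t2) */ * FROM t1 JOIN t2 ON t1.key = t2.key
  joined.explain() // plan should show SortMergeJoin rather than BroadcastHashJoin
  spark.stop()
}
```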
1 parent 9677b44 commit a5a3b7c

File tree

2 files changed: +3 −3 lines


common/utils/src/main/resources/error/error-conditions.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -8688,7 +8688,7 @@
   },
   "_LEGACY_ERROR_TEMP_2250" : {
     "message" : [
-      "Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting <autoBroadcastJoinThreshold> to -1 or increase the spark driver memory by setting <driverMemory> to a higher value<analyzeTblMsg>"
+      "Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting <autoBroadcastJoinThreshold> to -1 or increase the spark driver memory by setting <driverMemory> to a higher value<analyzeTblMsg> or apply the shuffle sort merge join hint as described in the Spark documentation: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html#join-hints."
     ]
   },
   "_LEGACY_ERROR_TEMP_2251" : {
```

sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala

Lines changed: 2 additions & 2 deletions
```diff
@@ -2106,9 +2106,9 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
       oe: OutOfMemoryError, tables: Seq[TableIdentifier]): Throwable = {
     val analyzeTblMsg = if (tables.nonEmpty) {
       " or analyze these tables through: " +
-        s"${tables.map(t => s"ANALYZE TABLE $t COMPUTE STATISTICS;").mkString(" ")}."
+        s"`${tables.map(t => s"ANALYZE TABLE $t COMPUTE STATISTICS;").mkString(" ")}`"
     } else {
-      "."
+      ""
     }
     new SparkException(
       errorClass = "_LEGACY_ERROR_TEMP_2250",
```
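The effect of the two changed lines can be sketched in isolation as plain Scala. Note this is a standalone illustration, not the real helper: `TableIdentifier` is stubbed as a plain `String`, and the surrounding message text is inlined rather than filled in from the error-conditions template.

```scala
// Standalone sketch of the updated message assembly. After this commit, the
// trailing "." moves out of the ANALYZE branch so the SHUFFLE_MERGE hint
// suggestion can be appended uniformly in both cases.
object BroadcastOomMessageSketch {
  val docUrl =
    "https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html#join-hints"

  def build(tables: Seq[String]): String = {
    val analyzeTblMsg = if (tables.nonEmpty) {
      " or analyze these tables through: " +
        s"`${tables.map(t => s"ANALYZE TABLE $t COMPUTE STATISTICS;").mkString(" ")}`"
    } else {
      ""
    }
    "Not enough memory to build and broadcast the table to all worker nodes. " +
      "As a workaround, you can either disable broadcast by setting " +
      "spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver " +
      s"memory by setting spark.driver.memory to a higher value$analyzeTblMsg " +
      s"or apply the shuffle sort merge join hint as described in the Spark " +
      s"documentation: $docUrl."
  }
}

// With table names, the ANALYZE suggestion is included; without, it is skipped
// and the hint suggestion still reads grammatically.
println(BroadcastOomMessageSketch.build(Seq("`t1`", "`t2`")))
println(BroadcastOomMessageSketch.build(Nil))
```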
