[SPARK-54850][SQL] Improve extractShuffleIds to find AdaptiveSparkPlanExec anywhere in plan tree

baibaichen · yaooqinn · commit 082de7fcdc40 · 2025-12-29T21:11:24.000+08:00
### What changes were proposed in this pull request?

This PR uses `collectFirst` to find the first `AdaptiveSparkPlanExec` node anywhere in the plan tree, instead of assuming that the root of the plan is an `AdaptiveSparkPlanExec`.

### Why are the changes needed?

#52157 introduced the `extractShuffleIds` method in `SQLExecution` to find the shuffle IDs of a `SparkPlan`. That method implicitly assumed that, when AQE is enabled, the `AdaptiveSparkPlanExec` is the root of the input plan. Because Spark itself only inserts `AdaptiveSparkPlanExec` under a Command, this assumption held.

In Gluten, however, the `AdaptiveSparkPlanExec` may not be the root node, because Gluten needs to insert a special physical plan node to perform the columnar-to-row transition. In that case the old `match` falls through to the non-adaptive branch, and the shuffle IDs recorded in `ae.context.shuffleIds` are never read.

By using `collectFirst`, `extractShuffleIds` locates the `AdaptiveSparkPlanExec` regardless of its position in the plan tree, which improves compatibility with Gluten.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Passed GitHub Actions (GHA) CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53620 from baibaichen/feature/extractShuffleIds.

Authored-by: Chang chen <baibaichen@gmail.com>
Signed-off-by: Kent Yao <kentyao@microsoft.com>
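To make the failure mode concrete, here is a minimal, self-contained Scala sketch that does not depend on Spark. `Plan`, `Adaptive`, `ShuffleExchange`, and `ColumnarToRowWrapper` are hypothetical toy stand-ins for `SparkPlan`, `AdaptiveSparkPlanExec`, `ShuffleExchangeLike`, and a Gluten-style transition node. `Adaptive` is modeled as a leaf that carries its own shuffle IDs, an assumption meant to mirror how the real code reads them from `ae.context.shuffleIds` rather than from child nodes. The toy `collectFirst` checks a node before its children, as Spark's `TreeNode.collectFirst` does.

```scala
// Hypothetical toy model of a physical plan tree; none of these are Spark classes.
sealed trait Plan {
  def children: Seq[Plan]

  // Pre-order search: test this node first, then each child left to right.
  def collectFirst[B](pf: PartialFunction[Plan, B]): Option[B] =
    pf.lift(this).orElse {
      children.foldLeft(Option.empty[B])((found, child) => found.orElse(child.collectFirst(pf)))
    }

  // Pre-order collection of every node that matches `pf`.
  def collect[B](pf: PartialFunction[Plan, B]): Seq[B] =
    pf.lift(this).toList ++ children.flatMap(_.collect(pf))
}

// Stand-in for AdaptiveSparkPlanExec: a leaf whose shuffle IDs live in its own state.
final case class Adaptive(contextShuffleIds: Set[Int]) extends Plan {
  def children: Seq[Plan] = Nil
}

// Stand-in for a ShuffleExchangeLike node.
final case class ShuffleExchange(shuffleId: Int, child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}

// Hypothetical Gluten-style node placed above the adaptive plan.
final case class ColumnarToRowWrapper(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}

// Before the PR: only an adaptive node at the root is recognized.
def extractBefore(plan: Plan): Seq[Int] = plan match {
  case ae: Adaptive => ae.contextShuffleIds.toSeq
  case nonAdaptive  => nonAdaptive.collect { case e: ShuffleExchange => e.shuffleId }
}

// After the PR: the first adaptive node anywhere in the tree is used.
def extractAfter(plan: Plan): Seq[Int] =
  plan
    .collectFirst { case ae: Adaptive => ae.contextShuffleIds.toSeq }
    .getOrElse(plan.collect { case e: ShuffleExchange => e.shuffleId })

val wrapped = ColumnarToRowWrapper(Adaptive(Set(0, 1)))
val before  = extractBefore(wrapped) // adaptive IDs are never consulted
val after   = extractAfter(wrapped)  // finds the IDs below the wrapper
```

When the adaptive node is the root, both versions return the same IDs, because a pre-order search visits the root first; the change only matters when the adaptive node sits below another node.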
```diff
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala
@@ -71,11 +71,12 @@ object SQLExecution extends Logging {
   }
 
   private def extractShuffleIds(plan: SparkPlan): Seq[Int] = {
-    plan match {
+    val shuffleIdsOption = plan.collectFirst {
       case ae: AdaptiveSparkPlanExec =>
         ae.context.shuffleIds.asScala.keys.toSeq
-      case nonAdaptivePlan =>
-        nonAdaptivePlan.collect {
+    }
+    shuffleIdsOption.getOrElse {
+        plan.collect {
           case exec: ShuffleExchangeLike => exec.shuffleId
         }
     }
```
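The `getOrElse` branch in the diff preserves the non-AQE behavior. The sketch below mirrors the structure of the patched method on the same kind of hypothetical toy stand-ins (not Spark classes): with no adaptive node, IDs are gathered from every exchange in pre-order; with an adaptive node present, only that node's IDs are returned. The second plan shape is chosen purely to show that precedence; it is not a claim that such a plan arises in Spark or Gluten.

```scala
// Hypothetical toy stand-ins for SparkPlan nodes, for illustration only.
sealed trait Plan {
  def children: Seq[Plan]

  // Pre-order: this node first, then children left to right.
  def collectFirst[B](pf: PartialFunction[Plan, B]): Option[B] =
    pf.lift(this).orElse {
      children.foldLeft(Option.empty[B])((found, child) => found.orElse(child.collectFirst(pf)))
    }

  // Pre-order collection of every matching node.
  def collect[B](pf: PartialFunction[Plan, B]): Seq[B] =
    pf.lift(this).toList ++ children.flatMap(_.collect(pf))
}

final case class Adaptive(contextShuffleIds: Set[Int]) extends Plan { def children: Seq[Plan] = Nil }
final case class ShuffleExchange(shuffleId: Int, child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
final case class Join(left: Plan, right: Plan) extends Plan { def children: Seq[Plan] = Seq(left, right) }
final case class Scan(table: String) extends Plan { def children: Seq[Plan] = Nil }

// Mirrors the shape of the patched SQLExecution.extractShuffleIds.
def extractShuffleIds(plan: Plan): Seq[Int] = {
  val shuffleIdsOption = plan.collectFirst {
    case ae: Adaptive => ae.contextShuffleIds.toSeq
  }
  // No adaptive node anywhere: fall back to every exchange in the tree.
  shuffleIdsOption.getOrElse {
    plan.collect { case exec: ShuffleExchange => exec.shuffleId }
  }
}

// Non-AQE plan: IDs come from the exchanges, in pre-order.
val staticPlan = ShuffleExchange(3, Join(ShuffleExchange(1, Scan("a")), ShuffleExchange(2, Scan("b"))))
val staticIds  = extractShuffleIds(staticPlan)

// An adaptive node takes precedence; exchange 9 above it is not reported.
val mixedPlan = ShuffleExchange(9, Adaptive(Set(4)))
val mixedIds  = extractShuffleIds(mixedPlan)
```

Because the fallback runs only when `collectFirst` finds no adaptive node, a plan without AQE takes the same `collect` path as before the change.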
