[SPARK-55052][SQL] Add AQEShuffleRead properties to Physical Plan Tree#53817

Open
erenavsarogullari wants to merge 1 commit into apache:master from erenavsarogullari:SPARK-55052

@erenavsarogullari
Member

What changes were proposed in this pull request?

AQEShuffleRead can report one of the following properties when reading shuffle files: local, coalesced, skewed, or coalesced and skewed. In a complex physical plan tree, it is hard to tell which AQEShuffleRead node performs a local read or reads skewed partitions, because that information has to be correlated with the per-node details. For example, in the skewed SortMergeJoin case below, the new output shows which SMJ leg has an AQEShuffleRead with skew. This change makes these properties visible at the physical plan tree level. The plan details section for each AQEShuffleRead node already shows them, but when the query plan tree is very large (e.g. 1000+ physical nodes), correlating the details with the tree is difficult.

Current Physical Plan Tree:

== Physical Plan ==
AdaptiveSparkPlan (24)
+- == Final Plan ==
   ResultQueryStage (17), Statistics(sizeInBytes=8.0 EiB)
   +- * Project (16)
      +- * SortMergeJoin(skew=true) Inner (15)
         :- * Sort (7)
         :  +- AQEShuffleRead (6)
         :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
         :        +- Exchange (4)
         :           +- * Project (3)
         :              +- * Filter (2)
         :                 +- * Range (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * Range (8)

New Physical Plan Tree:

== Physical Plan ==
AdaptiveSparkPlan (24)
+- == Final Plan ==
   ResultQueryStage (17), Statistics(sizeInBytes=8.0 EiB)
   +- * Project (16)
      +- * SortMergeJoin(skew=true) Inner (15)
         :- * Sort (7)
         :  +- AQEShuffleRead (6), coalesced
         :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
         :        +- Exchange (4)
         :           +- * Project (3)
         :              +- * Filter (2)
         :                 +- * Range (1)
         +- * Sort (14)
            +- AQEShuffleRead (13), coalesced and skewed
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * Range (8)
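
As an illustration of how the new suffix could be consumed on a large plan, here is a minimal Python sketch that scans formatted plan text and lists every AQEShuffleRead node with its properties. The helper name `find_shuffle_reads` and the regular expression are hypothetical and only assume the `AQEShuffleRead (<id>), <properties>` line format shown above; they are not part of this PR or of Spark.

```python
import re

# Matches lines such as "+- AQEShuffleRead (6), coalesced" in the plan tree.
# The trailing ", <properties>" suffix is the format proposed in this PR and
# is treated as optional so that older plan output is still recognized.
NODE_PATTERN = re.compile(r"AQEShuffleRead \((\d+)\)(?:, (.+?))?\s*$")


def find_shuffle_reads(plan_text):
    """Return (node_id, properties) pairs for every AQEShuffleRead line."""
    results = []
    for line in plan_text.splitlines():
        match = NODE_PATTERN.search(line)
        if match:
            node_id = int(match.group(1))
            properties = match.group(2) or ""  # empty when no suffix is present
            results.append((node_id, properties))
    return results


# Excerpt of the "New Physical Plan Tree" above, used as sample input.
SAMPLE_PLAN = """\
      +- * SortMergeJoin(skew=true) Inner (15)
         :- * Sort (7)
         :  +- AQEShuffleRead (6), coalesced
         :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
         +- * Sort (14)
            +- AQEShuffleRead (13), coalesced and skewed
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
"""

if __name__ == "__main__":
    for node_id, properties in find_shuffle_reads(SAMPLE_PLAN):
        print(node_id, properties or "(no properties)")
```

On the excerpt above, this reports node 6 as coalesced and node 13 as coalesced and skewed, which is the correlation that otherwise requires jumping to the per-node details section.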

Why are the changes needed?

When the physical plan tree is complex (e.g. 1000+ physical nodes), it is hard to correlate the tree with the per-node AQEShuffleRead details.

Does this PR introduce any user-facing change?

Yes. When users inspect the physical plan, the new AQEShuffleRead properties appear in the Physical Plan Tree.

How was this patch tested?

Added a new unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jan 15, 2026
@github-actions

JIRA Issue Information

=== Task SPARK-55052 ===
Summary: Add AQEShuffleRead properties to Physical Plan Tree
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

Copilot AI left a comment

Pull request overview

This PR adds AQEShuffleRead properties (local, coalesced, skewed, coalesced and skewed) to the Physical Plan Tree output, making it easier to identify shuffle read characteristics in complex query plans without correlating details from separate sections.

Changes:

  • Modified AQEShuffleReadExec.simpleStringWithNodeId() to append shuffle read properties to the node string
  • Added a unit test to verify the new properties appear correctly in the formatted explain output
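
The behavior described in these changes can be modeled with a short, language-neutral sketch in Python. This is not the Scala code from the PR: the function names `shuffle_read_label` and `node_string`, the boolean parameters, and the priority order of the checks (local first, then the combined case) are assumptions made for illustration. Only the four labels and the `Name (id), label` output format come from the plan output shown in this PR.

```python
def shuffle_read_label(is_local, has_coalesced, has_skewed):
    """Pick the property label for a shuffle read (hypothetical model).

    The order of the checks is an assumption: a local read takes priority,
    and the combined label is checked before the single-property labels.
    """
    if is_local:
        return "local"
    if has_coalesced and has_skewed:
        return "coalesced and skewed"
    if has_coalesced:
        return "coalesced"
    if has_skewed:
        return "skewed"
    return ""


def node_string(node_name, node_id, label):
    """Format a plan tree node line, appending the label only when present."""
    base = f"{node_name} ({node_id})"
    return f"{base}, {label}" if label else base


if __name__ == "__main__":
    label = shuffle_read_label(is_local=False, has_coalesced=True, has_skewed=True)
    print(node_string("AQEShuffleRead", 13, label))
```

Appending the label only when it is non-empty keeps the node line unchanged for shuffle reads that have none of these properties.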

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEShuffleReadExec.scala | Overrides simpleStringWithNodeId() to append shuffle read properties (from stringArgs) to the node description
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala | Adds test case to verify that AQEShuffleRead properties are correctly displayed in the physical plan tree with coalesced and skewed partitions

@erenavsarogullari erenavsarogullari force-pushed the SPARK-55052 branch 2 times, most recently from 499788a to 2ceab71 Compare February 10, 2026 00:50
@erenavsarogullari
Copy link
Member Author

Hi @cloud-fan and @yaooqinn,
This PR is ready for the review. Thanks in advance.
