@gabotechs gabotechs commented Sep 4, 2025

This PR changes the way EXPLAIN displays the indented verbose distributed plan:

  • Displays the partitions each task is responsible for in a condensed form in the stage header
  • Removes the per-node partition mapping to declutter the output, as this information is now displayed in the relevant nodes (RepartitionExec, ArrowFlightReadExec, etc.)
  • Adds the input_partitions and input_tasks fields to the ArrowFlightReadExec node
  • Adds the partition group mapping to the PartitionIsolatorExec node
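
As a rough illustration of the condensed stage header from the first bullet, the sketch below renders a task-to-partition assignment in the `t0:[p0,p1] t1:[p2,p3]` style. The function name `format_task_partitions` is hypothetical, and the assumption that every task owns an equally sized, contiguous range of partitions is mine; this is not the code from this PR.

```rust
/// Hypothetical helper (not from this PR): renders which output partitions
/// each task owns, assuming partitions are split into contiguous chunks of
/// `partitions_per_task` partitions each.
fn format_task_partitions(n_tasks: usize, partitions_per_task: usize) -> String {
    (0..n_tasks)
        .map(|task| {
            // First global partition index owned by this task.
            let start = task * partitions_per_task;
            let parts: Vec<String> = (start..start + partitions_per_task)
                .map(|p| format!("p{p}"))
                .collect();
            format!("t{task}:[{}]", parts.join(","))
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, `format_task_partitions(2, 2)` yields `t0:[p0,p1] t1:[p2,p3]` and `format_task_partitions(1, 1)` yields `t0:[p0]`.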

Before

┌───── Stage 3   Task: partitions: 0,unassigned]
│partitions [out:1  <-- in:1  ] ProjectionExec: expr=[count(*)@0 as count(*), RainToday@1 as RainToday]
│partitions [out:1  <-- in:4  ]   SortPreservingMergeExec: [count(Int64(1))@2 ASC NULLS LAST]
│partitions [out:4  <-- in:4  ]     SortExec: expr=[count(*)@0 ASC NULLS LAST], preserve_partitioning=[true]
│partitions [out:4  <-- in:4  ]       ProjectionExec: expr=[count(Int64(1))@1 as count(*), RainToday@0 as RainToday, count(Int64(1))@1 as count(Int64(1))]
│partitions [out:4  <-- in:4  ]         AggregateExec: mode=FinalPartitioned, gby=[RainToday@0 as RainToday], aggr=[count(Int64(1))]
│partitions [out:4  <-- in:4  ]           CoalesceBatchesExec: target_batch_size=8192
│partitions [out:4            ]             ArrowFlightReadExec: Stage 2  
└──────────────────────────────────────────────────
  ┌───── Stage 2   Task: partitions: 0,1,unassigned],Task: partitions: 2,3,unassigned]
  │partitions [out:4  <-- in:2  ] RepartitionExec: partitioning=Hash([RainToday@0], 4), input_partitions=2
  │partitions [out:2  <-- in:4  ]   PartitionIsolatorExec [providing upto 2 partitions]
  │partitions [out:4            ]     ArrowFlightReadExec: Stage 1  
  └──────────────────────────────────────────────────
    ┌───── Stage 1   Task: partitions: 0,1,unassigned],Task: partitions: 2,3,unassigned]
    │partitions [out:4  <-- in:2  ] RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=2
    │partitions [out:2  <-- in:1  ]   PartitionIsolatorExec [providing upto 2 partitions]
    │partitions [out:1  <-- in:1  ]     AggregateExec: mode=Partial, gby=[RainToday@0 as RainToday], aggr=[count(Int64(1))]
    │partitions [out:1            ]       DataSourceExec: file_groups={1 group: [[/testdata/weather.parquet]]}, projection=[RainToday], file_type=parquet
    └──────────────────────────────────────────────────

After

┌───── Stage 3   Tasks: t0:[p0] 
│ ProjectionExec: expr=[count(*)@0 as count(*), RainToday@1 as RainToday]
│   SortPreservingMergeExec: [count(Int64(1))@2 ASC NULLS LAST]
│     SortExec: expr=[count(*)@0 ASC NULLS LAST], preserve_partitioning=[true]
│       ProjectionExec: expr=[count(Int64(1))@1 as count(*), RainToday@0 as RainToday, count(Int64(1))@1 as count(Int64(1))]
│         AggregateExec: mode=FinalPartitioned, gby=[RainToday@0 as RainToday], aggr=[count(Int64(1))]
│           CoalesceBatchesExec: target_batch_size=8192
│             ArrowFlightReadExec input_stage=2, input_partitions=4, input_tasks=2
└──────────────────────────────────────────────────
  ┌───── Stage 2   Tasks: t0:[p0,p1] t1:[p2,p3] 
  │ RepartitionExec: partitioning=Hash([RainToday@0], 4), input_partitions=2
  │   PartitionIsolatorExec Tasks: t0:[p0,p1,__,__] t1:[__,__,p0,p1] 
  │     ArrowFlightReadExec input_stage=1, input_partitions=4, input_tasks=2
  └──────────────────────────────────────────────────
    ┌───── Stage 1   Tasks: t0:[p0,p1] t1:[p2,p3] 
    │ RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=2
    │   PartitionIsolatorExec Tasks: t0:[p0,p1,__,__] t1:[__,__,p0,p1] 
    │     AggregateExec: mode=Partial, gby=[RainToday@0 as RainToday], aggr=[count(Int64(1))]
    │       DataSourceExec: file_groups={1 group: [[/testdata/weather.parquet]]}, projection=[RainToday], file_type=parquet
    └──────────────────────────────────────────────────
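
One way to read the PartitionIsolatorExec line above: each slot corresponds to one input partition, a `pN` label means the task exposes that input as its own output partition N, and `__` means the task does not touch it. The sketch below reproduces that rendering under this reading. The name `format_partition_groups` and the contiguous, ceiling-sized grouping are assumptions for illustration, not the implementation in this PR.

```rust
/// Hypothetical helper (not from this PR): for each task, shows which of
/// the `n_input` input partitions it exposes. Assumes inputs are split
/// into contiguous groups of ceil(n_input / n_tasks) partitions.
fn format_partition_groups(n_input: usize, n_tasks: usize) -> String {
    if n_tasks == 0 {
        return String::new();
    }
    // Ceiling division, so the last task may receive a shorter group.
    let group_size = (n_input + n_tasks - 1) / n_tasks;
    (0..n_tasks)
        .map(|task| {
            let start = task * group_size;
            let slots: Vec<String> = (0..n_input)
                .map(|input| {
                    if (start..start + group_size).contains(&input) {
                        // Label is the position inside this task's group.
                        format!("p{}", input - start)
                    } else {
                        // This input partition belongs to another task.
                        "__".to_string()
                    }
                })
                .collect();
            format!("t{task}:[{}]", slots.join(","))
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

With 4 input partitions and 2 tasks, `format_partition_groups(4, 2)` yields `t0:[p0,p1,__,__] t1:[__,__,p0,p1]`, matching the Stage 1 and Stage 2 lines in the plan above.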

@gabotechs gabotechs force-pushed the gabrielmusat/improve-explain branch from 80caee4 to 1c68bc0 September 4, 2025 08:54
@gabotechs gabotechs changed the title from "Improve explain render" to "Improve EXPLAIN render" Sep 4, 2025
@NGA-TRAN NGA-TRAN left a comment

❤️ this. I have compared the before and after output in detail, and every single change makes sense: the plan is easier to read and still carries enough information.

partitions [out:4 ] ArrowFlightReadExec: Stage 1
┌───── Stage 2 Tasks: t0:[p0,p1] t1:[p2,p3]
│ RepartitionExec: partitioning=Hash([RainToday@0], 4), input_partitions=2
PartitionIsolatorExec Tasks: t0:[p0,p1,__,__] t1:[__,__,p0,p1]

Love this

@gabotechs gabotechs merged commit bffe4b4 into main Sep 5, 2025
3 checks passed
@gabotechs gabotechs deleted the gabrielmusat/improve-explain branch September 5, 2025 05:56