Low number of stages in TPCH tests

The following PR:
- https://github.com/datafusion-contrib/datafusion-distributed/pull/107

Extended the TPCH tests asserting the generated plans, and one thing that stands out is that the queries are getting only slightly distributed, having at most 1 or 2 network boundaries per query.

For example, TPCH query 21 has only 2 stages and 1 network boundary, even though the query is huge:

```
┌───── Stage 2   Task: partitions: 0,unassigned]
│partitions [out:1  <-- in:3  ] SortPreservingMergeExec: [numwait@1 DESC, s_name@0 ASC NULLS LAST]
│partitions [out:3  <-- in:3  ]   SortExec: expr=[numwait@1 DESC, s_name@0 ASC NULLS LAST], preserve_partitioning=[true]
│partitions [out:3  <-- in:3  ]     ProjectionExec: expr=[s_name@0 as s_name, count(Int64(1))@1 as numwait]
│partitions [out:3  <-- in:3  ]       AggregateExec: mode=FinalPartitioned, gby=[s_name@0 as s_name], aggr=[count(Int64(1))]
│partitions [out:3  <-- in:3  ]         CoalesceBatchesExec: target_batch_size=8192
│partitions [out:3            ]           ArrowFlightReadExec: Stage 1  
└──────────────────────────────────────────────────
  ┌───── Stage 1   Task: partitions: 0..2,unassigned]
  │partitions [out:3  <-- in:3  ] RepartitionExec: partitioning=Hash([s_name@0], 3), input_partitions=3
  │partitions [out:3  <-- in:3  ]   AggregateExec: mode=Partial, gby=[s_name@0 as s_name], aggr=[count(Int64(1))]
  │partitions [out:3  <-- in:3  ]     CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:1  ]       HashJoinExec: mode=CollectLeft, join_type=LeftAnti, on=[(l_orderkey@1, l_orderkey@0)], filter=l_suppkey@1 != l_suppkey@0, projection=[s_name@0]
  │partitions [out:1  <-- in:3  ]         CoalescePartitionsExec
  │partitions [out:3  <-- in:3  ]           CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:1  ]             HashJoinExec: mode=CollectLeft, join_type=LeftSemi, on=[(l_orderkey@1, l_orderkey@0)], filter=l_suppkey@1 != l_suppkey@0
  │partitions [out:1  <-- in:3  ]               CoalescePartitionsExec
  │partitions [out:3  <-- in:3  ]                 CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:1  ]                   HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(n_nationkey@0, s_nationkey@1)], projection=[s_name@1, l_orderkey@3, l_suppkey@4]
  │partitions [out:1  <-- in:3  ]                     CoalescePartitionsExec
  │partitions [out:3  <-- in:3  ]                       CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:3  ]                         FilterExec: n_name@1 = SAUDI ARABIA, projection=[n_nationkey@0]
  │partitions [out:3            ]                           DataSourceExec: file_groups={3 groups: [[/testdata/tpch/data/nation/1.parquet], [/testdata/tpch/data/nation/2.parquet], [/testdata/tpch/data/nation/3.parquet]]}, projection=[n_nationkey, n_name], file_type=parquet, predicate=n_name@1 = SAUDI ARABIA, pruning_predicate=n_name_null_count@2 != row_count@3 AND n_name_min@0 <= SAUDI ARABIA AND SAUDI ARABIA <= n_name_max@1, required_guarantees=[n_name in (SAUDI ARABIA)]
  │partitions [out:3  <-- in:3  ]                     CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:1  ]                       HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(o_orderkey@0, l_orderkey@2)], projection=[s_name@1, s_nationkey@2, l_orderkey@3, l_suppkey@4]
  │partitions [out:1  <-- in:3  ]                         CoalescePartitionsExec
  │partitions [out:3  <-- in:3  ]                           CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:3  ]                             FilterExec: o_orderstatus@1 = F, projection=[o_orderkey@0]
  │partitions [out:3            ]                               DataSourceExec: file_groups={3 groups: [[/testdata/tpch/data/orders/1.parquet], [/testdata/tpch/data/orders/2.parquet], [/testdata/tpch/data/orders/3.parquet]]}, projection=[o_orderkey, o_orderstatus], file_type=parquet, predicate=o_orderstatus@1 = F, pruning_predicate=o_orderstatus_null_count@2 != row_count@3 AND o_orderstatus_min@0 <= F AND F <= o_orderstatus_max@1, required_guarantees=[o_orderstatus in (F)]
  │partitions [out:3  <-- in:3  ]                         CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:1  ]                           HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(s_suppkey@0, l_suppkey@1)], projection=[s_name@1, s_nationkey@2, l_orderkey@3, l_suppkey@4]
  │partitions [out:1  <-- in:3  ]                             CoalescePartitionsExec
  │partitions [out:3            ]                               DataSourceExec: file_groups={3 groups: [[/testdata/tpch/data/supplier/1.parquet], [/testdata/tpch/data/supplier/2.parquet], [/testdata/tpch/data/supplier/3.parquet]]}, projection=[s_suppkey, s_name, s_nationkey], file_type=parquet
  │partitions [out:3  <-- in:3  ]                             CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:3  ]                               FilterExec: l_receiptdate@3 > l_commitdate@2, projection=[l_orderkey@0, l_suppkey@1]
  │partitions [out:3            ]                                 DataSourceExec: file_groups={3 groups: [[/testdata/tpch/data/lineitem/1.parquet], [/testdata/tpch/data/lineitem/2.parquet], [/testdata/tpch/data/lineitem/3.parquet]]}, projection=[l_orderkey, l_suppkey, l_commitdate, l_receiptdate], file_type=parquet, predicate=l_receiptdate@3 > l_commitdate@2
  │partitions [out:3            ]               DataSourceExec: file_groups={3 groups: [[/testdata/tpch/data/lineitem/1.parquet], [/testdata/tpch/data/lineitem/2.parquet], [/testdata/tpch/data/lineitem/3.parquet]]}, projection=[l_orderkey, l_suppkey], file_type=parquet
  │partitions [out:3  <-- in:3  ]         CoalesceBatchesExec: target_batch_size=8192
  │partitions [out:3  <-- in:3  ]           FilterExec: l_receiptdate@3 > l_commitdate@2, projection=[l_orderkey@0, l_suppkey@1]
  │partitions [out:3            ]             DataSourceExec: file_groups={3 groups: [[/testdata/tpch/data/lineitem/1.parquet], [/testdata/tpch/data/lineitem/2.parquet], [/testdata/tpch/data/lineitem/3.parquet]]}, projection=[l_orderkey, l_suppkey, l_commitdate, l_receiptdate], file_type=parquet, predicate=l_receiptdate@3 > l_commitdate@2
  └──────────────────────────────────────────────────
```

Maybe there's some opportunity in the planner to improve the distribution logic?

Probably getting some inspiration from other distributed engines like Ballista or Trino can help inform distributed planning decicions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Low number of stages in TPCH tests #112

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Low number of stages in TPCH tests #112

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions