Explain Analyze + Refactor #42
Conversation
NGA-TRAN left a comment
This is very nice, Rob.
Thanks for the long description. It makes it easier to follow and review.
- I like that you made it clearer that a stage can be sent to different workers through different tasks. I told myself we needed something like this to display/annotate durations for explain analyze. I bet in the future Gabriel will be able to make the display even easier to read. He has been doing this in retriever.
- Thanks for refactoring the explain. It makes a lot more sense now.
- I have not looked into the details of explain analyze because I am not familiar with the context. But I have no concerns, as you have structured the code very well. I will get more familiar when I use it and add more tests for it and other features.
- I think the PR will break the TPC-H validation tests because you have removed the view creation for q15. If it takes too much time to make the tests pass, it is fine to merge this and fix the tests later.
- I also think all of us should work on adding more unit tests and integration tests.
```python
rich_table.add_row(*row_data)
console = Console()
console.print(rich_table, markup=False)
```
👍 Showing the table format is a lot better.
```sh
# Define views required for TPC-H queries (e.g., q15)
export DFRAY_VIEWS="create view revenue0 (supplier_no, total_revenue) as select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem where l_shipdate >= date '1996-08-01' and l_shipdate < date '1996-08-01' + interval '3' month group by l_suppkey"
```
I could not find where this view is created. It is needed to run TPC-H q15. Do you plan to create the view when we run the query?
I have disabled the TPC-H test because it is slow now. Can you try to run this command to see if all the queries pass? I suspect q15 will fail:
```sh
cargo test --test tpch_validation test_tpch_validation_all_queries -- --ignored --nocapture
```
I don't know yet where, in a distributed world, we will keep the catalog metadata that is typically held in the `SessionState`. I will think more about this.
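For reference, a view like the one in `DFRAY_VIEWS` could be registered on the proxy's DataFusion `SessionContext` before queries run. A minimal sketch, assuming env-var plumbing like the above (illustrative only, not this PR's code):

```rust
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

/// Sketch only: register each `CREATE VIEW` statement from DFRAY_VIEWS
/// (separated by ';') on the session context before running TPC-H queries.
async fn register_views(ctx: &SessionContext) -> Result<()> {
    if let Ok(views) = std::env::var("DFRAY_VIEWS") {
        for stmt in views.split(';').filter(|s| !s.trim().is_empty()) {
            // CREATE VIEW is plain DDL, so it can go through the normal sql() path.
            ctx.sql(stmt).await?;
        }
    }
    Ok(())
}
```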
```rust
if result.is_empty() {
    result.push_str("No distributed stages generated");
if i < tasks.len() - 1 {
    result.push('\n');
```
This is very minor, mostly to make it easier for us to read the plan if we add one more layer in the display:
```text
Stage 0
  Task 1
  ...
  Task 2
  ...
Stage 1
  Task 3
  ...
  Task 4
  ...
```
Oh, I agree that is better.
```rust
pub mod flight_handlers;
pub mod friendly;
pub mod isolator;
pub mod k8s;
```
👍 I have been thinking about removing this, too. We can always add it back when it is done in a more generic way.
```rust
// add other logical plans for local execution here following the pattern for explain
p @ LogicalPlan::DescribeTable(_) => self.prepare_local(p, ctx).await,
p => self.prepare_query(p, ctx).await,
}
```
👍 This matching logic is a lot better. I should have used this 🙂
LiaCastaneda left a comment
This is a very nice refactor, thanks! 🙇♀️ I've read through the code, but I'll probably need to revisit `DfRayProcessorHandler::make_stream` later, as I didn't fully grasp how it works.
```diff
-    target_stage_ids: &[u64],
-    stages: &[StageData],
-) -> Result<StageAddrs> {
+fn get_stage_addrs_from_tasks(target_stage_ids: &[u64], stages: &[DDTask]) -> Result<StageAddrs> {
```
Suggested change:
```diff
-fn get_stage_addrs_from_tasks(target_stage_ids: &[u64], stages: &[DDTask]) -> Result<StageAddrs> {
+fn get_stage_addrs_from_tasks(target_stage_ids: &[u64], tasks: &[DDTask]) -> Result<StageAddrs> {
```
Aren't these all the tasks from all the stages?
```rust
}
}

impl DisplayAs for DistributedAnalyzeRootExec
```
It would be nice to document both nodes, `DistributedAnalyzeRootExec` and `DistributedAnalyzeExec`, so we can know the difference without having to read the planning code, but I agree all the docs can be included in a follow-up PR.
Summary
This PR started with adding Explain Analyze functionality to distributed queries, but it has grown to clean up some anti-patterns and make some useful refactors.
Refactors
Physical Plan -> Stages -> Tasks
The word `Stage` was overloaded in the code to mean a portion of the physical plan, but it also meant the individual partitions of that plan when we further divided it for distribution. Now we have the notion of:

- `Physical Plan` - the physical plan as generated by DataFusion on the proxy node. It is the same physical plan as in the single query node case.
- `Stage` - the physical plan is chopped up into Stages during execution planning. A Stage is the portion of the physical execution plan that can be executed in a distributed manner.
- `Task` - each Stage has a number of associated partitions. When we choose a number of these partitions to execute on a single worker, this is a Task.

Note that an important parameter in physical planning is the number of desired partitions. In DataFusion this is, by default, the number of physical cores. In Distributed DataFusion it is a free parameter, and at the moment it is hard coded. I would like to change this so that it is first determined by a `SET` variable statement in SQL, and second, able to be intelligently determined by the execution planning step.

Another important parameter in execution planning is `partitions per worker`. It means that if a stage has 10 partitions and partitions per worker is set to 3, we will chop this stage up into 4 `Tasks`, and they will execute partitions `[0,1,2]`, `[3,4,5]`, `[6,7,8]`, `[9]`. This should be renamed to `partitions-per-task`. I would like this to be determinable in SQL by a `SET` variable parameter, and later determined automatically or by some policy.
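A small sketch of that partition-to-task chunking (names here are illustrative, not the actual planning code):

```rust
/// Illustrative only: split a stage's partition indices into per-task groups,
/// e.g. 10 partitions with partitions_per_task = 3 -> [0,1,2] [3,4,5] [6,7,8] [9].
fn partition_groups(num_partitions: usize, partitions_per_task: usize) -> Vec<Vec<usize>> {
    (0..num_partitions)
        .collect::<Vec<_>>()
        .chunks(partitions_per_task)
        .map(|c| c.to_vec())
        .collect()
}

#[test]
fn ten_partitions_three_per_task() {
    assert_eq!(
        partition_groups(10, 3),
        vec![vec![0, 1, 2], vec![3, 4, 5], vec![6, 7, 8], vec![9]]
    );
}
```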
Query Execution Path simplified

All queries will either be:

- executed locally on the proxy, with the results held in a `RecordBatchExec` and distributed per normal, or
- planned and distributed across the workers per normal.

This greatly simplifies query planning and removes much of the conditional logic for specific types of queries. See the `query_planner.rs` file for details, and how it is used in `proxy.rs`. `Explain` functionality has been refactored to follow the above pattern, instead of sending data back to the client via the `Ticket` for this explicit query type. `Describe Table` has been added to show how to extend this pattern to other query types.
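The planner's dispatch then collapses to a single match on the logical plan. A self-contained sketch of the shape (the method bodies are stubs, the return type is simplified, and the exact set of locally executed plans is illustrative):

```rust
use datafusion::error::Result;
use datafusion::logical_expr::LogicalPlan;
use datafusion::prelude::SessionContext;

// Sketch of the single dispatch point; only the shape of the match is the point.
struct Planner;

impl Planner {
    async fn prepare_local(&self, _p: LogicalPlan, _ctx: &SessionContext) -> Result<()> {
        Ok(()) // run on the proxy; results end up hosted in a RecordBatchExec
    }

    async fn prepare_query(&self, _p: LogicalPlan, _ctx: &SessionContext) -> Result<()> {
        Ok(()) // plan stages/tasks and distribute per normal
    }

    async fn prepare(&self, plan: LogicalPlan, ctx: &SessionContext) -> Result<()> {
        match plan {
            p @ LogicalPlan::Explain(_) => self.prepare_local(p, ctx).await,
            // add other logical plans for local execution here following the pattern for explain
            p @ LogicalPlan::DescribeTable(_) => self.prepare_local(p, ctx).await,
            p => self.prepare_query(p, ctx).await,
        }
    }
}
```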
FlightRequestHandler removed

`FlightRequestHandler` previously had a fair bit of conditional logic to handle different query types. Now that they all work in the same way, this layer of indirection was eliminated and selected code was rolled back into `DFRayProcessorHandler`.
Trailing data optionally included in `DoGet` streams from `DFRayProcessorHandler`

In order to handle `Explain Analyze`, we need to send the results of query execution back per normal, but when we want to send back the annotated plan, we need a mechanism to do so.

gRPC itself has a notion of trailing metadata, but it is not supported well in `tonic`, so it has been added to `DFRayProcessorHandler` in `make_stream` such that if we know we want to send metadata back after exhausting our stream, we can do so. This has been plumbed through so that the metadata is propagated all the way up through the distributed plan. At the moment it is used for explain analyze, but if we need to bubble up more data, we should use this mechanism to do so.
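The shape of the mechanism is roughly the following (a minimal sketch only, not the actual `make_stream` implementation): yield the data stream as usual, then a single optional trailing item once it is exhausted.

```rust
use futures::stream::{self, Stream, StreamExt};

/// Sketch only: yield every item from the data stream and then, once it is
/// exhausted, yield one optional trailing item (e.g. the annotated plan
/// encoded as metadata). Ordinary queries simply pass `None` and pay nothing.
fn with_trailing<T: Send + 'static>(
    data: impl Stream<Item = T> + Send + 'static,
    trailing: Option<T>,
) -> impl Stream<Item = T> + Send + 'static {
    data.chain(stream::iter(trailing))
}
```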
Host names are shared

Proxies and Processors now assign themselves a unique friendly hostname and share it upon discovering one another. This makes plans easier to read and logs easier to parse: we will have lots of hosts floating around, and friendly names are easier than IP addresses to read and keep in your head.
Far fewer tuples
The code base had way too many functions that returned complicated tuple types, `Result<(Vec<String>, String, String)>` or something like that. It's not clear what those strings represent, and to a large degree this PR replaces them with sensible structs. We're not fully tuple-free, but it's much closer and the codebase has improved as a result.
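As a contrived example of the kind of change (the struct and field names below are illustrative, not actual types from this PR):

```rust
// Before: the caller has to remember what each element of the tuple means.
// fn plan_stage(/* ... */) -> Result<(Vec<String>, String, String)>

// After: a named struct makes the same data self-describing.
struct PlannedStage {
    worker_addrs: Vec<String>,
    stage_id: String,
    plan_display: String,
}
// fn plan_stage(/* ... */) -> Result<PlannedStage>
```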
Test coverage is reduced

The drawback of this large PR is that many previous tests no longer applied and have been eliminated. New ones were not yet added, in favor of not making the PR any bigger or any later.
New Functionality
Describe
Describe table has been implemented in a way that shows the pattern of how we can execute queries locally on the proxy while still retaining the same client interaction and execution path.
From the python client shell script:
Explain
Explain shows the distributed plan, with the stage markers indicated, and furthermore shows how the stages are broken into tasks.
From the python client shell script:
Explain Analyze
Explain Analyze shows the results of each Task's execution plan, annotated with metrics produced during execution.
TODO: Also show logical, physical, and stages.
From the python client shell script: