- **DAG Parser:** Parses a DataFusion `ExecutionPlan` into a DAG of stages, where each stage consists of tasks that can be completed without shuffling intermediate results. After decomposing the work, it enqueues tasks into the work queue in a breadth-first manner.

- **Work Queue:** A concurrent queue (initially FIFO) where tasks are enqueued by the DAG Parser. Each QUERY submitted by the optimizer also has a cost, allowing for heuristic adjustments to the ordering.

- **Work Threads (tokio):** Tokio threads are created for each executor node to handle communications.

- **QueryID Table:** An in-memory data structure mapping QueryIDs to a DAG of remaining QUERY fragments and to cost estimates retrieved from the optimizer.

- **Executors:** Each executor is connected to the scheduler and to the other executors via gRPC (tonic).

- **Intermediate Results:** Intermediate results are stored as a thread-safe `HashMap<TaskKey, Vec<RecordBatch>>` in shared memory, so all executors can access intermediate results without having to serialize/deserialize data.
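The shared intermediate-result store described above can be sketched with std types only. The `(query_id, task_id)` key layout and the `Vec<u8>` batch payload are hypothetical stand-ins for the real `TaskKey` and Arrow `RecordBatch`:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-in types for the sketch: the real design keys on task identity and
// stores Arrow RecordBatches.
type TaskKey = (u64, u32); // (query_id, task_id) -- hypothetical layout
type RecordBatch = Vec<u8>;

#[derive(Clone, Default)]
struct IntermediateResults {
    inner: Arc<Mutex<HashMap<TaskKey, Vec<RecordBatch>>>>,
}

impl IntermediateResults {
    // Append one batch produced by a finished task.
    fn publish(&self, key: TaskKey, batch: RecordBatch) {
        self.inner.lock().unwrap().entry(key).or_default().push(batch);
    }

    // A downstream task takes its inputs directly -- no serialization is
    // needed because all executors share the same address space.
    fn take(&self, key: &TaskKey) -> Option<Vec<RecordBatch>> {
        self.inner.lock().unwrap().remove(key)
    }
}

fn main() {
    let store = IntermediateResults::default();
    store.publish((1, 0), vec![1, 2, 3]);
    store.publish((1, 0), vec![4]);
    let batches = store.take(&(1, 0)).unwrap();
    assert_eq!(batches.len(), 2);
    println!("read {} batches without serialization", batches.len());
}
```

Because the map is behind an `Arc<Mutex<..>>`, the handle itself is cheap to clone and share across work threads.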
1. Receives DataFusion `ExecutionPlan`s from the Query Optimizer, parses them into DAGs, and stores them in the QueryID Table.
2. Leaves of each DAG are added to the work queue, from which work threads can pull tasks.
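The breadth-first seeding in the steps above can be sketched with plain std types; `Stage` and its `deps` field are simplified stand-ins for the real stage DAG:

```rust
use std::collections::VecDeque;

// Simplified stand-in for a stage DAG node: `deps` lists the stages whose
// shuffled output this stage consumes. A leaf stage has no dependencies
// and can run immediately.
struct Stage {
    id: u32,
    deps: Vec<u32>,
}

// Seed the work queue with every dependency-free stage, mirroring the
// breadth-first enqueueing described above.
fn seed_queue(stages: &[Stage], queue: &mut VecDeque<u32>) {
    for stage in stages {
        if stage.deps.is_empty() {
            queue.push_back(stage.id);
        }
    }
}

fn main() {
    // Two scan stages feeding a join stage: only the scans are enqueued.
    let dag = vec![
        Stage { id: 0, deps: vec![] },
        Stage { id: 1, deps: vec![] },
        Stage { id: 2, deps: vec![0, 1] },
    ];
    let mut queue = VecDeque::new();
    seed_queue(&dag, &mut queue);
    assert_eq!(queue, VecDeque::from(vec![0, 1]));
}
```

As upstream stages complete, the same check would re-run on their dependents to release newly unblocked tasks into the queue.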
Individual components within the scheduler will be unit tested using Rust's testing framework.
The end-to-end testing framework is composed of three primary components: the mock frontend, the mock catalog, and the mock executors.
### 1. Frontend
The `MockFrontend` class is responsible for:
- Establishing and maintaining a connection with the scheduler.
These consist of DataFusion executors and gRPC clients that execute tasks.
### Performance Benchmarking
To assess the scheduler's capacity to handle complex OLAP queries and maintain high throughput, we intend to use the integration test framework to simultaneously submit all 22 TPC-H queries across a cluster of executors. We will collect the following metrics:
- **Speedup from Scaling Out Executors:** Measure the speedup gained from increasing the number of executors.
- **Execution Time for Each Query:** Measure the duration from submission to completion for each query.
- **Busy-Idle Time Ratio for Each Executor:** Record periods of activity and inactivity for each executor throughout the test.
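As a rough sketch, the first and third metrics reduce to simple ratios over the recorded timings; the function names here are illustrative, not part of the planned framework:

```rust
// Speedup from scaling out: single-executor runtime over n-executor runtime.
fn speedup(baseline_secs: f64, scaled_secs: f64) -> f64 {
    baseline_secs / scaled_secs
}

// Busy-idle ratio over a test window, given recorded (start, end) busy
// intervals for one executor. Assumes the intervals do not overlap.
fn busy_idle_ratio(busy: &[(f64, f64)], window_secs: f64) -> f64 {
    let busy_total: f64 = busy.iter().map(|(start, end)| end - start).sum();
    busy_total / (window_secs - busy_total)
}

fn main() {
    // Hypothetical numbers: 4 executors finish the suite in 2.5s vs 10s on one.
    assert!((speedup(10.0, 2.5) - 4.0).abs() < 1e-9);
    // Busy for 4s out of an 8s window -> ratio of 1.0.
    let ratio = busy_idle_ratio(&[(0.0, 3.0), (5.0, 6.0)], 8.0);
    assert!((ratio - 1.0).abs() < 1e-9);
    println!("metrics computed");
}
```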
Additionally, we plan to develop data visualization tools in Python to present the results more effectively.
## Future Composability with Other Components
The mock optimizer and executor can be directly replaced with alternative implementations without necessitating any additional changes to the system. While the catalog, cache, and storage layers are not directly integrated into the testing system, we plan to encapsulate most of the logic within the mock catalog to simplify future integration.