@@ -24,6 +24,16 @@ capabilities with minimal changes.
2424
2525## Architecture
2626
27+ Before diving into the architecture, it's important to clarify some terms and what they mean:
28+
29+ - ` worker ` : a physical machine listening to serialized plans over an Arrow Flight interface.
30+ - ` network boundary ` : a node in the plan that streams data from a network interface rather than directly from its
31+ children. Implemented as an ` ArrowFlightReadExec ` physical DataFusion node.
32+ - ` stage ` : a portion of the plan separated by a network boundary from other parts of the plan. Implemented as any
33+ other physical node in DataFusion.
34+ - ` task ` : a unit of work inside a stage that executes a subset of its partitions in a specific worker.
35+ - ` subplan ` : a slice of the overall plan
36+
2737A distributed DataFusion query is executed in a very similar fashion as a normal DataFusion query with one key
2838difference:
2939
@@ -111,7 +121,7 @@ shows the partitioning scheme in each node):
111121 └──────[0][1][2]───────┘
112122```
113123
114- Based on that boundary, the plan is divided into stages and tasks are assigned to each stage. Each task will be
124+ Based on that boundary, the plan is divided into stages, and tasks are assigned to each stage. Each task will be
115125responsible for the different partitions in the original plan.
116126
117127```
@@ -131,11 +141,11 @@ responsible for the different partitions in the original plan.
131141 │ │ │
132142 ┌─── task 0 (stage 1) ───┐ ┌── task 1 (stage 1) ────┐ ┌── task 2 (stage 1) ────┐
133143 │ │ │ │ │ │ │ │ │
134- │┌─────────[0]──────────┐│ │┌─────────[0 ]──────────┐│ │┌──────────[0 ]─────────┐│
144+ │┌─────────[0]──────────┐│ │┌─────────[1 ]──────────┐│ │┌──────────[2 ]─────────┐│
135145 ││ AggregateExec ││ ││ AggregateExec ││ ││ AggregateExec ││
136146 ││ (partial) ││ ││ (partial) ││ ││ (partial) ││
137147 │└──────────┬───────────┘│ │└──────────┬───────────┘│ │└───────────┬──────────┘│
138- │┌─────────[0]──────────┐│ │┌─────────[0 ]──────────┐│ │┌──────────[0 ]─────────┐│
148+ │┌─────────[0]──────────┐│ │┌─────────[1 ]──────────┐│ │┌──────────[2 ]─────────┐│
139149 ││ DataSourceExec ││ ││ DataSourceExec ││ ││ DataSourceExec ││
140150 │└──────────────────────┘│ │└──────────────────────┘│ │└──────────────────────┘│
141151 └────────────────────────┘ └────────────────────────┘ └────────────────────────┘
@@ -149,7 +159,7 @@ This means that:
149159
1501601 . The head stage is executed normally as if the query was not distributed.
1511612 . Upon calling ` .execute() ` on the ` ArrowFlightReadExec ` , instead of recursively calling ` .execute() ` on its children,
152- they will be serialized and sent over the wire to another node.
162+ the child subplan will be serialized and sent over the wire to another node.
1531633 . The next node, which is hosting an Arrow Flight Endpoint listening for gRPC requests over an HTTP server, will pick
154164 up the request containing the serialized chunk of the overall plan and execute it.
1551654 . This is repeated for each stage, and data will start flowing from bottom to top until it reaches the head stage.
@@ -167,7 +177,7 @@ There are some runnable examples showcasing how to provide a localhost implement
167177The integration tests also provide an idea about how to use the library and what can be achieved with it:
168178
169179- [ tpch_validation_test.rs] ( tests/tpch_validation_test.rs ) : executes all TPCH queries and performs assertions over the
170- distributed plans.
180+ distributed plans and the results vs running the queries in single node mode with a small scale factor .
171181- [ custom_config_extension.rs] ( tests/custom_config_extension.rs ) : showcases how to propagate custom DataFusion config
172182 extensions.
173183- [ custom_extension_codec.rs] ( tests/custom_extension_codec.rs ) : showcases how to propagate custom physical extension
0 commit comments