Skip to content

Commit d7026d3

Browse files
committed
Improve README.md
1 parent c1b6d3c commit d7026d3

File tree

1 file changed

+5
-5
lines changed

1 file changed

+5
-5
lines changed

README.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ capabilities with minimal changes.
2626

2727
Before diving into the architecture, it's important to clarify some terms and what they mean:
2828

29-
- `worker`: a physical machine listening to serialized plans over an Arrow Flight interface.
29+
- `worker`: a physical machine listening to serialized execution plans over an Arrow Flight interface.
3030
- `network boundary`: a node in the plan that streams data from a network interface rather than directly from its
3131
children. Implemented as an `ArrowFlightReadExec` physical DataFusion node.
3232
- `stage`: a portion of the plan separated by a network boundary from other parts of the plan. Implemented as any
@@ -37,8 +37,8 @@ Before diving into the architecture, it's important to clarify some terms and wh
3737
A distributed DataFusion query is executed in a very similar fashion as a normal DataFusion query with one key
3838
difference:
3939

40-
The physical plan is divided into stages that can be executed on different workers and exchange data using Arrow
41-
Flight. All of this is done at the physical plan level, and is implemented as a `PhysicalOptimizerRule` that:
40+
The physical plan is divided into stages, each stage is assigned tasks that run in parallel in different workers. All of
41+
this is done at the physical plan level, and is implemented as a `PhysicalOptimizerRule` that:
4242

4343
1. Inspects the non-distributed plan, placing network boundaries (`ArrowFlightReadExec` nodes) in the appropriate places
4444
2. Based on the placed network boundaries, divides the plan into stages and assigns tasks to them.
@@ -158,8 +158,8 @@ will issue 3 concurrent Arrow Flight requests to the appropriate physical nodes.
158158
This means that:
159159

160160
1. The head stage is executed normally as if the query was not distributed.
161-
2. Upon calling `.execute()` on the `ArrowFlightReadExec`, instead of recursively calling `.execute()` on its children,
162-
the child subplan will be serialized and sent over the wire to another node.
161+
2. Upon calling `.execute()` on `ArrowFlightReadExec`, instead of propagating the `.execute()` call on its child,
162+
the subplan is serialized and sent over the wire to be executed on another worker.
163163
3. The next node, which is hosting an Arrow Flight Endpoint listening for gRPC requests over an HTTP server, will pick
164164
up the request containing the serialized chunk of the overall plan and execute it.
165165
4. This is repeated for each stage, and data will start flowing from bottom to top until it reaches the head stage.

0 commit comments

Comments
 (0)