Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 56 additions & 2 deletions src/stage/stage.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,64 @@ use crate::task::ExecutionTask;
use crate::ChannelManager;

/// A unit of isolation for a portion of a physical execution plan
/// that can be executed independently.
/// that can be executed independently and across a network boundary.
/// It implements [`ExecutionPlan`] and can be executed to produce a
/// stream of record batches.
///
/// see https://howqueryengineswork.com/13-distributed-query.html
/// An ExecutionTask is a finer grained unit of work compared to an ExecutionStage.
/// One ExecutionStage will create one or more ExecutionTasks
///
/// When an [`ExecutionStage`] is execute()'d if will execute its plan and return a stream
/// of record batches.
///
/// If the stage has input stages, then it those input stages will be executed on remote resources
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If the stage has input stages, then it those input stages will be executed on remote resources
/// If the stage has input stages, those input stages will be executed on remote resources

/// and will be provided the remainder of the stage tree.
///
/// For example if our stage tree looks like this:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence does not sound finish

///
/// ```text
/// ┌─────────┐
/// │ stage 1 │
/// └───┬─────┘
/// │
/// ┌──────┴────────┐
/// ┌────┴────┐ ┌────┴────┐
/// │ stage 2 │ │ stage 3 │
/// └────┬────┘ └─────────┘
/// │
/// ┌──────┴────────┐
/// ┌────┴────┐ ┌────┴────┐
/// │ stage 4 │ │ Stage 5 │
/// └─────────┘ └─────────┘
///
/// ```
///
/// Then executing Stage 1 will run its plan locally. Stage 1 has two inputs, Stage 2 and Stage 3. We
/// know these will execute on remote resources. As such the plan for Stage 1 must contain an
/// [`ArrowFlightReadExec`] node that will read the results of Stage 2 and Stage 3 and coalese the
/// results.
///
/// When Stage 1's [`ArrowFlightReadExec`] node is executed, it makes an ArrowFlightRequest to the
/// host assigned in the Stage. It provides the following Stage tree serialilzed in the body of the
/// Arrow Flight Ticket:
///
/// ```text
/// ┌─────────┐
/// │ Stage 2 │
/// └────┬────┘
/// │
/// ┌──────┴────────┐
/// ┌────┴────┐ ┌────┴────┐
/// │ Stage 4 │ │ Stage 5 │
/// └─────────┘ └─────────┘
///
/// ```
///
/// The receiving ArrowFlightEndpoint will then execute Stage 2 and will repeat this process.
///
/// When Stage 4 4 is executed, it has no input tasks, so it is assumed that the plan included in that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// When Stage 4 4 is executed, it has no input tasks, so it is assumed that the plan included in that
/// When Stage 4 is executed, it has no input tasks, so it is assumed that the plan included in that

/// Stage can complete on its own; its likely holding a leaf node in the overall phyysical plan and
/// producing data from a [`DataSourceExec`].
#[derive(Debug, Clone)]
pub struct ExecutionStage {
/// Our stage number
Expand Down