Refactor distributed planner into its own folder #196
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| use datafusion::common::DataFusionError; | ||
| use std::error::Error; | ||
| use std::fmt::{Display, Formatter}; | ||
|
|
||
| /// Error thrown during distributed planning that prompts the planner to change something and | ||
| /// try again. | ||
| #[derive(Debug)] | ||
| pub enum DistributedPlanError { | ||
| /// Prompts the planner to limit the number of tasks used in the stage that is currently | ||
| /// being planned. | ||
| LimitTasks(usize), | ||
| /// Signals to the planner that this whole plan is non-distributable. This can happen if | ||
| /// certain nodes are present, like `StreamingTableExec`, which are typically used in | ||
| /// queries that perform introspection rather than execution. | ||
| NonDistributable(&'static str), | ||
| } | ||
|
|
||
| impl Display for DistributedPlanError { | ||
| fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { | ||
| match self { | ||
| DistributedPlanError::LimitTasks(n) => write!(f, "LimitTasksErr: {n}"), | ||
| DistributedPlanError::NonDistributable(name) => write!(f, "NonDistributable: {name}"), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl Error for DistributedPlanError {} | ||
|
|
||
| /// Builds a [DistributedPlanError::LimitTasks] error. This error prompts the distributed planner | ||
| /// to try rebuilding the current stage with a limited number of tasks. | ||
| pub fn limit_tasks_err(limit: usize) -> DataFusionError { | ||
| DataFusionError::External(Box::new(DistributedPlanError::LimitTasks(limit))) | ||
| } | ||
|
|
||
| /// Builds a [DistributedPlanError::NonDistributable] error. This error prompts the distributed | ||
| /// planner to not distribute the query at all. | ||
| pub fn non_distributable_err(name: &'static str) -> DataFusionError { | ||
| DataFusionError::External(Box::new(DistributedPlanError::NonDistributable(name))) | ||
| } | ||
|
|
||
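| /// Returns the [DistributedPlanError] wrapped in `err`, if `err` is a | ||
| /// [DataFusionError::External] that carries one. | ||
| pub(crate) fn get_distribute_plan_err(err: &DataFusionError) -> Option<&DistributedPlanError> { | ||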
| let DataFusionError::External(err) = err else { | ||
| return None; | ||
| }; | ||
| err.downcast_ref() | ||
| } |
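The wrap-then-downcast round trip in this file can be tried without DataFusion. The following is a minimal sketch, not the planner's actual retry logic: `PlannerError` is a hypothetical stand-in for `DataFusionError` (only an `External` variant holding a boxed error is modelled), and `build_stage` and `plan_with_retry` are made-up helpers that illustrate how a caller might react to `LimitTasks`. The `PartialEq` derive is added only so the values can be compared.

```rust
use std::error::Error;
use std::fmt::{Display, Formatter};

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum DistributedPlanError {
    LimitTasks(usize),
    NonDistributable(&'static str),
}

impl Display for DistributedPlanError {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::LimitTasks(n) => write!(f, "LimitTasksErr: {n}"),
            Self::NonDistributable(name) => write!(f, "NonDistributable: {name}"),
        }
    }
}

impl Error for DistributedPlanError {}

// Hypothetical stand-in for `DataFusionError` (assumption: only these two variants).
#[allow(dead_code)]
#[derive(Debug)]
enum PlannerError {
    External(Box<dyn Error + Send + Sync>),
    Plan(String),
}

fn limit_tasks_err(limit: usize) -> PlannerError {
    PlannerError::External(Box::new(DistributedPlanError::LimitTasks(limit)))
}

// Same shape as the function in the diff: unwrap `External`, then downcast.
fn get_distribute_plan_err(err: &PlannerError) -> Option<&DistributedPlanError> {
    let PlannerError::External(err) = err else {
        return None;
    };
    err.downcast_ref()
}

// Hypothetical stage builder that refuses more than `max` tasks.
fn build_stage(tasks: usize, max: usize) -> Result<usize, PlannerError> {
    if tasks > max {
        Err(limit_tasks_err(max))
    } else {
        Ok(tasks)
    }
}

// Hypothetical retry: on `LimitTasks(n)`, rebuild the stage once with `n` tasks.
fn plan_with_retry(requested: usize, max: usize) -> Result<usize, PlannerError> {
    let err = match build_stage(requested, max) {
        Ok(tasks) => return Ok(tasks),
        Err(err) => err,
    };
    if let Some(DistributedPlanError::LimitTasks(limit)) = get_distribute_plan_err(&err) {
        return build_stage(*limit, max);
    }
    // Any other error is passed through unchanged.
    Err(err)
}
```

Under these assumptions, `plan_with_retry(8, 3)` first fails with `LimitTasks(3)` and then succeeds with 3 tasks, while errors that are not `External` pass through untouched.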
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| mod distributed_physical_optimizer_rule; | ||
| mod distributed_plan_error; | ||
| mod network_boundary; | ||
|
|
||
| pub use distributed_physical_optimizer_rule::DistributedPhysicalOptimizerRule; | ||
| pub use distributed_plan_error::{DistributedPlanError, limit_tasks_err, non_distributable_err}; | ||
| pub use network_boundary::{InputStageInfo, NetworkBoundary, NetworkBoundaryExt}; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| use crate::{NetworkCoalesceExec, NetworkShuffleExec, Stage}; | ||
| use datafusion::common::plan_err; | ||
| use datafusion::physical_plan::ExecutionPlan; | ||
| use std::sync::Arc; | ||
|
|
||
| /// Necessary information for building a [Stage] during distributed planning. | ||
| /// | ||
| /// [NetworkBoundary]s return this piece of data so that the distributed planner knows how to | ||
| /// build the next [Stage] from which the [NetworkBoundary] is going to receive data. | ||
| /// | ||
| /// Some network boundaries might modify their children, for example by scaling | ||
| /// up the number of partitions, or by injecting a specific [ExecutionPlan] on top. | ||
| pub struct InputStageInfo { | ||
| /// The head plan of the [Stage] that is about to be built. | ||
| pub plan: Arc<dyn ExecutionPlan>, | ||
| /// The number of tasks the [Stage] will have. | ||
| pub task_count: usize, | ||
| } | ||
|
|
||
| /// This trait represents a node that introduces the necessity of a network boundary in the plan. | ||
| /// The distributed planner, upon stepping into one of these, will break the plan and build a stage | ||
| /// out of it. | ||
| pub trait NetworkBoundary: ExecutionPlan { | ||
| /// Returns the information necessary for building the next stage from which this | ||
| /// [NetworkBoundary] is going to collect data. | ||
| fn get_input_stage_info(&self, task_count: usize) | ||
| -> datafusion::common::Result<InputStageInfo>; | ||
|
|
||
| /// Re-assigns a different number of input tasks to the current [NetworkBoundary]. | ||
| /// | ||
| /// This will be called if, upon building a stage, a | ||
| /// [crate::distributed_planner::DistributedPlanError::LimitTasks] error is returned, | ||
| /// prompting the [NetworkBoundary] to choose a different number of input tasks. | ||
| fn with_input_task_count( | ||
| &self, | ||
| input_tasks: usize, | ||
| ) -> datafusion::common::Result<Arc<dyn NetworkBoundary>>; | ||
|
|
||
| /// Called when a [Stage] is correctly formed. The [NetworkBoundary] can use this | ||
| /// information to perform any internal transformations necessary for distributed execution. | ||
| /// | ||
| /// Typically, [NetworkBoundary]s will use this call for transitioning from "pending" to "ready". | ||
| fn with_input_stage( | ||
| &self, | ||
| input_stage: Stage, | ||
| ) -> datafusion::common::Result<Arc<dyn ExecutionPlan>>; | ||
|
|
||
| /// Returns the assigned input [Stage], if any. | ||
| fn input_stage(&self) -> Option<&Stage>; | ||
|
|
||
| /// The planner might decide to remove this [NetworkBoundary] from the plan if it decides that | ||
| /// it's not going to bring any benefit. The [NetworkBoundary] will be replaced with whatever | ||
| /// this function returns. | ||
| fn rollback(&self) -> datafusion::common::Result<Arc<dyn ExecutionPlan>> { | ||
| let children = self.children(); | ||
| if children.len() != 1 { | ||
| return plan_err!( | ||
| "Expected distributed node {} to have exactly 1 child, but got {}", | ||
| self.name(), | ||
| children.len() | ||
| ); | ||
| } | ||
| Ok(Arc::clone(children.first().unwrap())) | ||
| } | ||
| } | ||
|
|
||
| /// Extension trait for downcasting dynamic types to [NetworkBoundary]. | ||
| pub trait NetworkBoundaryExt { | ||
| /// Downcasts self to a [NetworkBoundary] if possible. | ||
| fn as_network_boundary(&self) -> Option<&dyn NetworkBoundary>; | ||
| /// Returns whether self is a [NetworkBoundary] or not. | ||
| fn is_network_boundary(&self) -> bool { | ||
| self.as_network_boundary().is_some() | ||
| } | ||
| } | ||
|
|
||
| impl NetworkBoundaryExt for dyn ExecutionPlan { | ||
| fn as_network_boundary(&self) -> Option<&dyn NetworkBoundary> { | ||
| if let Some(node) = self.as_any().downcast_ref::<NetworkShuffleExec>() { | ||
| Some(node) | ||
| } else if let Some(node) = self.as_any().downcast_ref::<NetworkCoalesceExec>() { | ||
| Some(node) | ||
| } else { | ||
| None | ||
| } | ||
| } | ||
| } | ||
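The `as_any` downcast chain in `NetworkBoundaryExt` and the default `rollback` can be exercised in isolation. The sketch below uses hypothetical, pared-down stand-ins, not the crate's real types: `Plan` plays the role of `ExecutionPlan`, `Boundary` the role of `NetworkBoundary` (keeping only `rollback`), and `Scan`, `Shuffle`, and `Coalesce` are toy nodes invented for illustration. Errors are plain `String`s here instead of DataFusion's result type.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for `ExecutionPlan`, reduced to what the sketch needs.
trait Plan {
    fn name(&self) -> &'static str;
    fn as_any(&self) -> &dyn Any;
    fn children(&self) -> Vec<&Arc<dyn Plan>>;
}

// Hypothetical stand-in for `NetworkBoundary`, keeping only the default `rollback`.
trait Boundary: Plan {
    // Replaces the boundary with its single child.
    fn rollback(&self) -> Result<Arc<dyn Plan>, String> {
        let children = self.children();
        if children.len() != 1 {
            return Err(format!(
                "Expected distributed node {} to have exactly 1 child, but got {}",
                self.name(),
                children.len()
            ));
        }
        Ok(Arc::clone(children[0]))
    }
}

// A leaf node that is not a boundary.
struct Scan;
impl Plan for Scan {
    fn name(&self) -> &'static str { "Scan" }
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<&Arc<dyn Plan>> { vec![] }
}

struct Shuffle { input: Arc<dyn Plan> }
impl Plan for Shuffle {
    fn name(&self) -> &'static str { "Shuffle" }
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<&Arc<dyn Plan>> { vec![&self.input] }
}
impl Boundary for Shuffle {}

struct Coalesce { input: Arc<dyn Plan> }
impl Plan for Coalesce {
    fn name(&self) -> &'static str { "Coalesce" }
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<&Arc<dyn Plan>> { vec![&self.input] }
}
impl Boundary for Coalesce {}

// Extension trait mirroring `NetworkBoundaryExt`.
trait BoundaryExt {
    fn as_boundary(&self) -> Option<&dyn Boundary>;
    fn is_boundary(&self) -> bool {
        self.as_boundary().is_some()
    }
}

impl BoundaryExt for dyn Plan {
    // Tries each known boundary type in turn; unknown types yield `None`.
    fn as_boundary(&self) -> Option<&dyn Boundary> {
        if let Some(node) = self.as_any().downcast_ref::<Shuffle>() {
            Some(node)
        } else if let Some(node) = self.as_any().downcast_ref::<Coalesce>() {
            Some(node)
        } else {
            None
        }
    }
}
```

With these stand-ins, an `Arc<dyn Plan>` wrapping a `Shuffle` reports `is_boundary() == true`, and calling `rollback()` on it hands back the `Scan` child. One consequence of this pattern is that the downcast chain lists concrete types explicitly, so any additional boundary type has to be added to it by hand.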
Curious if this can be made generic
🤔 what do you mean by generic here?
Ah right. Please ignore 🙇🏽