Nested Loop Joins (fixes TPCH query 22) #104
Merged
Currently, our strategy for further dividing stages into smaller units of work involves splitting the partitions of a stage across tasks, each responsible for a portion of the partitions (facilitated by including a `PartitionIsolator`). This works when we separate stages at a `RepartitionExec` boundary, but fails when a node of the plan needs to materialize all data across all partitions. `NestedLoopJoinExec` is one such node.

This PR adds a `can_be_divided()` function that returns a boolean indicating whether a plan can be divided further. This is used during planning to decide whether a stage can be separated into tasks.
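As a rough, hypothetical sketch of how such a check could look (the name `can_be_divided` comes from this PR, but the signature, the recursive walk, and the imports below are assumptions, not the actual implementation):

```rust
use std::sync::Arc;

use datafusion::physical_plan::joins::NestedLoopJoinExec;
use datafusion::physical_plan::ExecutionPlan;

/// Hypothetical sketch: a stage can be divided into per-partition tasks
/// only if no node in its plan needs to materialize data across all
/// partitions. `NestedLoopJoinExec` is one such node.
fn can_be_divided(plan: &Arc<dyn ExecutionPlan>) -> bool {
    if plan.as_any().downcast_ref::<NestedLoopJoinExec>().is_some() {
        return false;
    }
    plan.children().iter().all(|child| can_be_divided(child))
}
```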
For example, for TPCH Query 22, with `partitions=3` and `partitions_per_task=2`, we generate a physical plan:
and a distributed plan that looks like:

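For a sense of the task split these settings imply: with `partitions=3` and `partitions_per_task=2`, a divisible stage ends up as two tasks, one covering two partitions and one covering the last. The chunking below is an illustrative assumption, not necessarily how the `PartitionIsolator` assigns partitions:

```rust
/// Illustrative only: split partition indices 0..partitions into
/// consecutive chunks of at most `partitions_per_task` each.
fn split_into_tasks(partitions: usize, partitions_per_task: usize) -> Vec<Vec<usize>> {
    (0..partitions)
        .collect::<Vec<_>>()
        .chunks(partitions_per_task)
        .map(|chunk| chunk.to_vec())
        .collect()
}

// With partitions = 3 and partitions_per_task = 2 this yields
// [[0, 1], [2]]: two tasks, matching the two tasks of stage 3
// described below.
```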
Note that Stage 5 in TPCH 22 has a `NestedLoopJoin`, which wants to fully materialize one side of the query. Because stage 3 is split into two tasks, we attempt to do this twice, causing an error when we read the same partition twice in the child stages.

Specifying that `NestedLoopJoinExec` cannot be split produces a plan like:

This keeps the nested loop join in a single task and addresses the problem.
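Put together, the check effectively gates how many tasks a stage receives during planning. A hedged sketch of that decision, reusing the hypothetical `can_be_divided` from the sketch above (the helper name and the ceiling arithmetic are assumptions for illustration, not the PR's code):

```rust
use std::sync::Arc;

use datafusion::physical_plan::ExecutionPlan;

/// Hypothetical helper: decide how many tasks a stage should be split into.
fn task_count(
    plan: &Arc<dyn ExecutionPlan>,
    partitions: usize,
    partitions_per_task: usize,
) -> usize {
    if can_be_divided(plan) {
        // Divisible stage: e.g. partitions = 3, partitions_per_task = 2 -> 2 tasks.
        partitions.div_ceil(partitions_per_task)
    } else {
        // A stage containing e.g. NestedLoopJoinExec stays in a single task.
        1
    }
}
```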