⚡ Bolt: Optimize hash join build-side for single-batch inputs #240
💡 What: This change introduces a fast path in the `HashJoinExec` operator for cases where the build-side input consists of a single `RecordBatch`.

🎯 Why: Previously, the build-side logic unconditionally called `concat_batches`, even when there was only one batch. This incurred unnecessary overhead from memory allocation and data copying. By bypassing this operation in the common single-batch case, we can make the join more efficient.

📊 Impact: This optimization improves hash join performance by reducing memory allocation and CPU usage, especially in scenarios involving small dimension tables or when a `CoalesceBatchesExec` has been applied upstream.

🔬 Measurement: The improvement can be measured by running benchmarks (such as TPC-H) in which joins have a single-batch build side, and observing the reduction in execution time and peak memory usage.
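The fast path described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the actual DataFusion change: `Batch`, `concat`, and `collect_build_side` are hypothetical stand-ins for Arrow's `RecordBatch`, `concat_batches`, and the build-side collection in `HashJoinExec`. It assumes, as in Arrow, that column data is reference-counted, so handing back the single batch moves an `Arc` rather than copying rows.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an Arrow RecordBatch: the column buffer is
// reference-counted, so passing a batch along does not copy its rows.
#[derive(Clone, Debug, PartialEq)]
struct Batch {
    column: Arc<Vec<i64>>,
}

// Hypothetical stand-in for concat_batches: always allocates a new buffer
// and copies every row from every input batch into it.
fn concat(batches: &[Batch]) -> Batch {
    let total: usize = batches.iter().map(|b| b.column.len()).sum();
    let mut out = Vec::with_capacity(total);
    for b in batches {
        out.extend_from_slice(&b.column);
    }
    Batch { column: Arc::new(out) }
}

// Hypothetical build-side collection with a single-batch fast path.
fn collect_build_side(mut batches: Vec<Batch>) -> Batch {
    if batches.len() == 1 {
        // Fast path: return the only batch unchanged; no allocation, no copy.
        return batches.pop().unwrap();
    }
    // General path: zero or several batches are merged into one new buffer.
    concat(&batches)
}
```

With a single input batch, the result shares the original buffer (`Arc::ptr_eq` holds); with several batches, a new buffer is allocated and filled. An empty input falls through to `concat`, which here returns an empty batch.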
PR created automatically by Jules for task 18365902574244747359 started by @Dandandan