β‘ Bolt: Optimize Hash Join with Single-Batch Fast Path #222
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
π‘ What:
This change introduces a "fast path" for the hash join build process. If the build-side input consists of only a single
RecordBatch, it bypasses theconcat_batchesfunction, which unnecessarily allocates new buffers and copies data.π― Why:
The
concat_batchesoperation introduces overhead, especially when it's not needed. For many common join scenarios (e.g., joins with small dimension tables that fit in one batch), this change avoids the associated memory allocations and CPU cycles, leading to better performance.π Impact:
Improves hash join performance by reducing memory allocation and CPU usage for single-batch build-side inputs.
π¬ Measurement:
This is a well-understood optimization pattern. The performance improvement can be verified by running join benchmarks where the build side is a single batch and observing the reduction in execution time and memory usage.
PR created automatically by Jules for task 6673524610506287030 started by @Dandandan