-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Avoid skew in Roundrobin repartition #18880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
| assert_eq!(partition_row_counts.len(), 3); | ||
| assert_eq!(partition_row_counts[0], 2); | ||
| assert_eq!(partition_row_counts[0], 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shows the improvement - all output partitions get at least 1 batch, instead of skewing them to the left side.
026513e to
7923e9c
Compare
|
Couldn't show perf difference on my 10 core machine. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me -- thank you @Dandandan
I have scheduled a benchmark run just to make sure there isn't something we missed but I don't think so
|
🤖 |
|
🤖: Benchmark completed Details
|
Looks like mostly noise. |
|
Sorry I'm a bit late with my comment. I've stumbled upon this change updating ballista. Its a bit confusing change from usage perspective, there are two parameters which are used only in round robin case, would it make sense splitting |
|
makes sense to me |
|
created #19664 |
…ructors (#19668) ### Which issue does this PR close? Closes #19664 --- ### Rationale for this change After #18880, `BatchPartitioner::try_new` gained additional parameters that are only relevant for round-robin repartitioning. This made the constructor API confusing, as hash repartitioning received parameters it does not use. Splitting the constructor improves clarity and avoids passing round-robin–specific parameters to hash partitioning. --- ### What changes are included in this PR? - Introduce `BatchPartitioner::try_new_hash` - Introduce `BatchPartitioner::try_new_round_robin` - Refactor callers to use the specialized constructors - Retain `BatchPartitioner::try_new` as a delegator for backward compatibility This is a pure refactor; behavior is unchanged. --- ### Are these changes tested? Yes. Existing tests cover this code path. All builds pass locally. --- ### Are there any user-facing changes? No. This change is internal only and does not affect user-facing behavior or APIs. --------- Co-authored-by: Your Name <[email protected]>
…ructors (apache#19668) ### Which issue does this PR close? Closes apache#19664 --- ### Rationale for this change After apache#18880, `BatchPartitioner::try_new` gained additional parameters that are only relevant for round-robin repartitioning. This made the constructor API confusing, as hash repartitioning received parameters it does not use. Splitting the constructor improves clarity and avoids passing round-robin–specific parameters to hash partitioning. --- ### What changes are included in this PR? - Introduce `BatchPartitioner::try_new_hash` - Introduce `BatchPartitioner::try_new_round_robin` - Refactor callers to use the specialized constructors - Retain `BatchPartitioner::try_new` as a delegator for backward compatibility This is a pure refactor; behavior is unchanged. --- ### Are these changes tested? Yes. Existing tests cover this code path. All builds pass locally. --- ### Are there any user-facing changes? No. This change is internal only and does not affect user-facing behavior or APIs. --------- Co-authored-by: Your Name <[email protected]>
Which issue does this PR close?
Rationale for this change
This makes roundrobin repartition more fairly distributed.
The benchmarks probably don't reflect this as much (maybe on very high core counts?), as the partitioning already mostly happens at the source side.
What changes are included in this PR?
Set start partition based on input partition.
Are these changes tested?
Existing tests
Are there any user-facing changes?