Avoid repartition in build side #19812
Conversation
|
Run benchmarks |
|
@Dandandan any breadcrumbs on the idea here? |
|
I updated the description with some early benchmark results: on average quite a bit faster, with a few slowdowns (the largest being QQuery 12 at 1.90x slower). I still need to find the source of the slowdowns. |
The slowdown was caused by missing parallelism; I updated the results with a parallel build. (Need to fix null handling next.) |
|
Updated now with correct output. The results seem a bit more mixed now (but still positive on average). I'm going to see if we can reduce the slowdowns again. |
- RepartitionExec: partitioning=Hash([a@0, b@1], 12), input_partitions=1
- DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=true
- RepartitionExec: partitioning=Hash([a@0, b@1], 12), input_partitions=1
- DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, e], file_type=test, pushdown_supported=true, predicate=DynamicFilter [ CASE hash_repartition % 12 WHEN 2 THEN a@0 >= ab AND a@0 <= ab AND b@1 >= bb AND b@1 <= bb AND struct(a@0, b@1) IN (SET) ([{c0:ab,c1:bb}]) WHEN 4 THEN a@0 >= aa AND a@0 <= aa AND b@1 >= ba AND b@1 <= ba AND struct(a@0, b@1) IN (SET) ([{c0:aa,c1:ba}]) ELSE false END ]
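For readers unfamiliar with the `DynamicFilter` predicate in the plan above, here is a rough Python sketch of how such a per-partition filter could be evaluated. It is an illustration only, not DataFusion's implementation: `partition_of`, `build_filters`, and `keep_probe_row` are hypothetical names, and `zlib.crc32` is an assumed stand-in for the engine's hash function, so the partition numbers it yields will not match the `WHEN 2` / `WHEN 4` branches shown.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 12  # matches Hash([a@0, b@1], 12) in the plan


def partition_of(a, b, n=NUM_PARTITIONS):
    # Hypothetical stand-in for `hash_repartition % 12`; the real engine
    # uses its own hash function, so partition numbers will differ.
    return zlib.crc32(f"{a}\x00{b}".encode()) % n


def build_filters(build_rows, n=NUM_PARTITIONS):
    # Group build-side keys by partition, then record per-partition
    # min/max bounds for each column plus the exact key set.
    keys_by_partition = defaultdict(set)
    for a, b in build_rows:
        keys_by_partition[partition_of(a, b, n)].add((a, b))

    filters = {}
    for part, keys in keys_by_partition.items():
        a_vals = [a for a, _ in keys]
        b_vals = [b for _, b in keys]
        filters[part] = (min(a_vals), max(a_vals),
                         min(b_vals), max(b_vals), frozenset(keys))
    return filters


def keep_probe_row(filters, a, b, n=NUM_PARTITIONS):
    # Select the branch for this row's partition (the CASE ... WHEN part).
    f = filters.get(partition_of(a, b, n))
    if f is None:
        return False  # no build rows in this partition: ELSE false
    a_min, a_max, b_min, b_max, keys = f
    # Bounds checks followed by set membership, as in each WHEN branch.
    return (a_min <= a <= a_max and b_min <= b <= b_max
            and (a, b) in keys)


filters = build_filters([("aa", "ba"), ("ab", "bb")])
print(keep_probe_row(filters, "aa", "ba"))
print(keep_probe_row(filters, "zz", "zz"))
```

The point of the sketch is that a probe-side row is only compared against the bounds and key set of the partition it hashes to, and any partition that received no build-side rows rejects every row.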
I guess it's not possible to do filtering during the scan anymore with this approach?
I think it should be possible, but somehow it got removed during the AI-based generation ;)
|
Run benchmark tpcds |
|
🤖 |
|
🤖: Benchmark completed
|
|
Mmmm.... |
Which issue does this PR close?
Partitionedhash join #19789
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?