-
Notifications
You must be signed in to change notification settings - Fork 109
Upgrade DF to 51 #5189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Upgrade DF to 51 #5189
Conversation
CodSpeed Performance ReportMerging #5189 will degrade performance by 47.48%Comparing Summary
Benchmarks breakdown
Footnotes
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: GitHub Archive (NVMe)Summary
Detailed Results Table
|
Benchmarks: GitHub Archive (S3)Summary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
e587525 to
07aee41
Compare
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
|
So @a10y is on vacation for the next few weeks, so just updating my current thinking here:
|
|
I'm also on PTO this week so won't be very reachable but I think most of the urgency comes from some changes @brancz is working on so I'll defer to him. Re the vendored code I don't think I mind it too much, especially since it'll be deleted soon. I think the question is whether it's worth the effort to push through this 51 release if 52 is coming soon and most everyone is on holiday starting next week. |
|
My personal bias is to merge the vendored code now since it’s an indication of an upstream bug for a feature we care about. Then delete it once 52 is around. Without it gharchive is broken and it’s a bit annoying to wait for upstream. |
|
No strong opinion one way or another, we're already at the point where we're carrying several unreleased patches on upstream for both arrow and datafusion. |
|
Jynxed it, I'm quite certain I'm seeing a very noticeable case for us where we're lacking projection pushdown for a field in a struct (which is one of the things this fixes). Would appreciate it if we could merge this. |
|
This is currently blocked by #5676, once I merge that (hopefully today), I'll rebase this PR and try and catch up on all the pieces here and get it merged. |
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
794bcee to
9b8aeb8
Compare
Signed-off-by: Adam Gutglick <[email protected]>
|
I've rebased this PR and fixed a couple of small random issues that came up, the codspeed regressions all point back to Arrow, but that might be an artifact of the runtime because the only relevant changes in Arrow have some pretty convincing benchmarks? I'm not worried about them. |
|
I can't reproduce this TPC-DS issue, if it keeps popping up I'll dig deeper into it. |
|
@asubiotto if you want to take look here, @a10y is OOO until the 7th IMO, not sure if we have anyone else that can weigh in. |
|
finally managed to get a repro of the TPC-DS issue, it has something to do with the new decimal types, I think we should fix that before we merge this, so I'll keep working on it. |
|
Fixing TPC-DS here might require a deeper fix in DF, for now I'll remove some of the DF/Decimal code paths |
Signed-off-by: Adam Gutglick <[email protected]>
| let scale = dt.scale(); | ||
|
|
||
| match precision { | ||
| // This code is commented out until DataFusion improves its support for smaller decimals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will stay for now, I've hit an overflow in TPC-DS q1, and I might take a bigger stab at how to handle it in DF.
Testing the upgrade to DF 51.
51.0.0(Nov 2025) apache/datafusion#17558 (comment)