feat: added disk spilling for merge#4219
Conversation
|
ACTION NEEDED delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
4b997d3 to
e9910cd
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4219 +/- ##
==========================================
- Coverage 76.76% 76.73% -0.04%
==========================================
Files 166 166
Lines 48266 48255 -11
Branches 48266 48255 -11
==========================================
- Hits 37053 37030 -23
- Misses 9354 9367 +13
+ Partials 1859 1858 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I believe the failed test are due to arro3-core==0.7.0 being released. If I downgrad to 0.6.5 locally they run succesfully. |
|
Is there a reason you do not use the existing methods of setting these builder config values? https://github.com/delta-io/delta-rs/blob/main/crates/core/src/delta_datafusion/session.rs#L289 for reference. |
|
I looked a bit further, it looks like this basically did what optimize does, which is fine, but would you be able to remove the optimize and the new path you added and unify them all around the single builder creation path that already exists? |
|
Maybe we should expose the datafusion session configuration instead into Python and then allow to pass that session into any operation. |
|
@ion-elgreco my gut feeling is that would probably hurt discovery / add complexity of settings a bit for users using delta via other libs such as polars. Are there other examples of settings on the session that you think would be beneficial from the Python side? |
It would simplify our operations api in Python way more since there is one SessionConfig |
ion-elgreco
left a comment
There was a problem hiding this comment.
Please fix the tests and then we can merge
|
@ion-elgreco done. Changes to compare with arro3 types. |
Head branch was pushed to by a user without write access
9411bf2 to
f849bcb
Compare
|
These Python test failures I am seeing across multiple pull requests right now, so I'm going to self-assign this, figure it out, and then merge once I have a fix 😄 |
|
arro3 0.8.0 is released which rolls back some changes such that the comparison should work again between pyarrow and arro3: kylebarron/arro3#483 I will change back the tests EDIT: It looks like the release of 0.8.0 failed https://github.com/kylebarron/arro3/actions/runs/22292112536/job/64481282723 |
aa87b41 to
c9edc75
Compare
|
@ion-elgreco I removed my changes to the tests related to arro3 since 0.8.0 was released. Can you give a review again? :-) |
c9edc75 to
2c8fab2
Compare
feat: Added disk spilling for merge Added disk spilling for merge similar to optimize functions and united in one helper function Signed-off-by: Thomas Frederik Hoeck <tfh@norden.com>
Head branch was pushed to by a user without write access
2c8fab2 to
7e25948
Compare
feat: Added disk spilling for merge Added disk spilling for merge similar to optimize functions and united in one helper function Signed-off-by: Thomas Frederik Hoeck <tfh@norden.com>
7e25948 to
bca4d1d
Compare
…erikhoeck/delta-rs into feat_spill_disk_merge
Added disk spilling for
mergesimilar to optimize functions to allow for merges which touches many files in the targetDescription
I have added functionality for spilling to disk similar to how it works in the optimize functions. If nothing is provided it works as before.
I have added test similar to those for the other spill functions.
I have tested my cases in #4217 which now successfully completes the merge without OOM.
I have used AI (Opus 4.6) for getting a overview of the project structure and for writing most of the code. I have review and verified the code myself.
Work done:
create_session_state_with_spill_config(which is just a move and rename ofcreate_session_state_for_optimize)create_session_state_with_spill_configin existing optimize functionscreate_session_state_with_spill_configformergeRelated Issue(s)
Closes #4217
Documentation