feat: added disk spilling for merge #4219

Merged
ion-elgreco merged 3 commits into delta-io:main from thomasfrederikhoeck:feat_spill_disk_merge
Feb 24, 2026

Conversation

@thomasfrederikhoeck
Contributor

@thomasfrederikhoeck thomasfrederikhoeck commented Feb 20, 2026

Added disk spilling for merge, similar to the optimize functions, to allow merges that touch many files in the target.

Description

I have added functionality for spilling to disk, similar to how it works in the optimize functions. If no spill settings are provided, merge works as before.

I have added tests similar to those for the other spill functions.

I have tested my cases from #4217, which now complete the merge successfully without OOM.

I have used AI (Opus 4.6) to get an overview of the project structure and to write most of the code. I have reviewed and verified the code myself.

Work done:

  • create create_session_state_with_spill_config (which is just a move and rename of create_session_state_for_optimize)
  • use create_session_state_with_spill_config in existing optimize functions
  • use create_session_state_with_spill_config for merge
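
The shape of this refactor can be pictured with a small, library-free sketch. It is not the delta-rs implementation: `SpillConfig`, `SessionSettings`, `create_session_settings`, and the field names `max_spill_size` and `max_temp_directory_size` are assumptions made for illustration only. It shows one helper shared by optimize and merge that leaves the defaults untouched when no spill settings are given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpillConfig:
    """Hypothetical spill options; both fields are optional."""

    max_spill_size: Optional[int] = None           # bytes of memory before spilling
    max_temp_directory_size: Optional[int] = None  # bytes allowed on disk


@dataclass(frozen=True)
class SessionSettings:
    """Hypothetical stand-in for the resource settings of a query session."""

    memory_limit: Optional[int] = None  # None means "keep the default"
    disk_limit: Optional[int] = None    # None means "keep the default"


def create_session_settings(spill: Optional[SpillConfig] = None) -> SessionSettings:
    """Single helper that both optimize and merge would call in this sketch."""
    if spill is None:
        # Nothing provided: behave exactly as before.
        return SessionSettings()
    return SessionSettings(
        memory_limit=spill.max_spill_size,
        disk_limit=spill.max_temp_directory_size,
    )
```

Calling `create_session_settings()` with no argument returns the default settings, while `create_session_settings(SpillConfig(max_spill_size=1024))` sets only the memory limit and leaves the disk limit at its default.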

Related Issue(s)

Closes #4217

Documentation

@github-actions github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Feb 20, 2026
@github-actions

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
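
As a rough illustration of what the bot is asking for, the header shape described by Conventional Commits (`type`, optional `(scope)`, optional `!`, then `: ` and a description) can be checked with a regular expression. This is a minimal sketch, not the check delta-rs runs: the function name `is_conventional_title` is hypothetical, and the real automation may additionally restrict the allowed types.

```python
import re

# type, optional (scope), optional "!" for breaking changes, ": ", description
CONVENTIONAL_HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title has a Conventional Commits style header."""
    return CONVENTIONAL_HEADER.match(title) is not None
```

With this sketch, the original title "Added disk spilling for merge" is rejected because it has no `type: ` prefix, while "feat: added disk spilling for merge" is accepted.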

@thomasfrederikhoeck thomasfrederikhoeck changed the title from "Added disk spilling for merge" to "feat: Added disk spilling for merge" Feb 20, 2026
@thomasfrederikhoeck thomasfrederikhoeck force-pushed the feat_spill_disk_merge branch 2 times, most recently from 4b997d3 to e9910cd on February 20, 2026 11:16
@thomasfrederikhoeck thomasfrederikhoeck changed the title from "feat: Added disk spilling for merge" to "feat: added disk spilling for merge" Feb 20, 2026
@codecov

codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 26.66667% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.73%. Comparing base (1724f89) to head (1c3ac23).
⚠️ Report is 1 commits behind head on main.

Files with missing lines                       Patch %   Lines
crates/core/src/delta_datafusion/session.rs    43.75%    9 Missing ⚠️
python/src/merge.rs                            0.00%     7 Missing ⚠️
python/src/lib.rs                              0.00%     6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4219      +/-   ##
==========================================
- Coverage   76.76%   76.73%   -0.04%     
==========================================
  Files         166      166              
  Lines       48266    48255      -11     
  Branches    48266    48255      -11     
==========================================
- Hits        37053    37030      -23     
- Misses       9354     9367      +13     
+ Partials     1859     1858       -1     

@thomasfrederikhoeck
Contributor Author

I believe the failed tests are due to arro3-core==0.7.0 being released. If I downgrade to 0.6.5 locally, they run successfully.

@hntd187
Collaborator

hntd187 commented Feb 20, 2026

Is there a reason you do not use the existing methods of setting these builder config values? https://github.com/delta-io/delta-rs/blob/main/crates/core/src/delta_datafusion/session.rs#L289 for reference.

@hntd187
Collaborator

hntd187 commented Feb 20, 2026

I looked a bit further. It looks like this basically does what optimize does, which is fine, but would you be able to remove both the optimize path and the new path you added, and unify them around the single builder creation path that already exists?

@ion-elgreco
Collaborator

Maybe we should instead expose the DataFusion session configuration to Python and then allow passing that session into any operation.

@thomasfrederikhoeck
Contributor Author

@ion-elgreco my gut feeling is that this would probably hurt discoverability and add some complexity to the settings for users who use delta via other libs such as polars. Are there other examples of settings on the session that you think would be beneficial from the Python side?

@ion-elgreco
Collaborator

> @ion-elgreco my gut feeling is that would probably hurt discovery / add complexity of settings a bit for users using delta via other libs such as polars. Are there other examples of settings on the session that you think would be beneficial from the Python side?

It would simplify our operations API in Python considerably, since there would be a single SessionConfig.

ion-elgreco previously approved these changes Feb 21, 2026
Collaborator

@ion-elgreco ion-elgreco left a comment

Please fix the tests and then we can merge

@thomasfrederikhoeck
Contributor Author

@ion-elgreco done. I changed the tests to compare with arro3 types.

ion-elgreco previously approved these changes Feb 22, 2026
@ion-elgreco ion-elgreco enabled auto-merge (squash) February 22, 2026 14:17
auto-merge was automatically disabled February 22, 2026 17:28

Head branch was pushed to by a user without write access

@rtyler rtyler self-assigned this Feb 22, 2026
@rtyler
Member

rtyler commented Feb 22, 2026

I am seeing these Python test failures across multiple pull requests right now, so I'm going to self-assign this, figure it out, and then merge once I have a fix 😄

@thomasfrederikhoeck
Contributor Author

thomasfrederikhoeck commented Feb 23, 2026

arro3 0.8.0 has been released. It rolls back some changes, so the comparison between pyarrow and arro3 should work again: kylebarron/arro3#483

https://github.com/kylebarron/arro3/pull/483/changes#diff-aac088c485a53a5add25496062171f00197641b1f6a95445110eed12de5259f4

I will change the tests back.

EDIT: It looks like the release of 0.8.0 failed https://github.com/kylebarron/arro3/actions/runs/22292112536/job/64481282723

@thomasfrederikhoeck thomasfrederikhoeck force-pushed the feat_spill_disk_merge branch 2 times, most recently from aa87b41 to c9edc75 on February 23, 2026 20:32
@thomasfrederikhoeck
Contributor Author

@ion-elgreco I removed my changes to the tests related to arro3 since 0.8.0 was released. Can you give a review again? :-)

@ion-elgreco ion-elgreco force-pushed the feat_spill_disk_merge branch from c9edc75 to 2c8fab2 on February 24, 2026 08:54
ion-elgreco previously approved these changes Feb 24, 2026
@ion-elgreco ion-elgreco enabled auto-merge (squash) February 24, 2026 08:55
feat: Added disk spilling for merge

Added disk spilling for merge similar to optimize functions and united in one helper function

Signed-off-by: Thomas Frederik Hoeck <tfh@norden.com>
auto-merge was automatically disabled February 24, 2026 09:30

Head branch was pushed to by a user without write access

feat: Added disk spilling for merge

Added disk spilling for merge similar to optimize functions and united in one helper function

Signed-off-by: Thomas Frederik Hoeck <tfh@norden.com>
@ion-elgreco ion-elgreco force-pushed the feat_spill_disk_merge branch from 7e25948 to bca4d1d on February 24, 2026 09:42
@ion-elgreco ion-elgreco enabled auto-merge (squash) February 24, 2026 10:48
@ion-elgreco ion-elgreco merged commit e7b7885 into delta-io:main Feb 24, 2026
29 of 30 checks passed

Labels

binding/python: Issues for the Python package
binding/rust: Issues for the Rust crate

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Feature]: Write incrementally to table when merging

4 participants