feat: log compaction by ion-elgreco · Pull Request #4210 · delta-io/delta-rs

ion-elgreco · 2026-02-17T18:11:40Z

codecov · 2026-02-17T18:16:17Z

Codecov Report

❌ Patch coverage is 5.95238% with 79 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.61%. Comparing base (e7b7885) to head (2e3aac9).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
python/src/lib.rs	0.00%	39 Missing ⚠️
crates/core/src/protocol/log_compaction.rs	0.00%	38 Missing ⚠️
crates/core/src/protocol/mod.rs	71.42%	0 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4210      +/-   ##
==========================================
- Coverage   76.75%   76.61%   -0.14%     
==========================================
  Files         166      167       +1     
  Lines       48277    48332      +55     
  Branches    48277    48332      +55     
==========================================
- Hits        37053    37030      -23     
- Misses       9365     9444      +79     
+ Partials     1859     1858       -1

ethan-tyler · 2026-02-18T13:04:34Z

I'm gonna review today

ethan-tyler

Love this - added a couple of comments for your consideration.

crates/core/src/protocol/log_compaction.rs

python/src/lib.rs

python/tests/test_log_compaction.py

ethan-tyler · 2026-02-18T16:37:07Z

python/deltalake/table.py

        """
        self._table.create_checkpoint()

+    def compact_logs(self, starting_version: int, ending_version: int) -> None:


Do we need docs for this new pub API?

ion-elgreco · 2026-02-19

We can add that at a later stage ;) for now it's a hidden gem

crates/core/src/protocol/log_compaction.rs

rtyler

Quite straightforward and simple. I do think @ethan-tyler has made some points worth addressing.

I don't really understand the point of these log compaction files compared to checkpoints, but that's not an @ion-elgreco question to resolve 😆 Seems like more overcomplication of the protocol. 🤷 🙈

If these do provide reader benefits, what if we fire-and-forget a task that tries to generate one of these files in the background after a write on version % 25 or something? If we succeed in writing a log compaction file, great, if we don't, who cares 😆
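A minimal sketch of the fire-and-forget idea above, not code from this PR. The helper name `maybe_compact_in_background`, the interval constant, and the choice of version range are all assumptions made for illustration; `compact` stands in for whatever callable performs the compaction.

```python
import threading
from typing import Callable, Optional

# Assumed cadence, taken from the "version % 25 or something" suggestion.
COMPACTION_INTERVAL = 25


def maybe_compact_in_background(
    version: int,
    compact: Callable[[int, int], None],
    interval: int = COMPACTION_INTERVAL,
) -> Optional[threading.Thread]:
    """Start a best-effort log compaction after a commit at `version`.

    Returns the started thread so callers (or tests) can join it, or None
    when `version` does not fall on the interval boundary.
    """
    if version <= 0 or version % interval != 0:
        return None

    # Hypothetical range choice: the last `interval` commits, inclusive.
    start, end = version - interval + 1, version

    def _run() -> None:
        try:
            compact(start, end)
        except Exception:
            # Fire-and-forget: a failed compaction must never affect the write.
            pass

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread
```

With the Python signature shown in this PR (`compact_logs(self, starting_version, ending_version)`), the `compact` argument could plausibly be a bound `compact_logs` method, though where such a hook would be wired in is left open here.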

ion-elgreco · 2026-02-19T09:34:51Z

> Quite straightforward and simple. I do think @ethan-tyler has made some points worth addressing.
>
> I don't really understand the point of these log compaction files compared to checkpoints, but that's not an @ion-elgreco question to resolve 😆 Seems like more overcomplication of the protocol. 🤷 🙈
>
> If these do provide reader benefits, what if we fire-and-forget a task that tries to generate one of these files in the background after a write on version % 25 or something? If we succeed in writing a log compaction file, great, if we don't, who cares 😆

I assume kernel log replay makes use of these if it encounters them. I can do this in a commit hook in a follow-up PR.

ion-elgreco · 2026-02-19T11:56:23Z

Exact matching of our JSON logs against this compaction log is not possible: the compaction log drops null columns, as suggested by kernel, but our own action-to-bytes serializer does not. A small optimization we could do in the future is to drop null columns there as well.

> NOTE: Null columns should not be written to the JSON file. For example, if a row has columns ["a", "b"] and the value of "b" is null, the JSON object should be written as { "a": "…" }. Note that including nulls is technically valid JSON, but would bloat the log, therefore we recommend omitting them.
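The null-dropping behaviour described in the note can be sketched as follows. This is an illustration only, not the serializer used by delta-rs or kernel: `drop_null_fields` and `action_to_json_line` are hypothetical names, and leaving `partitionValues` untouched rests on the assumption that a null value inside a map-typed field can be meaningful and must not be dropped.

```python
import json
from typing import Any, FrozenSet

# Assumption: map-typed fields where a null value may carry meaning.
MAP_FIELDS: FrozenSet[str] = frozenset({"partitionValues"})


def drop_null_fields(value: Any, keep_as_is: FrozenSet[str] = MAP_FIELDS) -> Any:
    """Recursively omit dict entries whose value is None.

    Entries named in `keep_as_is` are copied unchanged, nulls included.
    List elements are never removed, only cleaned recursively.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if item is None:
                continue  # omit the null column entirely
            cleaned[key] = item if key in keep_as_is else drop_null_fields(item, keep_as_is)
        return cleaned
    if isinstance(value, list):
        return [drop_null_fields(item, keep_as_is) for item in value]
    return value


def action_to_json_line(action: dict) -> str:
    """Serialize one action as a compact JSON line without null columns."""
    return json.dumps(drop_null_fields(action), separators=(",", ":"))


if __name__ == "__main__":
    # Mirrors the example in the note: columns ["a", "b"] with "b" null.
    print(action_to_json_line({"a": "x", "b": None}))  # {"a":"x"}
```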

ethan-tyler

LGTM - Thanks for taking my feedback :)

hntd187 · 2026-02-19T17:49:46Z

> Exact matching of our json logs vs this compaction log is not possible because in the compaction log we drop null columns, suggested by kernel, but our own action to bytes serializer does not. Maybe a small optimization we can do in the future is to also drop null columns
>
> > NOTE: Null columns should not be written to the JSON file. For example, if a row has columns ["a", "b"] and the value of "b" is null, the JSON object should be written as { "a": "…" }. Note that including nulls is technically valid JSON, but would bloat the log, therefore we recommend omitting them.

This will eventually manifest as a bug of some sort since we're doing different things from kernel.

ethan-tyler · 2026-02-19T18:30:36Z

> This will eventually manifest as a bug of some sort since we're doing different things from kernel.

I see your concern but I don't think this is a correctness bug. Both encodings are semantically equivalent for Delta replay. You should be able to validate this with replay/state equivalence and not byte matching. Worth a follow up to unify serializers across paths, but IMO not a blocker.
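The "replay/state equivalence and not byte matching" check suggested above could look roughly like this. It is a deliberately simplified sketch: `replay` is a hypothetical helper that only reconciles `add` and `remove` actions by path, ignoring metadata, protocol, transactions, and deletion vectors, and the sample file names are made up.

```python
import json
from typing import Dict, Iterable


def replay(lines: Iterable[str]) -> Dict[str, dict]:
    """Replay newline-delimited JSON actions into {path: add payload}.

    Null fields of an add payload are dropped before comparison, so an
    explicit null and an omitted field produce the same state.
    """
    active: Dict[str, dict] = {}
    for line in lines:
        action = json.loads(line)
        if "add" in action:
            add = {k: v for k, v in action["add"].items() if v is not None}
            active[add["path"]] = add
        elif "remove" in action:
            active.pop(action["remove"]["path"], None)
    return active


# Individual commits, written by a serializer that keeps explicit nulls.
commits = [
    '{"add":{"path":"a.parquet","size":10,"stats":null}}',
    '{"add":{"path":"b.parquet","size":20,"stats":null}}',
    '{"remove":{"path":"a.parquet"}}',
]

# A compacted form of the same range, written without null columns.
compacted = [
    '{"add":{"path":"b.parquet","size":20}}',
    '{"remove":{"path":"a.parquet"}}',
]

# The bytes differ, but the replayed table state is identical.
assert "\n".join(commits) != "\n".join(compacted)
assert replay(commits) == replay(compacted)
```

Comparing replayed state keeps such a test independent of serializer details like null omission or field order.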

hntd187 · 2026-02-20T06:01:50Z

> > This will eventually manifest as a bug of some sort since we're doing different things from kernel.
>
> I see your concern but I don't think this is a correctness bug. Both encodings are semantically equivalent for Delta replay. You should be able to validate this with replay/state equivalence and not byte matching. Worth a follow up to unify serializers across paths, but IMO not a blocker.

Yeah, it's probably fine, it's more we don't know what we don't know. I'm just saying there is prolly some corner case that'll come up as a bug eventually on this. This still LGTM

ion-elgreco requested review from hntd187, roeap and rtyler as code owners February 17, 2026 18:11

github-project-automation bot added this to delta-rust Feb 17, 2026

github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Feb 17, 2026

ethan-tyler reviewed Feb 18, 2026

rtyler reviewed Feb 18, 2026

ion-elgreco force-pushed the feat/log-compaction branch 2 times, most recently from 2d0479f to 1e51705 Compare February 19, 2026 11:52

ion-elgreco requested a review from ethan-tyler February 19, 2026 11:52

ion-elgreco force-pushed the feat/log-compaction branch from 1e51705 to 5308c94 Compare February 19, 2026 11:55

ethan-tyler approved these changes Feb 19, 2026

ion-elgreco enabled auto-merge (squash) February 21, 2026 16:15

ion-elgreco and others added 4 commits February 24, 2026 13:33

feat: log compaction

a363e9b

Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com> Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>

Update crates/core/src/protocol/log_compaction.rs

72b78a7

Co-authored-by: Ethan Urbanski <ethanurbanski@gmail.com> Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>

chore: resolve feedback

3d3a598

Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>

feat: multi part upload in compaction log

2e3aac9

Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com> Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>

rtyler force-pushed the feat/log-compaction branch from a0d6d9f to 2e3aac9 Compare February 24, 2026 13:37

rtyler approved these changes Feb 24, 2026

rtyler disabled auto-merge February 24, 2026 14:18

rtyler merged commit 65b50e0 into delta-io:main Feb 24, 2026
29 of 30 checks passed

github-project-automation bot moved this to Done in delta-rust Feb 24, 2026
