
feat: log compaction #4210

Merged
rtyler merged 4 commits into delta-io:main from ion-elgreco:feat/log-compaction
Feb 24, 2026

Conversation

@ion-elgreco
Collaborator

Description

The description of the main changes of your pull request

Related Issue(s)

Documentation

@github-actions github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels on Feb 17, 2026
@codecov
codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 5.95238% with 79 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.61%. Comparing base (e7b7885) to head (2e3aac9).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| python/src/lib.rs | 0.00% | 39 Missing ⚠️ |
| crates/core/src/protocol/log_compaction.rs | 0.00% | 38 Missing ⚠️ |
| crates/core/src/protocol/mod.rs | 71.42% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4210      +/-   ##
==========================================
- Coverage   76.75%   76.61%   -0.14%     
==========================================
  Files         166      167       +1     
  Lines       48277    48332      +55     
  Branches    48277    48332      +55     
==========================================
- Hits        37053    37030      -23     
- Misses       9365     9444      +79     
+ Partials     1859     1858       -1     

@ethan-tyler
Collaborator

I'm gonna review today

Collaborator

@ethan-tyler ethan-tyler left a comment

Love this - added a couple of comments for your consideration.

"""
self._table.create_checkpoint()

def compact_logs(self, starting_version: int, ending_version: int) -> None:
Collaborator

Do we need docs for this new pub api?

Collaborator Author

We can add that at a later stage ;) for now it's a hidden gem
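
Until docs exist, here is a minimal sketch of what a call such as `compact_logs(4, 6)` would be expected to leave behind. It assumes the Delta protocol's naming scheme for log compaction files (`<start>.<end>.compacted.json` inside `_delta_log`, with versions zero-padded to 20 digits). The helper name `compacted_log_path` is hypothetical and is not part of this PR.

```python
def compacted_log_path(starting_version: int, ending_version: int) -> str:
    """Return the expected relative path of a log compaction file.

    Assumption: names follow the Delta protocol convention
    `<start>.<end>.compacted.json`, zero-padded to 20 digits.
    """
    if starting_version < 0 or ending_version < 0:
        raise ValueError("versions must be non-negative")
    if starting_version > ending_version:
        raise ValueError("starting_version must not exceed ending_version")
    return (
        f"_delta_log/{starting_version:020d}."
        f"{ending_version:020d}.compacted.json"
    )


# Hypothetical usage: the path a reader would probe for after
# `compact_logs(starting_version=4, ending_version=6)`.
print(compacted_log_path(4, 6))
```

The range check here is a guess at sensible validation, not a statement about what the Rust implementation enforces.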

Member

@rtyler rtyler left a comment

Quite straightforward and simple. I do think @ethan-tyler has made some points worth addressing.

I don't really understand the point of these log compaction files compared to checkpoints, but that's not an @ion-elgreco question to resolve 😆 Seems like more overcomplication of the protocol. 🤷 🙈

If these do provide reader benefits, what if we fire-and-forget a task that tries to generate one of these files in the background after a write on version % 25 or something? If we succeed in writing a log compaction file, great, if we don't, who cares 😆
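
A rough sketch of that fire-and-forget idea, in Python for illustration only. Everything here is hypothetical: the function name `maybe_compact_in_background`, the choice to compact the last `interval` commits, and the use of a daemon thread. The `compact_logs` argument is any callable with the `(starting_version, ending_version)` shape from this PR.

```python
import logging
import threading
from typing import Callable, Optional

logger = logging.getLogger(__name__)

COMPACTION_INTERVAL = 25  # hypothetical; from the "version % 25" suggestion


def maybe_compact_in_background(
    version: int,
    compact_logs: Callable[[int, int], None],
    interval: int = COMPACTION_INTERVAL,
) -> Optional[threading.Thread]:
    """Start a best-effort compaction thread when `version` hits the interval.

    Returns the started thread, or None when no compaction was attempted.
    """
    if version <= 0 or version % interval != 0:
        return None

    # Assumption: compact the `interval` commits ending at `version`.
    start, end = version - interval + 1, version

    def _run() -> None:
        try:
            compact_logs(start, end)
        except Exception:
            # Fire-and-forget: a failed compaction must never fail the write.
            logger.debug("log compaction %s..%s skipped", start, end, exc_info=True)

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread
```

The write path never joins the thread, so a slow or failing compaction only costs a debug log line.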

@ion-elgreco
Collaborator Author

> Quite straightforward and simple. I do think @ethan-tyler has made some points worth addressing.
>
> I don't really understand the point of these log compaction files compared to checkpoints, but that's not an @ion-elgreco question to resolve 😆 Seems like more overcomplication of the protocol. 🤷 🙈
>
> If these do provide reader benefits, what if we fire-and-forget a task that tries to generate one of these files in the background after a write on version % 25 or something? If we succeed in writing a log compaction file, great, if we don't, who cares 😆

I assume kernel log replay makes use of these if it encounters them. I can do this in a commit hook in a follow-up PR.

@ion-elgreco ion-elgreco force-pushed the feat/log-compaction branch 2 times, most recently from 2d0479f to 1e51705 Compare February 19, 2026 11:52
@ion-elgreco
Collaborator Author

Exact matching of our JSON logs against this compaction log is not possible: in the compaction log we drop null columns, as suggested by kernel, but our own action-to-bytes serializer does not. A small optimization for the future would be to drop null columns there too.

NOTE: Null columns should not be written to the JSON file. For example, if a row has columns [“a”, “b”] and the value of “b” is null, the JSON object should be written as { “a”: “…” }. Note that including nulls is technically valid JSON, but would bloat the log, therefore we recommend omitting them.
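
To make the difference concrete, here is a minimal Python sketch of a serializer that omits null columns. It is not the delta-rs or kernel serializer. The names `drop_null_columns`, `action_to_json_line`, and `PRESERVE_NULLS_IN` are hypothetical, and treating `partitionValues` as a map whose null entries are data rather than absent columns is an assumption made for this example.

```python
import json
from typing import Any, FrozenSet

# Assumption: map-typed fields where a null entry is a value, not a missing column.
PRESERVE_NULLS_IN: FrozenSet[str] = frozenset({"partitionValues"})


def drop_null_columns(value: Any, preserve: FrozenSet[str] = PRESERVE_NULLS_IN) -> Any:
    """Recursively drop struct fields whose value is None.

    Fields named in `preserve` are copied untouched so that null map
    entries survive; list elements are never dropped.
    """
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            if item is None:
                continue  # null column: omit it from the output
            if key in preserve:
                out[key] = item  # keep the map exactly as written
            else:
                out[key] = drop_null_columns(item, preserve)
        return out
    if isinstance(value, list):
        return [drop_null_columns(item, preserve) for item in value]
    return value


def action_to_json_line(action: dict) -> str:
    """Serialize one action as a compact JSON line without null columns."""
    return json.dumps(drop_null_columns(action), separators=(",", ":"))
```

Without a schema the code cannot tell a struct from a map, which is why the map-typed field has to be listed by hand here.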

Collaborator

@ethan-tyler ethan-tyler left a comment

LGTM - Thanks for taking my feedback :)

@hntd187
Collaborator

hntd187 commented Feb 19, 2026

> Exact matching of our JSON logs against this compaction log is not possible: in the compaction log we drop null columns, as suggested by kernel, but our own action-to-bytes serializer does not. A small optimization for the future would be to drop null columns there too.
>
> NOTE: Null columns should not be written to the JSON file. For example, if a row has columns [“a”, “b”] and the value of “b” is null, the JSON object should be written as { “a”: “…” }. Note that including nulls is technically valid JSON, but would bloat the log, therefore we recommend omitting them.

This will eventually manifest as a bug of some sort since we're doing different things from kernel.

@ethan-tyler
Collaborator

> This will eventually manifest as a bug of some sort since we're doing different things from kernel.

I see your concern but I don't think this is a correctness bug. Both encodings are semantically equivalent for Delta replay. You should be able to validate this with replay/state equivalence and not byte matching. Worth a follow up to unify serializers across paths, but IMO not a blocker.
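
A toy illustration of checking state equivalence instead of byte equality, in Python. The replay model is deliberately simplified (only `add` and `remove` actions, state reduced to a path-to-size mapping) and `replay_active_files` is a hypothetical helper, not an API in delta-rs or kernel.

```python
import json
from typing import Dict, Iterable, Optional


def replay_active_files(lines: Iterable[str]) -> Dict[str, Optional[int]]:
    """Replay newline-delimited actions into {path: size}.

    A toy stand-in for table state: later actions win, and a `remove`
    cancels an earlier `add` of the same path.
    """
    active: Dict[str, Optional[int]] = {}
    for line in lines:
        action = json.loads(line)
        add, remove = action.get("add"), action.get("remove")
        if add:
            active[add["path"]] = add.get("size")
        if remove:
            active.pop(remove["path"], None)
    return active


# The same commits, once with explicit nulls and once with nulls omitted.
with_nulls = [
    '{"add":{"path":"a.parquet","size":10,"stats":null},"remove":null}',
    '{"add":{"path":"b.parquet","size":20,"stats":null},"remove":null}',
    '{"add":null,"remove":{"path":"a.parquet"}}',
]
without_nulls = [
    '{"add":{"path":"a.parquet","size":10}}',
    '{"add":{"path":"b.parquet","size":20}}',
    '{"remove":{"path":"a.parquet"}}',
]

same_bytes = with_nulls == without_nulls
same_state = replay_active_files(with_nulls) == replay_active_files(without_nulls)
```

A real test would compare full snapshots (metadata, protocol, file stats) rather than this reduced view.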

@hntd187
Collaborator

hntd187 commented Feb 20, 2026

> > This will eventually manifest as a bug of some sort since we're doing different things from kernel.
>
> I see your concern but I don't think this is a correctness bug. Both encodings are semantically equivalent for Delta replay. You should be able to validate this with replay/state equivalence and not byte matching. Worth a follow up to unify serializers across paths, but IMO not a blocker.

Yeah, it's probably fine, it's more we don't know what we don't know. I'm just saying there is prolly some corner case that'll come up as a bug eventually on this. This still LGTM

@ion-elgreco ion-elgreco enabled auto-merge (squash) February 21, 2026 16:15
ion-elgreco and others added 4 commits February 24, 2026 13:33
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Co-authored-by: Ethan Urbanski <ethanurbanski@gmail.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
@rtyler rtyler force-pushed the feat/log-compaction branch from a0d6d9f to 2e3aac9 Compare February 24, 2026 13:37
@rtyler rtyler disabled auto-merge February 24, 2026 14:18
@rtyler rtyler merged commit 65b50e0 into delta-io:main Feb 24, 2026
29 of 30 checks passed

Labels

binding/python (Issues for the Python package), binding/rust (Issues for the Rust crate)

Projects

Status: Done

4 participants