-
Notifications
You must be signed in to change notification settings - Fork 342
Fix: use new snapshot id in deleted manifest entry unless is existing entry #2266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
tests/table/test_delete.py
Outdated
|
||
def assert_manifest_entry(expected_status: ManifestEntryStatus, expected_snapshot_id: int) -> None: | ||
current_snapshot = table.refresh().current_snapshot() | ||
manifest_files = current_snapshot.manifests(table.io) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run a make lint
here the snapshot calls are a union with null and require an assertion or check if they exist before calling them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking! I see the CI failed with lint as well, I'll fix it in the next commit
tests/table/test_delete.py
Outdated
pass | ||
|
||
|
||
def test_manifest_entry_after_deletes(catalog: Catalog) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably add this test to the existing test_deletes.py suite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lliangyu-lin for raising this PR, it looks indeed that we have a bug here 👍
Co-authored-by: Fokko Driesprong <[email protected]>
@kevinjqliu Hi Kevin, could you help re-enable the CI for this PR again? |
Rationale for this change
Based on iceberg spec, when a manifest entry is marked as deleted, the snapshot id for when the entry was deleted should be used.
https://iceberg.apache.org/spec/?h=deletes#manifest-entry-fields
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestWriter.java#L178-L179
Incorrect snapshot id could lead to data being deleted during garbage collection when not supposed to.
Are these changes tested?
Added in
test_deletes.py
Are there any user-facing changes?
No