Skip to content

Conversation

Li0k
Copy link

@Li0k Li0k commented Sep 10, 2025

Which issue does this PR close?

What changes are included in this PR?

The main goal of this PR is to remove dangling delete files to meet the requirements of partial compaction. Therefore, the filter manager is introduced to manage manifests and enable manifest rewriting.

This pull request introduces significant improvements to the transaction and snapshot management in the Iceberg Rust implementation, focusing on more robust handling of manifest files, especially for delete files. The changes include the addition of a new manifest filtering mechanism, refactoring how manifest counters are managed, and updating snapshot actions to utilize these improvements.

Manifest Filtering and Snapshot Management Enhancements:

  • Added a new module manifest_filter to handle filtering of manifest files, and exposed its functionality for use in transactions. (crates/iceberg/src/transaction/mod.rs)
  • Refactored SnapshotProduceAction to use a shared atomic manifest counter and introduced the ManifestFilterManager for more sophisticated management of delete file manifests. This includes removing direct tracking of removed delete files and delegating that responsibility to the filter manager. (crates/iceberg/src/transaction/snapshot.rs) [1] [2] [3]
  • Updated the logic for collecting and filtering existing manifests during snapshot production. Now, delete file manifests are filtered using the new manager, and only relevant manifests are retained for further processing. (crates/iceberg/src/transaction/snapshot.rs)
  • Changed manifest file naming to use a shared atomic counter and factored out the path generation into a new helper function new_manifest_path, improving consistency and avoiding naming conflicts. (crates/iceberg/src/transaction/snapshot.rs) [1] [2]
  • Added uuid as a workspace dependency in Cargo.toml to support unique identification for manifest files. (crates/examples/Cargo.toml)

These changes collectively enhance the reliability and maintainability of manifest management in Iceberg transactions, especially around handling delete files and manifest file naming.

Are these changes tested?

@Li0k Li0k requested a review from chenzl25 September 10, 2025 10:17
Copy link
Collaborator

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM!

Comment on lines 473 to 476
let filtered_data_manifests = self
.filter_manager
.filter_manifests(&schema, existing_data_manifests)
.await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to apply filter manager against the data manifests, since filter manager is only used to choose dangling delete files for our partial compaction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think under the current framework, we don’t need to filter the data files for now. they’ve already been rewritten during the existing_manifest phase.

I will remove them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants