feat: manifest filter manager #72
base: dev_rebase_main_20250325
Rest LGTM!
let filtered_data_manifests = self
    .filter_manager
    .filter_manifests(&schema, existing_data_manifests)
    .await?;
We don't need to apply the filter manager to the data manifests, since the filter manager is only used to select dangling delete files for our partial compaction.
Yeah, I think under the current framework we don't need to filter the data files for now. They've already been rewritten during the existing_manifest phase.
I will remove them.
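For illustration, the outcome of this thread could be sketched roughly as below: data manifests pass through untouched and only delete manifests are handed to the filter. `ManifestContent`, `Manifest`, and `split_by_content` are hypothetical stand-ins for this sketch, not the crate's actual types or API.

```rust
/// Hypothetical stand-in for a manifest's content kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ManifestContent {
    Data,
    Deletes,
}

/// Hypothetical, simplified manifest handle (the real type carries much more).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Manifest {
    path: String,
    content: ManifestContent,
}

/// Splits manifests into `(data, deletes)`, preserving their relative order.
/// Under the approach discussed above, only the second half would be
/// passed to the filter manager; the first half is kept as-is.
fn split_by_content(manifests: Vec<Manifest>) -> (Vec<Manifest>, Vec<Manifest>) {
    manifests
        .into_iter()
        .partition(|m| m.content == ManifestContent::Data)
}
```

The caller would then concatenate the untouched data manifests with whatever the filter returns for the delete manifests.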
…avelabs/iceberg-rust into li0k/manifest_filter
Which issue does this PR close?
What changes are included in this PR?
The main goal of this PR is to remove dangling delete files to meet the requirements of partial compaction. Therefore, the filter manager is introduced to manage manifests and enable manifest rewriting.
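As a rough illustration of what "dangling" can mean here: in Iceberg, a delete file only applies to data files whose data sequence number is not greater than its own, so a delete file older than every live data file can no longer match anything. The sketch below drops such entries, plus any that were explicitly removed. `DeleteFileEntry` and `retain_live_deletes` are hypothetical names, and whether this PR uses exactly these criteria is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a delete file tracked in a manifest.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DeleteFileEntry {
    path: String,
    /// Data sequence number assigned to the delete file.
    sequence_number: i64,
}

/// Keeps only delete files that may still apply to a live data file.
///
/// An entry is dropped when its sequence number is below the minimum data
/// sequence number of all live data files, or when its path was explicitly
/// marked as removed.
fn retain_live_deletes(
    entries: Vec<DeleteFileEntry>,
    min_data_sequence_number: i64,
    explicitly_removed: &HashSet<String>,
) -> Vec<DeleteFileEntry> {
    entries
        .into_iter()
        .filter(|e| e.sequence_number >= min_data_sequence_number)
        .filter(|e| !explicitly_removed.contains(&e.path))
        .collect()
}
```

A manifest in which every entry is dropped this way would not need to be carried into the new snapshot at all; a manifest with only some entries dropped would have to be rewritten.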
This pull request introduces significant improvements to the transaction and snapshot management in the Iceberg Rust implementation, focusing on more robust handling of manifest files, especially for delete files. The changes include the addition of a new manifest filtering mechanism, refactoring how manifest counters are managed, and updating snapshot actions to utilize these improvements.
Manifest Filtering and Snapshot Management Enhancements:
- Added manifest_filter to handle filtering of manifest files, and exposed its functionality for use in transactions. (crates/iceberg/src/transaction/mod.rs)
- Updated SnapshotProduceAction to use a shared atomic manifest counter and introduced the ManifestFilterManager for more sophisticated management of delete file manifests. This includes removing direct tracking of removed delete files and delegating that responsibility to the filter manager. (crates/iceberg/src/transaction/snapshot.rs) [1] [2] [3]
- [...] (crates/iceberg/src/transaction/snapshot.rs)
- [...] new_manifest_path, improving consistency and avoiding naming conflicts. (crates/iceberg/src/transaction/snapshot.rs) [1] [2]
- Added uuid as a workspace dependency in Cargo.toml to support unique identification for manifest files. (crates/examples/Cargo.toml)

These changes collectively enhance the reliability and maintainability of manifest management in Iceberg transactions, especially around handling delete files and manifest file naming.
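To make the shared-counter idea concrete, here is a minimal sketch, assuming the counter is an `Arc<AtomicU64>` shared by every component that writes manifests within one commit. `ManifestPathGenerator`, its fields, and the `<commit id>-m<n>.avro` file name pattern are assumptions for illustration; the commit id is passed in as a plain string where the real code would presumably use a value from the uuid crate.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical path generator; clones share the same underlying counter.
#[derive(Clone)]
struct ManifestPathGenerator {
    metadata_location: String,
    /// Assumed to be a UUID string created once per commit by the caller.
    commit_id: String,
    counter: Arc<AtomicU64>,
}

impl ManifestPathGenerator {
    fn new(metadata_location: &str, commit_id: &str) -> Self {
        Self {
            metadata_location: metadata_location.to_string(),
            commit_id: commit_id.to_string(),
            counter: Arc::new(AtomicU64::new(0)),
        }
    }

    /// Returns a path that is unique across all clones of this generator.
    fn new_manifest_path(&self) -> String {
        // fetch_add returns the previous value, so numbering starts at 0.
        let n = self.counter.fetch_add(1, Ordering::Relaxed);
        format!("{}/{}-m{}.avro", self.metadata_location, self.commit_id, n)
    }
}
```

Because the snapshot producer and the filter manager would hold clones of the same generator, a manifest rewritten by the filter cannot collide with one written by the producer, even though neither knows how many files the other has created.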
Are these changes tested?