fix: Create file groups for partitions before applying filters #112
Conversation
Thanks for the PR, I will review it ASAP
max_concurrent_closes: self
    .max_concurrent_closes
    .unwrap_or(DEFAULT_MAX_CONCURRENT_CLOSES),
partition_key: self.partition_key,
We discovered this triggers a panic deeper in iceberg-rust. The root cause is that self.partition_key is set to None. However, the RecordBatchPartitionSplitter provides a partition_key when build() is invoked, and we should use that partition_key instead. By using None, we end up with invalid partition data. I think the first instance works because the initial writer has the correct partition_key.
This triggered a panic in construct_partition_summaries due to the iterators not having the same length. Long term, iceberg-rust should probably return a better error rather than panic. But perhaps panicking is better than writing corrupted data.
Truncated stack trace:
itertools: .zip_eq() reached end of one iterator before the other
iceberg::spec::manifest::writer::ManifestWriter::construct_partition_summaries
iceberg::spec::manifest::writer::ManifestWriter::write_manifest_file
iceberg::transaction::snapshot::SnapshotProducer::write_added_manifest
iceberg::transaction::rewrite_files::RewriteFilesAction::commit
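For illustration, a minimal sketch of the kind of builder change described above. Aside from partition_key and max_concurrent_closes, the type names, default value, and build() signature here are simplified assumptions, not the crate's actual API:

```rust
// Simplified stand-ins; the real types live in the writer module.
struct PartitionKey;
struct Writer {
    max_concurrent_closes: usize,
    partition_key: Option<PartitionKey>,
}

// Illustrative default value only.
const DEFAULT_MAX_CONCURRENT_CLOSES: usize = 4;

struct WriterBuilder {
    max_concurrent_closes: Option<usize>,
    // May be None, e.g. for writers created when an output file rolls over.
    partition_key: Option<PartitionKey>,
}

impl WriterBuilder {
    // Take the partition key supplied at build() time (as the
    // RecordBatchPartitionSplitter does) instead of self.partition_key,
    // so rolled-over writers still carry valid partition data.
    fn build(self, partition_key: Option<PartitionKey>) -> Writer {
        Writer {
            max_concurrent_closes: self
                .max_concurrent_closes
                .unwrap_or(DEFAULT_MAX_CONCURRENT_CLOSES),
            partition_key, // previously: self.partition_key
        }
    }
}
```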
I wrote a second integration test and verified that it triggers the panic without this modification. It also demonstrates the conditions that trigger the error:
- The table must be partitioned
- At least one partition must have a group of input files which is larger than the target_size, and thus triggers a "roll over" to a new output file
See the related issue: #111
Problem
We have observed compaction running repeatedly, sometimes for days at a time, on tables that have not added or removed any data. We would expect the compaction job to produce files that are either larger than our configured maximum size or fewer than min_group_file_count, so that subsequent jobs would not run at all. This affects partitioned tables.
Solution
The ideal approach would be to update the strategies in file_selection/strategy.rs to group files by partition using the partition information in the FileScanTask. Unfortunately, that partition information is not yet available in the forked iceberg-rust; it is a recently added attribute in upstream task.rs.
Approaches
- Group the input files by partition before Strategy.execute(), comparing a FileScanTask's data path with the path in the map.
Implemented Option A
This PR implements Option A, using partition values to group FileScanTasks when the table is partitioned.
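A minimal sketch of the grouping idea, assuming a simplified stand-in for FileScanTask (the field names and partition representation are illustrative, not the PR's exact code):

```rust
use std::collections::HashMap;

// Simplified stand-in for a scan task; the real FileScanTask in iceberg-rust
// carries many more fields.
struct FileScanTask {
    data_file_path: String,
    // Rendered partition value for the file (None for unpartitioned tables).
    partition: Option<String>,
}

// Build one file group per partition *before* the selection strategies
// (min_group_file_count, target size, etc.) are applied, so each partition
// is filtered on its own rather than being mixed into one global group.
fn group_by_partition(tasks: Vec<FileScanTask>) -> Vec<Vec<FileScanTask>> {
    let mut groups: HashMap<Option<String>, Vec<FileScanTask>> = HashMap::new();
    for task in tasks {
        groups.entry(task.partition.clone()).or_default().push(task);
    }
    groups.into_values().collect()
}
```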
Testing
I wrote an initial commit with an integration test that checks the expectation that min_group_file_count filters out a table where the number of files within each partition is less than min_group_file_count. Without the fix it fails, and thus reproduces the issue; with the fix, it passes. Here is an example of the failing test:
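The failing output itself is not reproduced here; as a rough, hypothetical sketch of the expectation under test (the function and test names below are assumptions, not the PR's actual integration test):

```rust
use std::collections::HashMap;

// Hypothetical sketch: when every partition has fewer files than
// min_group_file_count, file selection should produce no groups, so a
// follow-up compaction run becomes a no-op instead of rewriting the
// same files over and over.
fn select_groups(
    files_by_partition: HashMap<String, Vec<String>>,
    min_group_file_count: usize,
) -> Vec<Vec<String>> {
    files_by_partition
        .into_values()
        .filter(|files| files.len() >= min_group_file_count)
        .collect()
}

#[test]
fn skips_partitions_below_min_group_file_count() {
    let mut files_by_partition = HashMap::new();
    files_by_partition.insert(
        "date=2024-01-01".to_string(),
        vec!["a.parquet".to_string(), "b.parquet".to_string()],
    );
    files_by_partition.insert(
        "date=2024-01-02".to_string(),
        vec!["c.parquet".to_string()],
    );

    // Neither partition reaches min_group_file_count = 3, so no groups.
    let groups = select_groups(files_by_partition, 3);
    assert!(groups.is_empty());
}
```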
Bonus: this change also updates the integration test library so it runs on Mac and Windows machines.