# fix(reader): fix position delete bugs with row group skipping (#1806)
## Which issue does this PR close?
Partially addresses #1749.
## What changes are included in this PR?
This PR fixes two related correctness bugs in
`ArrowReader::build_deletes_row_selection()` where position deletes
targeting rows in skipped row groups, or in row groups reached after a
skip, were not applied correctly.
### Background: How These Bugs Were Discovered
While running Apache Spark + Apache Iceberg integration tests through
DataFusion Comet, we discovered that the following tests were failing or
hanging:
- org.apache.iceberg.spark.extensions.TestMergeOnReadMerge
- org.apache.iceberg.spark.extensions.TestMergeOnReadDelete
- org.apache.iceberg.spark.extensions.TestMergeOnReadUpdate
Investigation revealed that recent work to support Iceberg's file
splitting feature (via `filter_row_groups_by_byte_range()`) exposed
latent bugs in the position delete logic. While the byte range filtering
code itself is correct, it exercises code paths that were previously
untested, revealing these pre-existing issues.
#### Bug 1: Missing base index increment when skipping row groups
**The Issue:**
When processing a Parquet file with multiple row groups, if a position
delete targets a row in a later row group, the function would skip row
groups without deletes but fail to increment
`current_row_group_base_idx`. This caused the row index tracking to
become desynchronized.
**Example scenario:**
- File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
- Position delete targets row 199 (last row in group 1)
- When processing group 0: the delete (199) is beyond the group's range, so
the code hits `continue` at lines 469-471
- BUG: `current_row_group_base_idx` is NOT incremented and stays at 0
- When processing group 1: the code thinks its rows start at 0 instead of 100
- The delete at position 199 is never applied (the code believes the file
only has rows 0-99)
**The Fix:**
Add `current_row_group_base_idx += row_group_num_rows` before the two
`continue` statements at lines ~470 and ~481. This ensures row index
tracking stays synchronized when skipping row groups.
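To make the control flow concrete, here is a minimal, self-contained sketch of this logic. It is not the actual `build_deletes_row_selection()` implementation: `select_rows`, its `(start, length)` return shape, and the plain `Peekable` delete iterator are illustrative stand-ins.

```rust
// Minimal sketch (illustrative, not the real iceberg-rust code): walk row
// groups, applying sorted position deletes, and return the selected row
// ranges as (start, length) pairs.
fn select_rows(row_group_sizes: &[u64], deletes: &[u64]) -> Vec<(u64, u64)> {
    let mut selected: Vec<(u64, u64)> = Vec::new();
    let mut delete_iter = deletes.iter().copied().peekable();
    let mut current_row_group_base_idx: u64 = 0;

    for &row_group_num_rows in row_group_sizes {
        let next_row_group_base_idx = current_row_group_base_idx + row_group_num_rows;

        // No remaining delete falls inside this row group: select it whole.
        if delete_iter
            .peek()
            .map_or(true, |&d| d >= next_row_group_base_idx)
        {
            selected.push((current_row_group_base_idx, row_group_num_rows));
            // THE FIX: advance the base index before `continue`. Without it,
            // the next group is treated as starting at the old base index,
            // so a delete such as row 199 looks out of range and is dropped.
            current_row_group_base_idx += row_group_num_rows;
            continue;
        }

        // Some delete targets this group: split its rows around the deletes.
        let mut current_idx = current_row_group_base_idx;
        while current_idx < next_row_group_base_idx {
            match delete_iter.peek().copied() {
                // The next delete is exactly here: consume it, skip the row.
                Some(d) if d == current_idx => {
                    delete_iter.next();
                    current_idx += 1;
                }
                // The next delete is later in this group: select up to it.
                Some(d) if d < next_row_group_base_idx => {
                    selected.push((current_idx, d - current_idx));
                    current_idx = d;
                }
                // No more deletes in this group: select the remainder.
                _ => {
                    selected.push((current_idx, next_row_group_base_idx - current_idx));
                    current_idx = next_row_group_base_idx;
                }
            }
        }
        current_row_group_base_idx = next_row_group_base_idx;
    }
    selected
}
```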
#### Bug 2: Stale cached delete index when skipping unselected row groups
**The Issue:**
When row group selection is active (e.g., via byte range filtering for
file splits) and an unselected row group is skipped, the cached
`next_deleted_row_idx_opt` variable can become stale, leading to either
lost deletes or infinite loops depending on the scenario.
The function maintains a cached value (`next_deleted_row_idx_opt`)
containing the next delete to apply. When skipping unselected row
groups, it calls
`delete_vector_iter.advance_to(next_row_group_base_idx)` to position the
iterator, but this doesn't automatically update the cached variable.
**Two problematic scenarios:**
1. Stale cache causes infinite loop (the bug we hit):
- File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
- Position delete at row 0 (in group 0)
- Row group selection: read ONLY group 1
- Initial state: `next_deleted_row_idx_opt = Some(0)` (cached)
- Skip group 0: `advance_to(100)` positions iterator past delete at 0
- BUG: cached value still `Some(0)` - STALE!
- Process group 1: the loop condition `0 < 200` is `true`, but `current_idx
(100) != next_deleted_row_idx (0)`, so neither branch executes, which can
result in an infinite loop
2. Unconditionally calling `next()` loses deletes:
- File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
- Position delete at row 199 (in group 1)
- Row group selection: read ONLY group 1
- Initial state: `next_deleted_row_idx_opt = Some(199)` (cached, already
correct!)
- Skip group 0: `advance_to(100)` - iterator already positioned
correctly
- If we unconditionally call `next()`: BUG - it consumes the delete at 199,
advancing past it
- Process group 1: the iterator is exhausted and the delete is lost
**The Fix:**
- If `cached value < next_row_group_base_idx` (stale), refresh it from the
iterator, avoiding the infinite loop
- If `cached value >= next_row_group_base_idx` (still valid), keep it,
preserving the delete
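A sketch of the fix, under the same illustrative model as the Bug 1 sketch above: `DeleteIter` and `skip_unselected_row_group` are hypothetical stand-ins, not the real delete-vector iterator, with `advance_to` positioning the iterator at the first remaining delete at or after a row index.

```rust
// Hypothetical stand-in for the real delete-vector iterator: yields sorted
// delete positions and supports advancing past deletes before a row index.
struct DeleteIter {
    deletes: Vec<u64>,
    pos: usize,
}

impl DeleteIter {
    fn next(&mut self) -> Option<u64> {
        let d = self.deletes.get(self.pos).copied();
        self.pos += 1;
        d
    }

    /// Skip remaining deletes that fall before `row_idx`.
    fn advance_to(&mut self, row_idx: u64) {
        while self.pos < self.deletes.len() && self.deletes[self.pos] < row_idx {
            self.pos += 1;
        }
    }
}

/// Skip an unselected row group while keeping the cached next-delete index
/// coherent with the iterator (the essence of the Bug 2 fix).
fn skip_unselected_row_group(
    delete_vector_iter: &mut DeleteIter,
    next_deleted_row_idx_opt: &mut Option<u64>,
    next_row_group_base_idx: u64,
) {
    delete_vector_iter.advance_to(next_row_group_base_idx);
    if let Some(cached) = *next_deleted_row_idx_opt {
        if cached < next_row_group_base_idx {
            // Stale: the cached delete pointed into the skipped group.
            // Refresh it from the advanced iterator, avoiding the infinite
            // loop of scenario 1.
            *next_deleted_row_idx_opt = delete_vector_iter.next();
        }
        // Otherwise the cache already targets a later row group: keep it,
        // so the delete is not consumed and lost (scenario 2).
    }
}
```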
## Are these changes tested?
Yes. This PR adds three unit tests in `reader.rs`:
1. `test_position_delete_across_multiple_row_groups` - Tests bug 1
(missing base index increment); a simplified version of this scenario is
sketched after the list
2. `test_position_delete_with_row_group_selection` - Tests bug 2
scenario where delete is in selected group
3. `test_position_delete_in_skipped_row_group` - Tests bug 2 scenario
where delete is in skipped group (would hang without fix)
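For illustration, a hypothetical test that mirrors the bug 1 scenario against the simplified `select_rows` model from the earlier sketch; the real tests in `reader.rs` build actual Parquet files and delete vectors.

```rust
#[test]
fn delete_in_last_row_group_is_applied() {
    // Two 100-row groups and a single position delete at row 199: rows
    // 0..=198 stay selected, row 199 is removed. Before the fix, the stale
    // base index made the delete look out of range, so it was dropped.
    assert_eq!(select_rows(&[100, 100], &[199]), vec![(0, 100), (100, 99)]);
}
```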
Additionally, these fixes resolve failures in Iceberg Java's
spark-extension tests when running with DataFusion Comet’s PR
apache/datafusion-comet#2528:
- org.apache.iceberg.spark.extensions.TestMergeOnReadMerge
- org.apache.iceberg.spark.extensions.TestMergeOnReadDelete
- org.apache.iceberg.spark.extensions.TestMergeOnReadUpdate
1 file changed (`crates/iceberg/src/arrow`): +612 −5 lines.