# fix(reader): fix position delete bugs with row group skipping (#1806)
## Which issue does this PR close?
Partially addresses #1749.
## What changes are included in this PR?
This PR fixes two related correctness bugs in
`ArrowReader::build_deletes_row_selection()` where position deletes
targeting rows in skipped row groups, or in row groups reached after a
skip, were not applied correctly.
### Background: How These Bugs Were Discovered
While running Apache Spark + Apache Iceberg integration tests through
DataFusion Comet, we discovered that the following tests were failing or
hanging:
- org.apache.iceberg.spark.extensions.TestMergeOnReadMerge
- org.apache.iceberg.spark.extensions.TestMergeOnReadDelete
- org.apache.iceberg.spark.extensions.TestMergeOnReadUpdate
Investigation revealed that recent work to support Iceberg's file
splitting feature (via `filter_row_groups_by_byte_range()`) exposed
latent bugs in the position delete logic. While the byte range filtering
code itself is correct, it exercises code paths that were previously
untested, revealing these pre-existing issues.
#### Bug 1: Missing base index increment when skipping row groups
**The Issue:**
When processing a Parquet file with multiple row groups, if a position
delete targets a row in a later row group, the function would skip row
groups without deletes but fail to increment
`current_row_group_base_idx`. This caused the row index tracking to
become desynchronized.
**Example scenario:**
- File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
- Position delete targets row 199 (last row in group 1)
- When processing group 0: the delete (199) is beyond the group's range, so
the code hits `continue` at lines 469-471
- BUG: `current_row_group_base_idx` is NOT incremented and stays at 0
- When processing group 1: the code thinks its rows start at 0 instead of 100
- The delete at position 199 is never applied (the code believes the file
only has rows 0-99)
**The Fix:**
Add `current_row_group_base_idx += row_group_num_rows` before the two
`continue` statements at lines ~470 and ~481. This ensures row index
tracking stays synchronized when skipping row groups.
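To make the control flow concrete, here is a minimal, self-contained sketch of this logic. It is not the actual `build_deletes_row_selection()` implementation: `select_rows`, its `(start, length)` return shape, and the plain `Peekable` delete iterator are illustrative stand-ins.

```rust
// Minimal sketch (illustrative, not the real iceberg-rust code): walk row
// groups, applying sorted position deletes, and return the selected row
// ranges as (start, length) pairs.
fn select_rows(row_group_sizes: &[u64], deletes: &[u64]) -> Vec<(u64, u64)> {
    let mut selected: Vec<(u64, u64)> = Vec::new();
    let mut delete_iter = deletes.iter().copied().peekable();
    let mut current_row_group_base_idx: u64 = 0;

    for &row_group_num_rows in row_group_sizes {
        let next_row_group_base_idx = current_row_group_base_idx + row_group_num_rows;

        // No remaining delete falls inside this row group: select it whole.
        if delete_iter
            .peek()
            .map_or(true, |&d| d >= next_row_group_base_idx)
        {
            selected.push((current_row_group_base_idx, row_group_num_rows));
            // THE FIX: advance the base index before `continue`. Without it,
            // the next group is treated as starting at the old base index,
            // so a delete such as row 199 looks out of range and is dropped.
            current_row_group_base_idx += row_group_num_rows;
            continue;
        }

        // Some delete targets this group: split its rows around the deletes.
        let mut current_idx = current_row_group_base_idx;
        while current_idx < next_row_group_base_idx {
            match delete_iter.peek().copied() {
                // The next delete is exactly here: consume it, skip the row.
                Some(d) if d == current_idx => {
                    delete_iter.next();
                    current_idx += 1;
                }
                // The next delete is later in this group: select up to it.
                Some(d) if d < next_row_group_base_idx => {
                    selected.push((current_idx, d - current_idx));
                    current_idx = d;
                }
                // No more deletes in this group: select the remainder.
                _ => {
                    selected.push((current_idx, next_row_group_base_idx - current_idx));
                    current_idx = next_row_group_base_idx;
                }
            }
        }
        current_row_group_base_idx = next_row_group_base_idx;
    }
    selected
}
```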
#### Bug 2: Stale cached delete index when skipping unselected row groups
**The Issue:**
When row group selection is active (e.g., via byte range filtering for
file splits) and an unselected row group is skipped, the cached
`next_deleted_row_idx_opt` variable can become stale, leading to either
lost deletes or infinite loops depending on the scenario.
The function maintains a cached value (`next_deleted_row_idx_opt`)
containing the next delete to apply. When skipping unselected row
groups, it calls
`delete_vector_iter.advance_to(next_row_group_base_idx)` to position the
iterator, but this doesn't automatically update the cached variable.
**Two problematic scenarios:**
1. Stale cache causes infinite loop (the bug we hit):
- File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
- Position delete at row 0 (in group 0)
- Row group selection: read ONLY group 1
- Initial state: `next_deleted_row_idx_opt = Some(0)` (cached)
- Skip group 0: `advance_to(100)` positions iterator past delete at 0
- BUG: cached value still `Some(0)` - STALE!
- Process group 1: the loop condition `0 < 200` is `true`, but `current_idx
(100) != next_deleted_row_idx (0)`, so neither branch executes, which can
result in an infinite loop
2. Unconditionally calling `next()` loses deletes:
- File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
- Position delete at row 199 (in group 1)
- Row group selection: read ONLY group 1
- Initial state: `next_deleted_row_idx_opt = Some(199)` (cached, already
correct!)
- Skip group 0: `advance_to(100)` - iterator already positioned
correctly
- If we unconditionally call `next()`: BUG - it consumes the delete at 199,
advancing past it
- Process group 1: the iterator is exhausted and the delete is lost
**The Fix:**
- If `cached value < next_row_group_base_idx` (stale), refresh it from the
iterator, avoiding the infinite loop
- If `cached value >= next_row_group_base_idx` (still valid), keep it,
preserving the delete
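A sketch of the fix, under the same illustrative model as the Bug 1 sketch above: `DeleteIter` and `skip_unselected_row_group` are hypothetical stand-ins, not the real delete-vector iterator, with `advance_to` positioning the iterator at the first remaining delete at or after a row index.

```rust
// Hypothetical stand-in for the real delete-vector iterator: yields sorted
// delete positions and supports advancing past deletes before a row index.
struct DeleteIter {
    deletes: Vec<u64>,
    pos: usize,
}

impl DeleteIter {
    fn next(&mut self) -> Option<u64> {
        let d = self.deletes.get(self.pos).copied();
        self.pos += 1;
        d
    }

    /// Skip remaining deletes that fall before `row_idx`.
    fn advance_to(&mut self, row_idx: u64) {
        while self.pos < self.deletes.len() && self.deletes[self.pos] < row_idx {
            self.pos += 1;
        }
    }
}

/// Skip an unselected row group while keeping the cached next-delete index
/// coherent with the iterator (the essence of the Bug 2 fix).
fn skip_unselected_row_group(
    delete_vector_iter: &mut DeleteIter,
    next_deleted_row_idx_opt: &mut Option<u64>,
    next_row_group_base_idx: u64,
) {
    delete_vector_iter.advance_to(next_row_group_base_idx);
    if let Some(cached) = *next_deleted_row_idx_opt {
        if cached < next_row_group_base_idx {
            // Stale: the cached delete pointed into the skipped group.
            // Refresh it from the advanced iterator, avoiding the infinite
            // loop of scenario 1.
            *next_deleted_row_idx_opt = delete_vector_iter.next();
        }
        // Otherwise the cache already targets a later row group: keep it,
        // so the delete is not consumed and lost (scenario 2).
    }
}
```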
## Are these changes tested?
Yes. This PR adds three unit tests in `reader.rs`:
1. `test_position_delete_across_multiple_row_groups` - Tests bug 1
(missing base index increment); a simplified version of this scenario is
sketched after the list
2. `test_position_delete_with_row_group_selection` - Tests bug 2
scenario where delete is in selected group
3. `test_position_delete_in_skipped_row_group` - Tests bug 2 scenario
where delete is in skipped group (would hang without fix)
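For illustration, a hypothetical test that mirrors the bug 1 scenario against the simplified `select_rows` model from the earlier sketch; the real tests in `reader.rs` build actual Parquet files and delete vectors.

```rust
#[test]
fn delete_in_last_row_group_is_applied() {
    // Two 100-row groups and a single position delete at row 199: rows
    // 0..=198 stay selected, row 199 is removed. Before the fix, the stale
    // base index made the delete look out of range, so it was dropped.
    assert_eq!(select_rows(&[100, 100], &[199]), vec![(0, 100), (100, 99)]);
}
```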
Additionally, these fixes resolve failures in Iceberg Java's
spark-extension tests when running with DataFusion Comet’s PR
apache/datafusion-comet#2528:
- org.apache.iceberg.spark.extensions.TestMergeOnReadMerge
- org.apache.iceberg.spark.extensions.TestMergeOnReadDelete
- org.apache.iceberg.spark.extensions.TestMergeOnReadUpdate
1 file changed (`crates/iceberg/src/arrow`): +612 −5 lines.