Commit 16906c1
fix: stack overflow when loading large equality deletes (#1915)
## Which issue does this PR close?
- Closes #.
## What changes are included in this PR?
A stack overflow occurs when processing data files that have a large number of equality deletes (e.g., more than 6,000 rows).
This happens because `parse_equality_deletes_record_batch_stream` previously built the final predicate by calling `.and()` linearly in a loop:
```rust
result_predicate = result_predicate.and(row_predicate.not());
```
This produced a deeply nested, left-skewed tree whose depth equals the number of rows (N). When `rewrite_not()` (which uses a recursive visitor pattern) was later called on this tree, or when the tree was dropped, the recursion exceeded the call stack limit.
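To make the failure mode concrete, here is a minimal sketch on a hypothetical toy `Pred` enum (not iceberg-rust's actual `Predicate` type). `linear_and` mirrors the old `result_predicate.and(row_predicate.not())` loop, and `depth` measures the resulting tree with an explicit stack so that the measurement itself cannot overflow.

```rust
/// Toy predicate tree: a hypothetical stand-in, not iceberg-rust's `Predicate`.
enum Pred {
    Leaf(u32),
    Not(Box<Pred>),
    And(Box<Pred>, Box<Pred>),
}

impl Pred {
    fn and(self, other: Pred) -> Pred {
        Pred::And(Box::new(self), Box::new(other))
    }
    fn not(self) -> Pred {
        Pred::Not(Box::new(self))
    }
}

/// Mirrors the old loop: `acc = acc.and(row.not())` once per delete row.
fn linear_and(n: u32) -> Pred {
    let mut acc = Pred::Leaf(0).not();
    for i in 1..n {
        acc = acc.and(Pred::Leaf(i).not());
    }
    acc
}

/// Longest root-to-leaf path (a leaf counts as 1), computed iteratively
/// with an explicit stack instead of recursion.
fn depth(root: &Pred) -> usize {
    let mut deepest = 0;
    let mut stack = vec![(root, 1usize)];
    while let Some((node, d)) = stack.pop() {
        deepest = deepest.max(d);
        match node {
            Pred::Leaf(_) => {}
            Pred::Not(inner) => stack.push((&**inner, d + 1)),
            Pred::And(l, r) => {
                stack.push((&**l, d + 1));
                stack.push((&**r, d + 1));
            }
        }
    }
    deepest
}

fn main() {
    let tree = linear_and(1_000);
    // Every row adds one `And` to the left spine, so depth grows linearly: depth = 1001
    println!("depth = {}", depth(&tree));
}
```

Any recursive walk over such a tree, whether a `rewrite_not()`-style visitor or the compiler-generated drop glue for the nested `Box`es, needs roughly one stack frame per level, so a large enough N exhausts the stack.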
### Changes
1. Balanced Tree Construction: Refactored the predicate combination logic. Instead of accumulating linearly, row predicates are collected and combined pairwise to build a balanced tree. This reduces the tree depth from O(N) to O(log N).
2. Early Rewrite: `rewrite_not()` is now called immediately on each individual row predicate before the predicates are combined. This ensures that already simplified predicates are combined and avoids traversing a massive unoptimized tree later.
3. Regression Test: Added `test_large_equality_delete_batch_stack_overflow`, which processes 20,000 equality delete rows to verify the fix.
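The first two changes can be sketched together on a hypothetical toy predicate tree. This is an illustration under assumptions, not the code from this PR: the `Pred` enum, the two-column row layout, the `delete_filter` and `and_balanced` helpers, and returning `None` when there are no delete rows are all invented for the example. Each row predicate is negated and immediately rewritten with De Morgan's laws; the rewritten predicates are then AND-ed pairwise, level by level.

```rust
// Toy predicate tree; a hypothetical stand-in, not iceberg-rust's `Predicate`.
#[derive(Debug, PartialEq)]
enum Pred {
    Eq(u32, i64),    // column id == value
    NotEq(u32, i64), // column id != value
    Not(Box<Pred>),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

/// Push every `Not` down to the leaves. Recursive, but only ever applied to
/// one small per-row predicate, so its recursion depth stays tiny.
fn rewrite_not(p: Pred) -> Pred {
    match p {
        Pred::Not(inner) => negate(*inner),
        Pred::And(l, r) => Pred::And(Box::new(rewrite_not(*l)), Box::new(rewrite_not(*r))),
        Pred::Or(l, r) => Pred::Or(Box::new(rewrite_not(*l)), Box::new(rewrite_not(*r))),
        leaf => leaf,
    }
}

/// Return a `Not`-free predicate equivalent to `NOT p` (De Morgan).
fn negate(p: Pred) -> Pred {
    match p {
        Pred::Eq(c, v) => Pred::NotEq(c, v),
        Pred::NotEq(c, v) => Pred::Eq(c, v),
        Pred::Not(inner) => rewrite_not(*inner),
        Pred::And(l, r) => Pred::Or(Box::new(negate(*l)), Box::new(negate(*r))),
        Pred::Or(l, r) => Pred::And(Box::new(negate(*l)), Box::new(negate(*r))),
    }
}

/// AND predicates by pairing neighbours each round; an odd one out is carried
/// up unchanged. Every round halves the list, so the depth is O(log N).
fn and_balanced(mut preds: Vec<Pred>) -> Option<Pred> {
    while preds.len() > 1 {
        let mut next = Vec::with_capacity((preds.len() + 1) / 2);
        let mut iter = preds.into_iter();
        while let Some(left) = iter.next() {
            match iter.next() {
                Some(right) => next.push(Pred::And(Box::new(left), Box::new(right))),
                None => next.push(left),
            }
        }
        preds = next;
    }
    preds.pop()
}

/// Hypothetical filter: each deleted row (a, b) becomes
/// NOT(col0 == a AND col1 == b), rewritten right away, then AND-ed together.
fn delete_filter(rows: &[(i64, i64)]) -> Option<Pred> {
    let row_preds: Vec<Pred> = rows
        .iter()
        .map(|&(a, b)| {
            let row = Pred::And(Box::new(Pred::Eq(0, a)), Box::new(Pred::Eq(1, b)));
            rewrite_not(Pred::Not(Box::new(row)))
        })
        .collect();
    and_balanced(row_preds)
}

/// Longest root-to-leaf path (a leaf counts as 1), computed iteratively.
fn depth(root: &Pred) -> usize {
    let mut deepest = 0;
    let mut stack = vec![(root, 1usize)];
    while let Some((node, d)) = stack.pop() {
        deepest = deepest.max(d);
        match node {
            Pred::Eq(..) | Pred::NotEq(..) => {}
            Pred::Not(inner) => stack.push((&**inner, d + 1)),
            Pred::And(l, r) | Pred::Or(l, r) => {
                stack.push((&**l, d + 1));
                stack.push((&**r, d + 1));
            }
        }
    }
    deepest
}

fn main() {
    let rows: Vec<(i64, i64)> = (0..20_000).map(|i| (i, i)).collect();
    let filter = delete_filter(&rows).expect("at least one delete row");
    // depth = 17
    println!("depth = {}", depth(&filter));
}
```

With 20,000 two-column rows the combined tree is only 17 levels deep (15 levels of `And` above each 2-level `Or` row predicate), versus roughly 20,000 levels for the linear fold. Rewriting each row before combining also means the recursive `rewrite_not` never has to walk the large combined tree.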
## Are these changes tested?
- [x] New regression test `test_large_equality_delete_batch_stack_overflow` passed.
- [x] All existing tests in `arrow::caching_delete_file_loader` passed.
Co-authored-by: Renjie Liu <[email protected]>

1 parent 58bdb9f, commit 16906c1

1 file changed: +69, -3 lines