Commit f734ec5
Fix Subtraction overflow in
## Which issue does this PR close?
- Closes apache#20779.
## Rationale for this change
In `max_distinct_count` (inside
`datafusion/physical-plan/src/joins/utils.rs`), the
`Precision::Exact` branch computes the number of non-null rows by doing:
```rust
let count = count - stats.null_count.get_value().unwrap_or(&0);
```
Before apache#20228 this subtraction was always safe because `num_rows` was
never smaller
than `null_count`. But apache#20228 added `fetch` (limit push-down) support to
`HashJoinExec`, and when a limit is applied, `partition_statistics()`
caps
`num_rows` to `Exact(fetch_value)` without also capping the per-column
`null_count`. This means `null_count` can legally exceed `num_rows`,
causing a
panic with *"attempt to subtract with overflow"*.
## What changes are included in this PR?
- **Bug fix** in `max_distinct_count` (`utils.rs` ~line 725): replaced
the bare
subtraction with a saturating subtraction so that when `null_count`
exceeds
`num_rows` the result is clamped to `0` instead of panicking.
```rust
// Before
let count = count - stats.null_count.get_value().unwrap_or(&0);
// After
let count =
count.saturating_sub(*stats.null_count.get_value().unwrap_or(&0));
```
- **Regression test** added at the bottom of the `mod tests` block in
the same
file. The test deliberately constructs a scenario where `null_count (5)
>
num_rows (2)` and asserts that `max_distinct_count` returns `Exact(0)`
without
panicking.
## Are these changes tested?
Yes. A new unit test
`test_max_distinct_count_no_overflow_when_null_count_exceeds_num_rows`
is added
directly in `datafusion/physical-plan/src/joins/utils.rs`. It covers the
exact
edge-case from the bug report (null_count > num_rows after a fetch/limit
push-down) and would have caught the panic before the fix.
## Are there any user-facing changes?
No user-facing or API changes. This is a purely internal arithmetic fix
in the
statistics estimation logic. Queries that previously panicked when a
limit was
pushed down into a `HashJoinExec` will now complete successfully.
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>max_distinct_count when hash join has a pushed-down limit (apache#20799)1 parent 8f721a6 commit f734ec5
2 files changed
+42
-3
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
722 | 722 | | |
723 | 723 | | |
724 | 724 | | |
725 | | - | |
| 725 | + | |
| 726 | + | |
726 | 727 | | |
727 | | - | |
| 728 | + | |
728 | 729 | | |
729 | | - | |
| 730 | + | |
730 | 731 | | |
731 | 732 | | |
732 | 733 | | |
| |||
2939 | 2940 | | |
2940 | 2941 | | |
2941 | 2942 | | |
| 2943 | + | |
| 2944 | + | |
| 2945 | + | |
| 2946 | + | |
| 2947 | + | |
| 2948 | + | |
| 2949 | + | |
| 2950 | + | |
| 2951 | + | |
| 2952 | + | |
| 2953 | + | |
| 2954 | + | |
| 2955 | + | |
| 2956 | + | |
| 2957 | + | |
2942 | 2958 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5404 | 5404 | | |
5405 | 5405 | | |
5406 | 5406 | | |
| 5407 | + | |
| 5408 | + | |
| 5409 | + | |
| 5410 | + | |
| 5411 | + | |
| 5412 | + | |
| 5413 | + | |
| 5414 | + | |
| 5415 | + | |
| 5416 | + | |
| 5417 | + | |
| 5418 | + | |
| 5419 | + | |
| 5420 | + | |
| 5421 | + | |
| 5422 | + | |
| 5423 | + | |
| 5424 | + | |
| 5425 | + | |
| 5426 | + | |
| 5427 | + | |
| 5428 | + | |
| 5429 | + | |
0 commit comments