Commit 1a6df66
fix: bit_count function to report nullability correctly (#19197)
## Which issue does this PR close?
Closes #19147
Part of #19144 (EPIC: fix nullability report for spark expression)
## Rationale for this change
The `bit_count` UDF was using the default `return_type` implementation
which does not preserve nullability information. This causes:
1. **Incorrect schema inference** - non-nullable integer inputs are
incorrectly marked as producing nullable Int32 outputs
2. **Missed optimization opportunities** - the query optimizer cannot
apply nullability-based optimizations when metadata is incorrect
3. **Inconsistent behavior** - other similar functions preserve
nullability, leading to unexpected differences
The `bit_count` function counts the number of set bits (ones) in the
binary representation of a number and returns an Int32. The operation
itself doesn't introduce nullability - if the input is non-nullable, the
output will always be non-nullable. Therefore, the output nullability
should match the input.
## What changes are included in this PR?
1. **Implemented `return_field_from_args`**: Creates a field with Int32
type and the same nullability as the input field
2. **Updated `return_type`**: Now returns an error directing users to
use `return_field_from_args` instead (following DataFusion best
practices)
3. **Added `FieldRef` and `internal_err` imports** to support the new
implementation
4. **Added nullability test**: Verifies that both nullable and
non-nullable inputs are handled correctly
## Are these changes tested?
Yes, this PR includes a new test `test_bit_count_nullability` that
verifies:
- Non-nullable Int32 input produces non-nullable Int32 output
- Nullable Int32 input produces nullable Int32 output
- Data type is correctly set to Int32 in both cases
Test results:
```
running 1 test
test function::bitwise::bit_count::tests::test_bit_count_nullability ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
```
Additionally, all existing `bit_count` tests continue to pass, ensuring
backward compatibility.
## Are there any user-facing changes?
**Yes - Schema metadata improvement:**
Users will now see correct nullability information in the schema:
**Before (Bug):**
- Non-nullable Int32/Int64/Boolean input → nullable Int32 output
(incorrect)
**After (Fixed):**
- Non-nullable Int32/Int64/Boolean input → non-nullable Int32 output
(correct)
- Nullable Int32/Int64/Boolean input → nullable Int32 output (correct)
This is a **bug fix** that corrects schema metadata only - it does not
change the actual computation or introduce any breaking changes to the
API.
**Impact:**
- Query optimizers can now make better decisions based on accurate
nullability information
- Schema validation will be more accurate
- No changes to function behavior or output values
---
## Code Changes Summary
### Modified File: `datafusion/spark/src/function/bitwise/bit_count.rs`
#### 1. Added Imports
```rust
use arrow::datatypes::{
DataType, FieldRef, Int16Type, Int32Type, Int64Type, Int8Type, UInt16Type, UInt32Type,
UInt64Type, UInt8Type,
};
use datafusion_common::{internal_err, plan_err, Result};
```
#### 2. Updated return_type Method
```rust
fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
internal_err!("return_field_from_args should be used instead")
}
```
#### 3. Added return_field_from_args Implementation
```rust
fn return_field_from_args(
&self,
args: datafusion_expr::ReturnFieldArgs,
) -> Result<FieldRef> {
use arrow::datatypes::Field;
// bit_count returns Int32 with the same nullability as the input
Ok(Arc::new(Field::new(
args.arg_fields[0].name(),
DataType::Int32,
args.arg_fields[0].is_nullable(),
)))
}
```
#### 4. Added Test
```rust
#[test]
fn test_bit_count_nullability() -> Result<()> {
use datafusion_expr::ReturnFieldArgs;
let bit_count = SparkBitCount::new();
// Test with non-nullable Int32 field
let non_nullable_field = Arc::new(Field::new("num", DataType::Int32, false));
let result = bit_count.return_field_from_args(ReturnFieldArgs {
arg_fields: &[Arc::clone(&non_nullable_field)],
scalar_arguments: &[None],
})?;
// The result should not be nullable (same as input)
assert!(!result.is_nullable());
assert_eq!(result.data_type(), &DataType::Int32);
// Test with nullable Int32 field
let nullable_field = Arc::new(Field::new("num", DataType::Int32, true));
let result = bit_count.return_field_from_args(ReturnFieldArgs {
arg_fields: &[Arc::clone(&nullable_field)],
scalar_arguments: &[None],
})?;
// The result should be nullable (same as input)
assert!(result.is_nullable());
assert_eq!(result.data_type(), &DataType::Int32);
Ok(())
}
```
---
## Verification Steps
1. **Run the new test:**
```bash
cargo test -p datafusion-spark test_bit_count_nullability --lib
```
2. **Run all bit_count tests:**
```bash
cargo test -p datafusion-spark bit_count --lib
```
3. **Run clippy checks:**
```bash
cargo clippy -p datafusion-spark --all-targets -- -D warnings
```
All checks pass successfully!
---
## Related Issues
- Closes: #19147
- Part of EPIC: #19144 (fix nullability report for spark expression)
- Similar fixes:
- #19145 (shuffle function nullability)
- #19146 (bitmap_count function nullability)
---
Co-authored-by: Oleks V <[email protected]>1 parent 4fb36b2 commit 1a6df66
1 file changed
+51
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
23 | | - | |
24 | | - | |
| 23 | + | |
| 24 | + | |
25 | 25 | | |
26 | 26 | | |
27 | | - | |
| 27 | + | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| |||
77 | 77 | | |
78 | 78 | | |
79 | 79 | | |
80 | | - | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
81 | 94 | | |
82 | 95 | | |
83 | 96 | | |
| |||
163 | 176 | | |
164 | 177 | | |
165 | 178 | | |
166 | | - | |
| 179 | + | |
167 | 180 | | |
168 | 181 | | |
169 | 182 | | |
| |||
336 | 349 | | |
337 | 350 | | |
338 | 351 | | |
| 352 | + | |
| 353 | + | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
| 357 | + | |
| 358 | + | |
| 359 | + | |
| 360 | + | |
| 361 | + | |
| 362 | + | |
| 363 | + | |
| 364 | + | |
| 365 | + | |
| 366 | + | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
339 | 385 | | |
0 commit comments