Deprecate AggregateUDFImpl::is_nullable in favor of return_field nullability inference #19688

GaneshPatil7517 · 2026-01-07T19:51:17Z

Which issue does this PR close?

Closes #19511
Related to #18882

Rationale for this change

Currently, AggregateUDFImpl::is_nullable() returns true by default for all UDAFs, regardless of input characteristics. This is not ideal because:

- The same nullability information is already encoded in return_field()
- Most aggregate functions should only be nullable if their inputs are nullable (e.g., MIN, MAX, SUM)
- This pattern doesn't align with scalar UDFs, which already use return_field_from_args() for nullability

What changes are included in this PR?

Core Changes

- Deprecated is_nullable() on AggregateUDFImpl trait with migration guidance
- Updated udaf_default_return_field() to compute nullability from input fields:
  - Output is nullable if ANY input field is nullable
  - Output is non-nullable only if ALL inputs are non-nullable
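The inference rule above fits in a few lines. This is a minimal, std-only sketch: `InputField` is a hypothetical stand-in for an Arrow field (only its nullability is modeled), and `infer_output_nullable` is an illustrative helper, not the actual `udaf_default_return_field()` signature, which this page does not show.

```rust
/// Hypothetical stand-in for an Arrow field; only nullability is modeled.
#[derive(Debug, Clone, Copy)]
struct InputField {
    nullable: bool,
}

/// Output is nullable if ANY input is nullable, and non-nullable only
/// if ALL inputs are non-nullable.
fn infer_output_nullable(inputs: &[InputField]) -> bool {
    inputs.iter().any(|f| f.nullable)
}

fn main() {
    let non_null = InputField { nullable: false };
    let maybe_null = InputField { nullable: true };

    println!("{}", infer_output_nullable(&[non_null])); // false
    println!("{}", infer_output_nullable(&[non_null, maybe_null])); // true
}
```

One edge case to check in a real implementation: with zero input fields, `any` is vacuously false, so this sketch reports the output as non-nullable.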

Tests

Added 4 new tests validating nullability inference:

- test_return_field_nullability_from_nullable_input
- test_return_field_nullability_from_non_nullable_input
- test_return_field_nullability_with_mixed_inputs
- test_return_field_preserves_return_type

Documentation

- New docs/source/library-user-guide/functions/udf-nullability.md with migration guide and examples
- Updated adding-udfs.md with reference to nullability documentation

Are these changes tested?

Yes. All existing tests pass, plus 4 new tests specifically for nullability behavior.

Are there any user-facing changes?

Deprecation warning: Users implementing is_nullable() will see a deprecation warning directing them to use return_field() instead.

Behavioral change: Default nullability now depends on input field nullability rather than always being true. Functions like COUNT that must always return a non-nullable result should override return_field().
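The override pattern for COUNT can be sketched as a trait with a default method that one implementor replaces. `AggregateLike`, `ReturnField`, `MaxLike`, and `CountLike` are hypothetical simplifications, not DataFusion's `AggregateUDFImpl` trait or Arrow's `Field`; input nullability is passed as plain booleans and the data type as a string.

```rust
/// Hypothetical, simplified description of an aggregate's output field.
#[derive(Debug, PartialEq)]
struct ReturnField {
    data_type: &'static str,
    nullable: bool,
}

/// Hypothetical stand-in for an aggregate UDF trait.
trait AggregateLike {
    fn return_type(&self) -> &'static str;

    /// Default: the output is nullable if any input is nullable.
    fn return_field(&self, input_nullability: &[bool]) -> ReturnField {
        ReturnField {
            data_type: self.return_type(),
            nullable: input_nullability.iter().any(|&n| n),
        }
    }
}

/// Uses the default inference unchanged.
struct MaxLike;

impl AggregateLike for MaxLike {
    fn return_type(&self) -> &'static str {
        "Int64"
    }
}

/// Overrides the default: SQL COUNT returns 0 (never NULL) for an
/// empty group, so input nullability is ignored.
struct CountLike;

impl AggregateLike for CountLike {
    fn return_type(&self) -> &'static str {
        "Int64"
    }

    fn return_field(&self, _input_nullability: &[bool]) -> ReturnField {
        ReturnField {
            data_type: self.return_type(),
            nullable: false,
        }
    }
}

fn main() {
    let inputs = [true];
    println!("{:?}", MaxLike.return_field(&inputs)); // nullable: true
    println!("{:?}", CountLike.return_field(&inputs)); // nullable: false
}
```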

This is a potentially breaking change for users who rely on the previous behavior of always-nullable outputs, but the new behavior is more correct and aligns with scalar UDF patterns.

This commit fixes issue apache#19612, where accumulators that don't implement retract_batch exhibit buggy behavior in window frame queries. When aggregate functions are used with window frames like `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, DataFusion uses PlainAggregateWindowExpr, which calls evaluate() multiple times on the same accumulator instance. Accumulators that use std::mem::take() in their evaluate() method consume their internal state, causing incorrect results on subsequent calls.

1. **percentile_cont**: Modified evaluate() to use a mutable reference instead of consuming the Vec. Added retract_batch() support for both PercentileContAccumulator and DistinctPercentileContAccumulator.
2. **string_agg**: Changed SimpleStringAggAccumulator::evaluate() to clone the accumulated string instead of taking it.

- datafusion/functions-aggregate/src/percentile_cont.rs:
  - Changed calculate_percentile() to take `&mut [T::Native]` instead of `Vec<T::Native>`
  - Updated PercentileContAccumulator::evaluate() to pass a reference
  - Updated DistinctPercentileContAccumulator::evaluate() to clone values
  - Added retract_batch() implementation using HashMap for efficient removal
  - Updated PercentileContGroupsAccumulator::evaluate() for consistency
- datafusion/functions-aggregate/src/string_agg.rs:
  - Changed evaluate() to use clone() instead of std::mem::take()
- datafusion/sqllogictest/test_files/aggregate.slt:
  - Added test cases for percentile_cont with window frames
  - Added test comparing median() vs percentile_cont(0.5) behavior
  - Added test for string_agg cumulative window frame
- docs/source/library-user-guide/functions/adding-udfs.md:
  - Added documentation about window-compatible accumulators
  - Explained evaluate() state preservation requirements
  - Documented retract_batch() implementation guidance

Closes apache#19612
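The failure mode described above can be reproduced with two toy accumulators. `TakingAgg`, `CloningAgg`, and `append` are hypothetical std-only stand-ins, not DataFusion's `Accumulator` trait; the loop mimics a cumulative window frame by updating with each new row and then evaluating the same instance again.

```rust
/// Shared update logic: append a value, comma-separated.
fn append(acc: &mut String, v: &str) {
    if !acc.is_empty() {
        acc.push(',');
    }
    acc.push_str(v);
}

/// Buggy pattern: evaluate() moves the state out with std::mem::take.
#[derive(Default)]
struct TakingAgg {
    acc: String,
}

impl TakingAgg {
    fn update(&mut self, v: &str) {
        append(&mut self.acc, v);
    }

    /// Leaves an empty String behind, so earlier rows are lost.
    fn evaluate(&mut self) -> String {
        std::mem::take(&mut self.acc)
    }
}

/// Fixed pattern: evaluate() clones, so the state survives repeated calls.
#[derive(Default)]
struct CloningAgg {
    acc: String,
}

impl CloningAgg {
    fn update(&mut self, v: &str) {
        append(&mut self.acc, v);
    }

    fn evaluate(&self) -> String {
        self.acc.clone()
    }
}

fn main() {
    let rows = ["a", "b", "c"];
    let mut taking = TakingAgg::default();
    let mut cloning = CloningAgg::default();
    let mut taking_out = Vec::new();
    let mut cloning_out = Vec::new();

    // Simulate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    for r in rows {
        taking.update(r);
        taking_out.push(taking.evaluate());
        cloning.update(r);
        cloning_out.push(cloning.evaluate());
    }

    println!("{:?}", taking_out); // ["a", "b", "c"]
    println!("{:?}", cloning_out); // ["a", "a,b", "a,b,c"]
}
```

The taking version silently returns only the current row's value, which is exactly the "incorrect results on subsequent calls" symptom; cloning trades an allocation per evaluate() for correctness.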



…ability inference

This change improves how nullability is computed for aggregate UDF outputs by making it depend on the nullability of input fields, aligning with the pattern used for scalar UDFs.

Changes:
- Mark is_nullable() method as deprecated in AggregateUDFImpl trait
- Update udaf_default_return_field() to compute output nullability from input fields:
  * Output is nullable if ANY input field is nullable
  * Output is non-nullable only if ALL input fields are non-nullable
- Add deprecation migration guide in is_nullable() documentation
- Add #[allow(deprecated)] to wrapper method calls in AggregateUDF and AliasedAggregateUDFImpl

Testing:
- Add 4 new tests validating nullability inference from input fields:
  * test_return_field_nullability_from_nullable_input
  * test_return_field_nullability_from_non_nullable_input
  * test_return_field_nullability_with_mixed_inputs
  * test_return_field_preserves_return_type
- All existing tests continue to pass (test_partial_eq, test_partial_ord)
- No regressions in aggregate function execution

Documentation:
- Create new docs/source/library-user-guide/functions/udf-nullability.md
  * Explains the nullability change and rationale
  * Provides migration guide for custom UDAF implementations
  * Includes examples for default and custom nullability behavior
  * References scalar UDF patterns
- Update docs/source/library-user-guide/functions/adding-udfs.md
  * Add section on nullability of aggregate functions
  * Link to new comprehensive nullability documentation

Fixes: apache#19511 (related to apache#18882)
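The deprecation mechanics in this commit can be sketched with std Rust alone: a default trait method marked `#[deprecated]`, plus a wrapper that still forwards to it under a narrowly scoped `#[allow(deprecated)]`. `UdafLike`, `Wrapper`, and `MyAgg` are hypothetical names, not the real `AggregateUDFImpl`/`AggregateUDF` types, and the `since` field is deliberately omitted rather than guessed.

```rust
trait UdafLike {
    /// Old API: kept for compatibility, but callers get a `deprecated` warning.
    #[deprecated(note = "compute nullability in return_field() instead")]
    fn is_nullable(&self) -> bool {
        true
    }

    /// New API (simplified): nullable if any input is nullable.
    fn return_field_nullable(&self, input_nullability: &[bool]) -> bool {
        input_nullability.iter().any(|&n| n)
    }
}

/// Hypothetical wrapper that must keep forwarding the deprecated method.
struct Wrapper<T: UdafLike>(T);

impl<T: UdafLike> Wrapper<T> {
    // Without this attribute, the call below triggers the `deprecated` lint.
    #[allow(deprecated)]
    fn is_nullable(&self) -> bool {
        self.0.is_nullable()
    }

    fn return_field_nullable(&self, input_nullability: &[bool]) -> bool {
        self.0.return_field_nullable(input_nullability)
    }
}

/// Relies entirely on the trait's default methods.
struct MyAgg;

impl UdafLike for MyAgg {}

fn main() {
    let w = Wrapper(MyAgg);
    println!("{}", w.is_nullable()); // true
    println!("{}", w.return_field_nullable(&[false])); // false
}
```

Scoping `#[allow(deprecated)]` to the single forwarding method, rather than to a whole module, keeps the lint active for any other call sites.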

datafusion/expr/src/udaf.rs

datafusion/functions-aggregate/src/percentile_cont.rs

martin-g · 2026-01-08T12:44:50Z

docs/source/library-user-guide/functions/udf-nullability.md

+## See Also
+
+- [Adding User Defined Functions](adding-udfs.md) - General guide to implementing UDFs
+- [Scalar UDF Nullability](#) - Similar concepts for scalar UDFs (which already use `return_field_from_args()`)


A link target is missing here

martin-g · 2026-01-08T12:51:37Z

datafusion/functions-aggregate/src/percentile_cont.rs

+
+        let arr = values[0].as_primitive::<T>();
+        for value in arr.iter().flatten() {
+            self.distinct_values.values.remove(&Hashable(value));


Is there a .slt test for this?

martin-g · 2026-01-08T12:59:26Z

datafusion/functions-aggregate/src/percentile_cont.rs

+                *to_remove.entry(v).or_default() += 1;
+            }
+        }
+


Return early here if to_remove.is_empty()?
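The counting approach in the diff above, with the suggested early return, can be sketched on a plain `Vec<i64>` of buffered values. `retract_values` is a hypothetical helper, not the PR's actual `retract_batch()` implementation; it uses `i64` because floats are not `Hash` (the PR wraps values in a `Hashable` type for that), and it removes one buffered occurrence per retracted value.

```rust
use std::collections::HashMap;

/// Removes one buffered occurrence for each value in `retracted`.
fn retract_values(buffer: &mut Vec<i64>, retracted: &[i64]) {
    // Count how many occurrences of each value must be removed.
    let mut to_remove: HashMap<i64, usize> = HashMap::new();
    for &v in retracted {
        *to_remove.entry(v).or_default() += 1;
    }

    // Nothing to retract: skip scanning the buffer entirely.
    if to_remove.is_empty() {
        return;
    }

    // Single pass: drop an element while its pending count is positive.
    buffer.retain(|v| match to_remove.get_mut(v) {
        Some(count) if *count > 0 => {
            *count -= 1;
            false
        }
        _ => true,
    });
}

fn main() {
    let mut buf = vec![1, 2, 2, 3];
    retract_values(&mut buf, &[2]);
    println!("{:?}", buf); // [1, 2, 3]

    retract_values(&mut buf, &[]);
    println!("{:?}", buf); // [1, 2, 3]
}
```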

martin-g · 2026-01-08T13:01:56Z

> Functions like COUNT that need to always return non-nullable should override return_field().

Is this planned for a later PR ?

Jefffrey

Honestly I do not have confidence in this PR, especially considering the issue it tackles.

- It is bleeding in changes from other PRs (#19618)
- It claims to close the issue (#19511) but this only addresses aggregate UDFs
- It includes details in the PR body like saying COUNT should be fixed but doesn't attempt that in this PR
- There are small issues, like how it deprecates from version 42 or the liberal use of #[allow(...)], which would be caught if clippy was run

All these lead me to think that proper consideration hasn't been given to this PR so I am not very inclined towards it. I feel a lot of this code is generated by an LLM and hasn't been disclosed, or even tested.


GaneshPatil7517 · 2026-01-09T16:54:43Z

> Honestly I do not have confidence in this PR, especially considering the issue it tackles.
>
> - It is bleeding in changes from other PRs (fix(accumulators): preserve state in evaluate() for window frame queries #19618)
> - It claims to close the issue (Consider changing nullability of UDFs to depend on inputs by default #19511) but this only addresses aggregate UDFs
> - It includes details in the PR body like saying COUNT should be fixed but doesn't attempt that in this PR
> - There are small issues, like how it deprecates from version 42 or the liberal use of #[allow(...)], which would be caught if clippy was run
>
> All these lead me to think that proper consideration hasn't been given to this PR so I am not very inclined towards it. I feel a lot of this code is generated by an LLM and hasn't been disclosed, or even tested.

No, I was a beginner in open source. What actually happened is that I mistakenly pushed code from another issue because I did not create a separate branch. I apologize for this.


GaneshPatil7517 · 2026-01-09T16:59:22Z

I'll close this PR and create another, clean one.

GaneshPatil7517 added 6 commits January 6, 2026 09:54

fix: format code and mark doc examples as ignore for doctest

c5fe87b

Merge branch 'main' into fix/accumulators-retract-batch-19612

86e4e03

address review feedback: remove cast code, consolidate tests, update docstring

3d4eeee

fix: remove sliding window test for string_agg (no retract_batch support)

3b1e671

github-actions bot added the documentation, logical-expr, sqllogictest, and functions labels Jan 7, 2026

Merge branch 'main' into deprecate-is-nullable-return-field

15c0d12

GaneshPatil7517 mentioned this pull request Jan 7, 2026

Consider changing nullability of UDFs to depend on inputs by default #19511

Open

martin-g reviewed Jan 8, 2026


Jefffrey reviewed Jan 9, 2026


GaneshPatil7517 and others added 5 commits January 9, 2026 22:00

Update datafusion/expr/src/udaf.rs

1e487fb

Co-authored-by: Martin Grigorov <[email protected]>

Update datafusion/expr/src/udaf.rs

26e2261

Co-authored-by: Martin Grigorov <[email protected]>

Update datafusion/functions-aggregate/src/percentile_cont.rs

fa355e8

Co-authored-by: Martin Grigorov <[email protected]>

Update datafusion/expr/src/udaf.rs

5bfae7a

Co-authored-by: Martin Grigorov <[email protected]>

Update datafusion/expr/src/udaf.rs

5a220b6

Co-authored-by: Martin Grigorov <[email protected]>

GaneshPatil7517 and others added 3 commits January 9, 2026 22:25

Update datafusion/expr/src/udaf.rs

e928c23

Co-authored-by: Martin Grigorov <[email protected]>

Update datafusion/expr/src/udaf.rs

9590099

Co-authored-by: Martin Grigorov <[email protected]>

Update datafusion/expr/src/udaf.rs

f263d91

Co-authored-by: Martin Grigorov <[email protected]>

GaneshPatil7517 closed this Jan 9, 2026
