Skip to content

Commit 8ef884d

Browse files
authored
Change default of AggregateUDFImpl::supports_null_handling_clause to false (#18441)
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Part of #9924 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Setting a default of `true` is too permissive; we can see it allows specifying it on `median` for example even though that function doesn't take the config into account. It seems only `array_agg`, `first_value` and `last_value` actually respect the config this setting handles so it makes more sense to make this false by default. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Change default of `AggregateUDFImpl::supports_null_handling_clause` to `false` (from `true`). Adjust `array_agg`, `first_value` and `last_value` to implement it as `true`. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Existing tests, also added a negative case for `median` (expect SQL parsing to fail if providing null handling clause). ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> Behaviour change as default of a trait method is changing. Added section to upgrade guide. <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
1 parent 35b5363 commit 8ef884d

File tree

8 files changed

+52
-18
lines changed

8 files changed

+52
-18
lines changed

datafusion/expr/src/udaf.rs

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -740,10 +740,14 @@ pub trait AggregateUDFImpl: Debug + DynEq + DynHash + Send + Sync {
740740
ScalarValue::try_from(data_type)
741741
}
742742

743-
/// If this function supports `[IGNORE NULLS | RESPECT NULLS]` clause, return true
744-
/// If the function does not, return false
743+
/// If this function supports `[IGNORE NULLS | RESPECT NULLS]` SQL clause,
744+
/// return `true`. Otherwise, return `false` which will cause an error to be
745+
/// raised during SQL parsing if these clauses are detected for this function.
746+
///
747+
/// Functions which implement this as `true` are expected to handle the resulting
748+
/// null handling config present in [`AccumulatorArgs`], `ignore_nulls`.
745749
fn supports_null_handling_clause(&self) -> bool {
746-
true
750+
false
747751
}
748752

749753
/// If this function supports the `WITHIN GROUP (ORDER BY column [ASC|DESC])`

datafusion/functions-aggregate/src/approx_percentile_cont.rs

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -315,10 +315,6 @@ impl AggregateUDFImpl for ApproxPercentileCont {
315315
Ok(arg_types[0].clone())
316316
}
317317

318-
fn supports_null_handling_clause(&self) -> bool {
319-
false
320-
}
321-
322318
fn supports_within_group_clause(&self) -> bool {
323319
true
324320
}

datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -258,10 +258,6 @@ impl AggregateUDFImpl for ApproxPercentileContWithWeight {
258258
self.approx_percentile_cont.state_fields(args)
259259
}
260260

261-
fn supports_null_handling_clause(&self) -> bool {
262-
false
263-
}
264-
265261
fn supports_within_group_clause(&self) -> bool {
266262
true
267263
}

datafusion/functions-aggregate/src/array_agg.rs

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -224,6 +224,10 @@ impl AggregateUDFImpl for ArrayAgg {
224224
datafusion_expr::ReversedUDAF::Reversed(array_agg_udaf())
225225
}
226226

227+
fn supports_null_handling_clause(&self) -> bool {
228+
true
229+
}
230+
227231
fn documentation(&self) -> Option<&Documentation> {
228232
self.doc()
229233
}

datafusion/functions-aggregate/src/first_last.rs

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -302,6 +302,10 @@ impl AggregateUDFImpl for FirstValue {
302302
ReversedUDAF::Reversed(last_value_udaf())
303303
}
304304

305+
fn supports_null_handling_clause(&self) -> bool {
306+
true
307+
}
308+
305309
fn documentation(&self) -> Option<&Documentation> {
306310
self.doc()
307311
}
@@ -1127,6 +1131,10 @@ impl AggregateUDFImpl for LastValue {
11271131
ReversedUDAF::Reversed(first_value_udaf())
11281132
}
11291133

1134+
fn supports_null_handling_clause(&self) -> bool {
1135+
true
1136+
}
1137+
11301138
fn documentation(&self) -> Option<&Documentation> {
11311139
self.doc()
11321140
}

datafusion/functions-aggregate/src/percentile_cont.rs

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -358,10 +358,6 @@ impl AggregateUDFImpl for PercentileCont {
358358
}
359359
}
360360

361-
fn supports_null_handling_clause(&self) -> bool {
362-
false
363-
}
364-
365361
fn supports_within_group_clause(&self) -> bool {
366362
true
367363
}

datafusion/sqllogictest/test_files/aggregate.slt

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -844,6 +844,13 @@ SELECT approx_median(distinct col_i8) FROM median_table
844844
statement error DataFusion error: This feature is not implemented: APPROX_MEDIAN\(DISTINCT\) aggregations are not available
845845
SELECT approx_median(col_i8), approx_median(distinct col_i8) FROM median_table
846846

847+
# null handling clauses not supported
848+
query error DataFusion error: Error during planning: \[IGNORE \| RESPECT\] NULLS are not permitted for median
849+
SELECT median(c2) IGNORE NULLS FROM aggregate_test_100
850+
851+
query error DataFusion error: Error during planning: \[IGNORE \| RESPECT\] NULLS are not permitted for median
852+
SELECT median(c2) RESPECT NULLS FROM aggregate_test_100
853+
847854
# median_i16
848855
query I
849856
SELECT median(col_i16) FROM median_table

docs/source/library-user-guide/upgrading.md

Lines changed: 26 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -19,11 +19,34 @@
1919

2020
# Upgrade Guides
2121

22-
## DataFusion `51.0.0`
22+
## DataFusion `52.0.0`
23+
24+
**Note:** DataFusion `52.0.0` has not been released yet. The information provided in this section pertains to features and changes that have already been merged to the main branch and are awaiting release in this version.
25+
26+
You can see the current [status of the `52.0.0` release here](https://github.com/apache/datafusion/issues/18566)
27+
28+
### `AggregateUDFImpl::supports_null_handling_clause` now defaults to `false`
29+
30+
This method specifies whether an aggregate function allows `IGNORE NULLS`/`RESPECT NULLS`
31+
during SQL parsing, with the implication it respects these configs during computation.
32+
33+
Most DataFusion aggregate functions silently ignored this syntax in prior versions
34+
as they did not make use of it and it was permitted by default. We change this so
35+
only the few functions which do respect this clause (e.g. `array_agg`, `first_value`,
36+
`last_value`) need to implement it.
2337

24-
**Note:** DataFusion `51.0.0` has not been released yet. The information provided in this section pertains to features and changes that have already been merged to the main branch and are awaiting release in this version.
38+
Custom user defined aggregate functions will also error if this syntax is used,
39+
unless they explicitly declare support by overriding the method.
2540

26-
You can see the current [status of the `51.0.0`release here](https://github.com/apache/datafusion/issues/17558)
41+
For example, SQL parsing will now fail for queries such as this:
42+
43+
```sql
44+
SELECT median(c1) IGNORE NULLS FROM table
45+
```
46+
47+
Instead of silently succeeding.
48+
49+
## DataFusion `51.0.0`
2750

2851
### `arrow` / `parquet` updated to 57.0.0
2952

0 commit comments

Comments
 (0)