Commit c53a448
authored
Fix panic for
## Which issue does this PR close?
* Closes #18974.
## Rationale for this change
The DataFusion CLI currently panics with an "index out of bounds" error
when executing queries that use `GROUP BY GROUPING SETS(())`, such as:
```sql
SELECT SUM(v1) FROM generate_series(10) AS t1(v1) GROUP BY GROUPING SETS(())
```
This panic originates in the physical aggregation code, which assumes
that an empty list of grouping expressions always corresponds to "no
grouping". That assumption breaks down in the presence of `GROUPING
SETS`, where an empty set is a valid grouping set that should still
produce a result row (and `__grouping_id`) rather than crashing.
This PR fixes the panic by explicitly distinguishing:
* true "no GROUP BY" aggregations, and
* `GROUPING SETS`/`CUBE`/`ROLLUP` plans that may have empty grouping
expressions but still require grouping-set semantics and a valid
`__grouping_id`.
The change restores robustness of the CLI and ensures
standards-compliant behavior for grouping sets with empty sets.
## What changes are included in this PR?
Summary of the main changes:
* **Track grouping-set usage explicitly in `PhysicalGroupBy`:**
* Add a `has_grouping_set: bool` field to `PhysicalGroupBy`.
* Extend `PhysicalGroupBy::new` to accept the `has_grouping_set` flag.
* Add helper methods:
* `has_grouping_set(&self) -> bool` to expose the flag, and
* `is_true_no_grouping(&self) -> bool` to represent the case of
genuinely no grouping (no GROUP BY and no grouping sets).
* **Correct group state construction for empty grouping with grouping
sets:**
* Update `PhysicalGroupBy::from_pre_group` so that it only treats
`expr.is_empty()` as "no groups" when `has_grouping_set` is `false`.
* For `GROUPING SETS(())`, we now build at least one group, avoiding the
previous out-of-bounds access on `groups[0]`.
* **Clarify when `__grouping_id` should be present:**
* Replace the previous `is_single` logic with a clearer distinction
based on `has_grouping_set`.
* `num_output_exprs`, `output_exprs`, `num_group_exprs`, and
`group_schema` now add the `__grouping_id` column only when
`has_grouping_set` is `true`.
* `is_single` is redefined as "simple GROUP BY" (no grouping sets), i.e.
`!self.has_grouping_set`.
* **Integrate the new semantics into `AggregateExec`:**
* Use `group_by.is_true_no_grouping()` instead of
`group_by.expr.is_empty()` when choosing between the specialized
no-grouping aggregation path and grouped aggregation.
* Ensure that `is_unordered_unfiltered_group_by_distinct` only treats
plans as grouped when there are grouping expressions **and** no grouping
sets (`!has_grouping_set`).
* Preserve existing behavior for regular `GROUP BY` while correctly
handling `GROUPING SETS` and related constructs.
* **Support `__grouping_id` with the no-grouping aggregation stream:**
* Extend `AggregateStreamInner` with an optional `grouping_id:
Option<ScalarValue>` field.
* Change `AggregateStream::new` to accept a `grouping_id` argument.
* Introduce `prepend_grouping_id_column` to prepend a `__grouping_id`
column to the finalized accumulator output when needed.
* Wire this up so that no-grouping aggregations can still match a schema
that includes `__grouping_id` in grouping-set scenarios.
* **Planner and execution wiring updates:**
* Update all `PhysicalGroupBy::new` call sites to pass the correct
`has_grouping_set` value:
* `false` for:
* ordinary `GROUP BY` or truly no-grouping aggregates.
* `true` for:
* `GROUPING SETS`,
* `CUBE`, and
* `ROLLUP` physical planning paths.
* Ensure `merge_grouping_set_physical_expr`,
`create_cube_physical_expr`, and `create_rollup_physical_expr` correctly
mark grouping-set plans.
* **Protobuf / physical plan round-trip support:**
* Extend `AggregateExecNode` in `datafusion.proto` with a new `bool
has_grouping_set = 12;` field.
* Update the generated `pbjson` and `prost` code to serialize and
deserialize the new field.
* When constructing `AggregateExec` from protobuf, pass the decoded
`has_grouping_set` into `PhysicalGroupBy::new`.
* When serializing an `AggregateExec` back to protobuf, set
`has_grouping_set` based on `exec.group_expr().has_grouping_set()`.
* Update round-trip physical plan tests to include the new field in
their expectations.
* **Tests and SQL logic coverage:**
* Add sqllogictests for the previously failing cases in `grouping.slt`:
* `SELECT COUNT(*) FROM test GROUP BY GROUPING SETS (());`
* `SELECT SUM(v1) FROM generate_series(10) AS t1(v1) GROUP BY GROUPING
SETS(())` (the original panic case).
* Extend or adjust unit tests in `aggregates`, `physical_planner`,
`filter_pushdown`, and `coop` modules to account for the
`has_grouping_set` flag in `PhysicalGroupBy` and expected debug output.
* Update proto round-trip tests to validate `has_grouping_set` is
preserved.
## Are these changes tested?
Yes.
* New sqllogictests covering `GROUPING SETS(())` for both a regular
table and `generate_series(10)`:
* `grouping.slt` now asserts the expected scalar results (e.g. `2` and
`55`), preventing regressions on this edge case.
* Updated and existing Rust unit tests:
* `physical-plan/src/aggregates` tests updated to include
`has_grouping_set` in `PhysicalGroupBy` expectations.
* Planner and optimizer tests (e.g. `physical_planner.rs`,
`filter_pushdown`) updated to construct `PhysicalGroupBy` with the new
flag.
* Execution tests in `core/tests/execution/coop.rs` updated to reflect
the new constructor and continue to exercise the no-grouping aggregation
path.
* Protobuf round-trip tests extended to verify that `has_grouping_set`
is correctly serialized and deserialized.
These tests collectively ensure that:
* the panic is fixed,
* the aggregation semantics for `GROUPING SETS(())` are correct, and
* existing aggregate behavior remains unchanged for non-grouping-set
queries.
## Are there any user-facing changes?
Yes, but they are bug fixes and behavior clarifications rather than
breaking changes:
* Queries using `GROUP BY GROUPING SETS(())` no longer cause a runtime
panic in the DataFusion CLI.
* Instead, they return the expected single aggregate row (e.g.
`COUNT(*)` or `SUM(v1)`), consistent with SQL semantics.
* For plans using `GROUPING SETS`, `CUBE`, or `ROLLUP`, the internal
`__grouping_id` column is now present consistently whenever grouping
sets are in use, even when the grouping expressions are empty.
* For ordinary `GROUP BY` queries that do not use grouping sets,
behavior is unchanged: no unexpected `__grouping_id` column is added.
No API signatures were changed in a breaking way for downstream users;
the additions are internal flags and protobuf fields to accurately
represent the physical plan.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated
content has been manually reviewed and tested.GROUPING SETS(()) and handle empty-grouping aggregates (#19252)1 parent be0cf05 commit c53a448
File tree
11 files changed
+107
-24
lines changed- datafusion
- core
- src
- tests
- execution
- physical_optimizer/filter_pushdown
- physical-plan/src/aggregates
- proto
- proto
- src
- generated
- physical_plan
- tests/cases
- sqllogictest/test_files
11 files changed
+107
-24
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1582 | 1582 | | |
1583 | 1583 | | |
1584 | 1584 | | |
1585 | | - | |
| 1585 | + | |
| 1586 | + | |
1586 | 1587 | | |
1587 | 1588 | | |
1588 | 1589 | | |
| |||
1654 | 1655 | | |
1655 | 1656 | | |
1656 | 1657 | | |
| 1658 | + | |
1657 | 1659 | | |
1658 | 1660 | | |
1659 | 1661 | | |
| |||
1696 | 1698 | | |
1697 | 1699 | | |
1698 | 1700 | | |
1699 | | - | |
| 1701 | + | |
1700 | 1702 | | |
1701 | 1703 | | |
1702 | 1704 | | |
| |||
1741 | 1743 | | |
1742 | 1744 | | |
1743 | 1745 | | |
1744 | | - | |
| 1746 | + | |
1745 | 1747 | | |
1746 | 1748 | | |
1747 | 1749 | | |
| |||
2832 | 2834 | | |
2833 | 2835 | | |
2834 | 2836 | | |
| 2837 | + | |
2835 | 2838 | | |
2836 | 2839 | | |
2837 | 2840 | | |
| |||
2942 | 2945 | | |
2943 | 2946 | | |
2944 | 2947 | | |
| 2948 | + | |
2945 | 2949 | | |
2946 | 2950 | | |
2947 | 2951 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
170 | 170 | | |
171 | 171 | | |
172 | 172 | | |
173 | | - | |
| 173 | + | |
174 | 174 | | |
175 | 175 | | |
176 | 176 | | |
| |||
204 | 204 | | |
205 | 205 | | |
206 | 206 | | |
207 | | - | |
| 207 | + | |
208 | 208 | | |
209 | 209 | | |
210 | 210 | | |
| |||
240 | 240 | | |
241 | 241 | | |
242 | 242 | | |
| 243 | + | |
243 | 244 | | |
244 | 245 | | |
245 | 246 | | |
| |||
545 | 546 | | |
546 | 547 | | |
547 | 548 | | |
| 549 | + | |
548 | 550 | | |
549 | 551 | | |
550 | 552 | | |
| |||
676 | 678 | | |
677 | 679 | | |
678 | 680 | | |
679 | | - | |
| 681 | + | |
680 | 682 | | |
681 | 683 | | |
682 | 684 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2545 | 2545 | | |
2546 | 2546 | | |
2547 | 2547 | | |
| 2548 | + | |
2548 | 2549 | | |
2549 | 2550 | | |
2550 | 2551 | | |
| |||
2615 | 2616 | | |
2616 | 2617 | | |
2617 | 2618 | | |
| 2619 | + | |
2618 | 2620 | | |
2619 | 2621 | | |
2620 | 2622 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
177 | 177 | | |
178 | 178 | | |
179 | 179 | | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
180 | 183 | | |
181 | 184 | | |
182 | 185 | | |
| |||
185 | 188 | | |
186 | 189 | | |
187 | 190 | | |
| 191 | + | |
188 | 192 | | |
189 | 193 | | |
190 | 194 | | |
191 | 195 | | |
192 | 196 | | |
| 197 | + | |
193 | 198 | | |
194 | 199 | | |
195 | 200 | | |
| |||
201 | 206 | | |
202 | 207 | | |
203 | 208 | | |
| 209 | + | |
204 | 210 | | |
205 | 211 | | |
206 | 212 | | |
| |||
217 | 223 | | |
218 | 224 | | |
219 | 225 | | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
220 | 231 | | |
221 | 232 | | |
222 | 233 | | |
| |||
232 | 243 | | |
233 | 244 | | |
234 | 245 | | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
235 | 251 | | |
236 | 252 | | |
237 | 253 | | |
238 | 254 | | |
239 | 255 | | |
240 | | - | |
| 256 | + | |
| 257 | + | |
241 | 258 | | |
242 | | - | |
| 259 | + | |
243 | 260 | | |
244 | 261 | | |
245 | 262 | | |
| |||
253 | 270 | | |
254 | 271 | | |
255 | 272 | | |
256 | | - | |
| 273 | + | |
257 | 274 | | |
258 | 275 | | |
259 | 276 | | |
| |||
270 | 287 | | |
271 | 288 | | |
272 | 289 | | |
273 | | - | |
| 290 | + | |
274 | 291 | | |
275 | 292 | | |
276 | 293 | | |
| |||
281 | 298 | | |
282 | 299 | | |
283 | 300 | | |
284 | | - | |
285 | | - | |
286 | | - | |
287 | | - | |
288 | | - | |
| 301 | + | |
289 | 302 | | |
290 | 303 | | |
291 | 304 | | |
| |||
308 | 321 | | |
309 | 322 | | |
310 | 323 | | |
311 | | - | |
| 324 | + | |
312 | 325 | | |
313 | 326 | | |
314 | 327 | | |
| |||
344 | 357 | | |
345 | 358 | | |
346 | 359 | | |
347 | | - | |
| 360 | + | |
348 | 361 | | |
349 | 362 | | |
350 | 363 | | |
351 | | - | |
352 | 364 | | |
353 | 365 | | |
354 | 366 | | |
355 | 367 | | |
356 | 368 | | |
357 | 369 | | |
| 370 | + | |
358 | 371 | | |
359 | 372 | | |
360 | 373 | | |
| |||
374 | 387 | | |
375 | 388 | | |
376 | 389 | | |
| 390 | + | |
377 | 391 | | |
378 | 392 | | |
379 | 393 | | |
| |||
723 | 737 | | |
724 | 738 | | |
725 | 739 | | |
726 | | - | |
727 | | - | |
| 740 | + | |
728 | 741 | | |
729 | 742 | | |
730 | 743 | | |
| |||
757 | 770 | | |
758 | 771 | | |
759 | 772 | | |
760 | | - | |
| 773 | + | |
761 | 774 | | |
762 | 775 | | |
763 | 776 | | |
| |||
1954 | 1967 | | |
1955 | 1968 | | |
1956 | 1969 | | |
| 1970 | + | |
1957 | 1971 | | |
1958 | 1972 | | |
1959 | 1973 | | |
| |||
2103 | 2117 | | |
2104 | 2118 | | |
2105 | 2119 | | |
| 2120 | + | |
2106 | 2121 | | |
2107 | 2122 | | |
2108 | 2123 | | |
| |||
2448 | 2463 | | |
2449 | 2464 | | |
2450 | 2465 | | |
| 2466 | + | |
2451 | 2467 | | |
2452 | 2468 | | |
2453 | 2469 | | |
| |||
2892 | 2908 | | |
2893 | 2909 | | |
2894 | 2910 | | |
| 2911 | + | |
2895 | 2912 | | |
2896 | 2913 | | |
2897 | 2914 | | |
| |||
3251 | 3268 | | |
3252 | 3269 | | |
3253 | 3270 | | |
| 3271 | + | |
3254 | 3272 | | |
3255 | 3273 | | |
3256 | 3274 | | |
| |||
3302 | 3320 | | |
3303 | 3321 | | |
3304 | 3322 | | |
| 3323 | + | |
3305 | 3324 | | |
3306 | 3325 | | |
3307 | 3326 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
251 | 251 | | |
252 | 252 | | |
253 | 253 | | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
254 | 271 | | |
255 | 272 | | |
256 | 273 | | |
| |||
350 | 367 | | |
351 | 368 | | |
352 | 369 | | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
353 | 373 | | |
354 | 374 | | |
355 | 375 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1223 | 1223 | | |
1224 | 1224 | | |
1225 | 1225 | | |
| 1226 | + | |
1226 | 1227 | | |
1227 | 1228 | | |
1228 | 1229 | | |
| |||
0 commit comments