[SPARK-55702][SQL] Support filter predicate in window aggregate functions#54501
[SPARK-55702][SQL] Support filter predicate in window aggregate functions#54501cloud-fan wants to merge 1 commit intoapache:masterfrom
Conversation
c1874cf to
d4b1eaa
Compare
d4b1eaa to
e1ebfd8
Compare
| if (filters.length == functions.length) filters | ||
| else Array.fill(functions.length)(None) |
There was a problem hiding this comment.
Should we add an assert like assert(filters.isEmpty || filters.length == functions.length)?
There was a problem hiding this comment.
I don't see why we need the Array.empty default value for filters and I don't see how the sizes could differ so why not just assert(filters.length == functions.length) or change the contract to functionsAndFilters: Seq[(Expression, Option[Expression])].
| filterOpt match { | ||
| case Some(filter) => | ||
| updateExpressions ++= agg.updateExpressions.zip(agg.aggBufferAttributes).map { | ||
| case (updateExpr, attr) => If(filter, updateExpr, attr) |
There was a problem hiding this comment.
Does it mean filter will be evaluated multiple times? Maybe common expression evaluation helps.
There was a problem hiding this comment.
It's pretty much the same as interpreted version of HashAggregateExec: AggregationIterator
| var i = 0 | ||
| while (i < numImperatives) { | ||
| imperatives(i).update(buffer, input) | ||
| val shouldUpdate = imperativeFilters(i) match { |
There was a problem hiding this comment.
Looks like there is no common expression evaluation here?
There was a problem hiding this comment.
| first_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long | ||
| ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_a, | ||
| last_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long | ||
| ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_a | ||
| FROM testData ORDER BY val_long, cate; |
There was a problem hiding this comment.
The tests all use either UNBOUNDED PRECEDING AND CURRENT ROW (growing frame) or no-frame PARTITION BY cate (full partition). There's no test for a true sliding window like:
sum(val) FILTER (WHERE val > 1) OVER (ORDER BY val_long ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
| SELECT val, cate, | ||
| sum(val) FILTER (WHERE cate = 'a') OVER(PARTITION BY cate) AS total_sum_filtered | ||
| FROM testData ORDER BY cate, val; | ||
|
|
There was a problem hiding this comment.
No test for RANGE frame?
All new tests use ROW frames. There's no test for:
sum(val) FILTER (WHERE cate = 'a') OVER (ORDER BY val_long RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
Remove the analysis check that rejected FILTER in window aggregates and add filter support to AggregateProcessor in WindowExec so that AggregateExpression.filter is honored during window frame evaluation. For DeclarativeAggregate, the update expressions are wrapped with If(filter, updateExpr, bufferAttr) to skip rows that don't match. For ImperativeAggregate, the filter predicate is evaluated before calling update(). Made-with: Cursor
e1ebfd8 to
f4fe768
Compare
What changes were proposed in this pull request?
This PR adds support for the
FILTER (WHERE ...)clause on aggregate functions used within window expressions. Previously, Spark rejected this with anAnalysisException("Window aggregate function with filter predicate is not supported yet.").The changes are:
Analyzer.scalathat blockedFILTERin window aggregates, and extract filter expressions alongside aggregate function children.AggregateProcessorso thatAggregateExpression.filteris honored during window frame evaluation:DeclarativeAggregate: update expressions are wrapped withIf(filter, updateExpr, bufferAttr)to conditionally skip rows.ImperativeAggregate: the filter predicate is evaluated before callingupdate().WindowEvaluatorFactoryBasetoAggregateProcessor.Why are the changes needed?
The SQL standard allows
FILTERon aggregate functions in window contexts. Other databases (PostgreSQL, etc.) support this. Spark already supportsFILTERfor regular (non-window) aggregates but rejected it in window contexts.Does this PR introduce any user-facing change?
Yes. Window aggregate expressions with
FILTERnow execute instead of throwing anAnalysisException. For example:How was this patch tested?
Added 4 SQL test cases in
window.sqlcovering:first_value/last_valuewith filter (verifying no interference with NULL handling)The existing test case (
count(val) FILTER (WHERE val > 1) OVER(...)) now produces correct results instead of an error.Was this patch authored or co-authored using generative AI tooling?
Yes.
Made with Cursor