Commit 8878d33

Add docs for aggs filtering (#116681) (#117333)
Add documentation for aggs filtering (the WHERE in STATS command). Fixes: #115083
1 parent 03fd868 commit 8878d33

2 files changed: 75 additions, 10 deletions

docs/reference/esql/processing-commands/stats.asciidoc

Lines changed: 39 additions & 10 deletions
@@ -1,16 +1,18 @@
 [discrete]
 [[esql-stats-by]]
-=== `STATS ... BY`
+=== `STATS`
 
-The `STATS ... BY` processing command groups rows according to a common value
+The `STATS` processing command groups rows according to a common value
 and calculates one or more aggregated values over the grouped rows.
 
 **Syntax**
 
 [source,esql]
 ----
-STATS [column1 =] expression1[, ..., [columnN =] expressionN]
-[BY grouping_expression1[, ..., grouping_expressionN]]
+STATS [column1 =] expression1 [WHERE boolean_expression1][,
+...,
+[columnN =] expressionN [WHERE boolean_expressionN]]
+[BY grouping_expression1[, ..., grouping_expressionN]]
 ----
 
 *Parameters*
@@ -28,14 +30,18 @@ An expression that computes an aggregated value.
 An expression that outputs the values to group by.
 If its name coincides with one of the computed columns, that column will be ignored.
 
+`boolean_expressionX`::
+The condition that must be met for a row to be included in the evaluation of `expressionX`.
+
 NOTE: Individual `null` values are skipped when computing aggregations.
 
 *Description*
 
-The `STATS ... BY` processing command groups rows according to a common value
-and calculate one or more aggregated values over the grouped rows. If `BY` is
-omitted, the output table contains exactly one row with the aggregations applied
-over the entire dataset.
+The `STATS` processing command groups rows according to a common value
+and calculates one or more aggregated values over the grouped rows. For the
+calculation of each aggregated value, the rows in a group can be filtered with
+`WHERE`. If `BY` is omitted, the output table contains exactly one row with
+the aggregations applied over the entire dataset.
 
 The following <<esql-agg-functions,aggregation functions>> are supported:
 
@@ -90,6 +96,29 @@ include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
 include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
 |===
 
+To filter the rows that go into an aggregation, use the `WHERE` clause:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=aggFiltering]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=aggFiltering-result]
+|===
+
+The aggregations can be mixed, with and without a filter and grouping is
+optional as well:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup-result]
+|===
+
 [[esql-stats-mv-group]]
 If the grouping key is multivalued then the input row is in all groups:
 
@@ -109,7 +138,7 @@ It's also possible to group by multiple values:
 include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
 ----
 
-If the all grouping keys are multivalued then the input row is in all groups:
+If all the grouping keys are multivalued then the input row is in all groups:
 
 [source.merge.styled,esql]
 ----
@@ -121,7 +150,7 @@ include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
 |===
 
 Both the aggregating functions and the grouping expressions accept other
-functions. This is useful for using `STATS...BY` on multivalue columns.
+functions. This is useful for using `STATS` on multivalue columns.
 For example, to calculate the average salary change, you can use `MV_AVG` to
 first average the multiple values per employee, and use the result with the
 `AVG` function:

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

Lines changed: 36 additions & 0 deletions
@@ -2318,6 +2318,42 @@ v:integer | job_positions:keyword
 10094 | Accountant
 ;
 
+docsStatsWithSimpleFiltering
+required_capability: per_agg_filtering
+// tag::aggFiltering[]
+FROM employees
+| STATS avg50s = AVG(salary)::LONG WHERE birth_date < "1960-01-01",
+avg60s = AVG(salary)::LONG WHERE birth_date >= "1960-01-01"
+BY gender
+| SORT gender
+// end::aggFiltering[]
+| WHERE gender IS NOT NULL
+;
+
+// tag::aggFiltering-result[]
+avg50s:long |avg60s:long |gender:keyword
+55462 |46637 |F
+48279 |44879 |M
+// end::aggFiltering-result[]
+;
+
+docsStatsWithFilteringNoGroups
+required_capability: per_agg_filtering
+// tag::aggFilteringNoGroup[]
+FROM employees
+| EVAL Ks = salary / 1000 // thousands
+| STATS under_40K = COUNT(*) WHERE Ks < 40,
+inbetween = COUNT(*) WHERE 40 <= Ks AND Ks < 60,
+over_60K = COUNT(*) WHERE 60 <= Ks,
+total = COUNT(*)
+// end::aggFilteringNoGroup[]
+;
+
+// tag::aggFilteringNoGroup-result[]
+under_40K:long |inbetween:long |over_60K:long |total:long
+36 |39 |25 |100
+// end::aggFilteringNoGroup-result[]
+;
 
 statsWithFiltering
 required_capability: per_agg_filtering
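
Reviewer note: the per-aggregation `WHERE` documented in this commit can be read as "filter each group's rows separately for each aggregate, then aggregate". The Python sketch below is only an illustration of that reading, not how ES|QL implements it. The sample rows and the `stats` helper are hypothetical (they are not the `employees` test dataset or any Elasticsearch API), `//` assumes that `salary / 1000` on integers is an integer division, and `null` handling is not modeled.

```python
from collections import defaultdict

# Hypothetical sample rows, invented for this sketch.
rows = [
    {"gender": "F", "salary": 35000},
    {"gender": "F", "salary": 52000},
    {"gender": "M", "salary": 61000},
    {"gender": "M", "salary": 48000},
    {"gender": "M", "salary": 39000},
]


def stats(rows, aggs, by=None):
    """Group rows by `by` (one group if None), then evaluate each aggregate
    over only the rows of the group that pass that aggregate's own filter.

    `aggs` maps an output column name to (aggregate_fn, predicate_or_None).
    """
    groups = defaultdict(list)
    for row in rows:
        key = row[by] if by is not None else None
        groups[key].append(row)

    result = {}
    for key, members in groups.items():
        out = {}
        for name, (fn, where) in aggs.items():
            # The filter is applied per aggregate, not to the whole command.
            selected = [r for r in members if where is None or where(r)]
            out[name] = fn(selected)
        result[key] = out
    return result


count = len  # stands in for COUNT(*)

aggs = {
    "under_40K": (count, lambda r: r["salary"] // 1000 < 40),
    "inbetween": (count, lambda r: 40 <= r["salary"] // 1000 < 60),
    "over_60K": (count, lambda r: 60 <= r["salary"] // 1000),
    "total": (count, None),  # unfiltered aggregate mixed with filtered ones
}

no_group = stats(rows, aggs)[None]          # no BY: a single output row
by_gender = stats(rows, aggs, by="gender")  # BY gender: one row per group

print(no_group)
print(by_gender)
```

Because each filter is attached to a single aggregate, the unfiltered `total` still counts every row in the group, which mirrors the `aggFilteringNoGroup` example where the three buckets sum to `total`.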
