Merged
49 changes: 39 additions & 10 deletions docs/reference/esql/processing-commands/stats.asciidoc
@@ -1,16 +1,18 @@
[discrete]
[[esql-stats-by]]
=== `STATS ... BY`
=== `STATS`

The `STATS ... BY` processing command groups rows according to a common value
The `STATS` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows.

**Syntax**

[source,esql]
----
STATS [column1 =] expression1[, ..., [columnN =] expressionN]
[BY grouping_expression1[, ..., grouping_expressionN]]
STATS [column1 =] expression1 [WHERE boolean_expression1][,
...,
[columnN =] expressionN [WHERE boolean_expressionN]]
[BY grouping_expression1[, ..., grouping_expressionN]]
----

*Parameters*
@@ -28,14 +30,18 @@ An expression that computes an aggregated value.
An expression that outputs the values to group by.
If its name coincides with one of the computed columns, that column will be ignored.

`boolean_expressionX`::
The condition that must be met for a row to be included in the evaluation of `expressionX`.

NOTE: Individual `null` values are skipped when computing aggregations.
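As a minimal Python model of this NOTE (a sketch, not how Elasticsearch computes aggregations), the hypothetical helper `avg_skipping_nulls` drops `None`, standing in for `null`, before averaging. A `null` therefore affects neither the sum nor the number of values averaged. Returning `None` when every value is `null` is an assumption of the sketch.

```python
from typing import Optional, Sequence

def avg_skipping_nulls(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average of the non-null values; None (null) if there are none."""
    present = [v for v in values if v is not None]  # skip individual nulls
    if not present:
        return None  # assumption: nothing left to aggregate yields null
    return sum(present) / len(present)

print(avg_skipping_nulls([50000, None, 70000]))  # 60000.0, not 40000.0
```

Treating the `null` as `0` instead would give `120000 / 3 = 40000.0`, which is why skipping matters.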

*Description*

The `STATS ... BY` processing command groups rows according to a common value
and calculate one or more aggregated values over the grouped rows. If `BY` is
omitted, the output table contains exactly one row with the aggregations applied
over the entire dataset.
The `STATS` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows. For the
calculation of each aggregated value, the rows in a group can be filtered with
`WHERE`. If `BY` is omitted, the output table contains exactly one row with
the aggregations applied over the entire dataset.
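The grouping and per-aggregation filtering described above can be sketched in Python. This is a simplified model, not the ES|QL implementation: `stats_avg_by`, `avg`, and the `birth_year` field are hypothetical names chosen for illustration. Groups are formed from all rows, and each `WHERE` only narrows the rows seen by its own aggregation. The sketch assumes that a group whose rows all fail a filter still appears, with `null` for that `AVG`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
RowFilter = Optional[Callable[[Row], bool]]

def avg(values: List[float]) -> Optional[float]:
    # AVG over zero rows has nothing to average, so this model returns None (null).
    return sum(values) / len(values) if values else None

def stats_avg_by(
    rows: List[Row], key: str, aggs: Dict[str, Tuple[str, RowFilter]]
) -> Dict[Any, Dict[str, Optional[float]]]:
    """Model of STATS col = AVG(field) [WHERE cond], ... BY key."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)  # groups come from all rows, filtered or not
    result: Dict[Any, Dict[str, Optional[float]]] = {}
    for group_key, members in groups.items():
        result[group_key] = {
            # Each aggregation only sees the rows of its group that pass its WHERE.
            column: avg([r[field] for r in members if where is None or where(r)])
            for column, (field, where) in aggs.items()
        }
    return result

rows = [
    {"gender": "F", "birth_year": 1955, "salary": 50000},
    {"gender": "F", "birth_year": 1962, "salary": 40000},
    {"gender": "M", "birth_year": 1963, "salary": 45000},
]
result = stats_avg_by(rows, "gender", {
    "avg50s": ("salary", lambda r: r["birth_year"] < 1960),
    "avg60s": ("salary", lambda r: r["birth_year"] >= 1960),
})
print(result["M"])  # {'avg50s': None, 'avg60s': 45000.0}
```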

The following <<esql-agg-functions,aggregation functions>> are supported:

@@ -90,6 +96,29 @@ include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
|===

To filter the rows that go into an aggregation, use the `WHERE` clause:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=aggFiltering]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=aggFiltering-result]
|===

Filtered and unfiltered aggregations can be mixed in one `STATS`, and grouping
is optional as well:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup-result]
|===
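In this query `salary / 1000` divides two integers, which in ES|QL yields an integer rounded toward zero, so `Ks < 40` selects salaries below 40000. The three filters partition the rows, which is why the filtered counts add up to the unfiltered total (36 + 39 + 25 = 100). The hypothetical helper `salary_buckets` below is a minimal Python model of that query, not ES|QL itself; Python's `//` matches truncation only because salaries are positive.

```python
from typing import Dict, List

def salary_buckets(salaries: List[int]) -> Dict[str, int]:
    """Model of three filtered COUNT(*)s plus one unfiltered COUNT(*)."""
    counts = {"under_40K": 0, "inbetween": 0, "over_60K": 0, "total": 0}
    for salary in salaries:
        ks = salary // 1000  # integer division; equals truncation for positive salaries
        # Each filter is checked independently, as each WHERE applies to one aggregation.
        if ks < 40:
            counts["under_40K"] += 1
        if 40 <= ks < 60:
            counts["inbetween"] += 1
        if 60 <= ks:
            counts["over_60K"] += 1
        counts["total"] += 1  # no WHERE: every row is counted
    return counts

print(salary_buckets([39999, 40000, 59999, 60000]))
# {'under_40K': 1, 'inbetween': 2, 'over_60K': 1, 'total': 4}
```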

[[esql-stats-mv-group]]
If the grouping key is multivalued, the input row is added to the group for each of its values:

@@ -109,7 +138,7 @@ It's also possible to group by multiple values:
include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
----

If the all grouping keys are multivalued then the input row is in all groups:
If all the grouping keys are multivalued, the input row is added to the group for every combination of their values:

[source.merge.styled,esql]
----
@@ -121,7 +150,7 @@ include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
|===
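The multivalued grouping rules can be modeled in a few lines of Python: a row contributes to one group per combination of its key values, which is the Cartesian product of those values. This is a sketch, not the ES|QL implementation. `group_min` and `as_values` are hypothetical helpers, a Python list stands in for a multivalued field, and the example row is made up for illustration.

```python
from collections import defaultdict
from itertools import product
from typing import Any, Dict, List, Sequence, Tuple

def as_values(value: Any) -> List[Any]:
    """Treat a list as a multivalued field and anything else as a single value."""
    return list(value) if isinstance(value, list) else [value]

def group_min(
    rows: List[Dict[str, Any]], field: str, keys: Sequence[str]
) -> Dict[Tuple[Any, ...], Any]:
    """Model of STATS MIN(field) BY key1, ..., keyN with multivalued keys."""
    groups: Dict[Tuple[Any, ...], List[Any]] = defaultdict(list)
    for row in rows:
        # One group per combination of the keys' values (Cartesian product).
        for combo in product(*(as_values(row[k]) for k in keys)):
            groups[combo].append(row[field])
    return {combo: min(values) for combo, values in groups.items()}

row = {"i": 1, "a": ["a", "b"], "b": [2, 3]}
print(sorted(group_min([row], "i", ["a", "b"]).items()))
# [(('a', 2), 1), (('a', 3), 1), (('b', 2), 1), (('b', 3), 1)]
```

With a single grouping key, the same product reduces to one group per value of that key.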

Both the aggregating functions and the grouping expressions accept other
functions. This is useful for using `STATS...BY` on multivalue columns.
functions. This is useful for using `STATS` on multivalue columns.
For example, to calculate the average salary change, you can use `MV_AVG` to
first average the multiple values per employee, and use the result with the
`AVG` function:
@@ -2348,6 +2348,42 @@ v:integer | job_positions:keyword
10094 | Accountant
;

docsStatsWithSimpleFiltering
required_capability: per_agg_filtering
// tag::aggFiltering[]
FROM employees
| STATS avg50s = AVG(salary)::LONG WHERE birth_date < "1960-01-01",
avg60s = AVG(salary)::LONG WHERE birth_date >= "1960-01-01"
BY gender
| SORT gender
// end::aggFiltering[]
| WHERE gender IS NOT NULL
;

// tag::aggFiltering-result[]
avg50s:long |avg60s:long |gender:keyword
55462 |46637 |F
48279 |44879 |M
// end::aggFiltering-result[]
;

docsStatsWithFilteringNoGroups
required_capability: per_agg_filtering
// tag::aggFilteringNoGroup[]
FROM employees
| EVAL Ks = salary / 1000 // thousands
| STATS under_40K = COUNT(*) WHERE Ks < 40,
inbetween = COUNT(*) WHERE 40 <= Ks AND Ks < 60,
over_60K = COUNT(*) WHERE 60 <= Ks,
total = COUNT(*)
// end::aggFilteringNoGroup[]
;

// tag::aggFilteringNoGroup-result[]
under_40K:long |inbetween:long |over_60K:long |total:long
36 |39 |25 |100
// end::aggFilteringNoGroup-result[]
;

statsWithFiltering
required_capability: per_agg_filtering