stats

Description

The stats command calculates the aggregation from the search result.

Syntax

stats [bucket_nullable=bool] <aggregation>... [by-clause]

aggregation: mandatory. An aggregation function.
bucket_nullable: optional. Controls whether the stats command includes null buckets in group-by aggregations. When set to false, the aggregation ignores records where the group-by field is null, resulting in faster performance by excluding null bucket. Default: Determined by plugins.ppl.syntax.legacy.preferred.
- When plugins.ppl.syntax.legacy.preferred=true, bucket_nullable defaults to true
- When plugins.ppl.syntax.legacy.preferred=false, bucket_nullable defaults to false
by-clause: optional. Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... Default: If no by-clause is specified, the stats command returns only one row, which is the aggregation over the entire result set.
span-expression: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). The unit of the interval expression is the natural unit by default. If the field is a date/time type field, the aggregation results always ignore null bucket. For example, span(age, 10) creates 10-year age buckets, span(timestamp, 1h) creates hourly buckets.
- Available time units
  - millisecond (ms)
  - second (s)
  - minute (m, case sensitive)
  - hour (h)
  - day (d)
  - week (w)
  - month (M, case sensitive)
  - quarter (q)
  - year (y)

Aggregation Functions

The stats command supports the following aggregation functions:

COUNT/C: Count of values
SUM: Sum of numeric values
AVG: Average of numeric values
MAX: Maximum value
MIN: Minimum value
VAR_SAMP: Sample variance
VAR_POP: Population variance
STDDEV_SAMP: Sample standard deviation
STDDEV_POP: Population standard deviation
DISTINCT_COUNT_APPROX: Approximate distinct count
TAKE: List of original values
PERCENTILE/PERCENTILE_APPROX: Percentile calculations
PERC<percent>/P<percent>: Percentile shortcut functions
MEDIAN: 50th percentile
EARLIEST: Earliest value by timestamp
LATEST: Latest value by timestamp
FIRST: First non-null value
LAST: Last non-null value
LIST: Collect all values into array
VALUES: Collect unique values into sorted array

For detailed documentation of each function, see Aggregation Functions.

Limitations

Bucket aggregation result may be approximate in large dataset

In OpenSearch, doc_count values for a terms bucket aggregation may be approximate. As a result, any aggregations (such as sum and avg) on the terms bucket aggregation may also be approximate. For example, the following PPL query (find the top 10 URLs) may return an approximate result if the cardinality of URL is high.

source=hits
| stats bucket_nullable=false count() as c by URL
| sort - c
| head 10

This query is pushed down to a terms bucket aggregation DSL query with "order": { "_count": "desc" }. In OpenSearch, this terms aggregation may throw away some buckets.

Sorting by ascending doc_count may produce inaccurate results

Similar to above PPL query, the following query (find the rare 10 URLs) often produces inaccurate results.

source=hits
| stats bucket_nullable=false count() as c by URL
| sort + c
| head 10

A term that is globally infrequent might not appear as infrequent on every individual shard or might be entirely absent from the least frequent results returned by some shards. Conversely, a term that appears infrequently on one shard might be common on another. In both scenarios, rare terms can be missed during shard-level aggregation, resulting in incorrect overall results.

Example 1: Calculate the count of events

This example shows calculating the count of events in the accounts.

source=accounts
| stats count()

Expected output:

fetched rows / total rows = 1/1
+---------+
| count() |
|---------|
| 4       |
+---------+

Example 2: Calculate the average of a field

This example shows calculating the average age of all the accounts.

source=accounts
| stats avg(age)

Expected output:

fetched rows / total rows = 1/1
+----------+
| avg(age) |
|----------|
| 32.25    |
+----------+

Example 3: Calculate the average of a field by group

This example shows calculating the average age of all the accounts group by gender.

source=accounts
| stats avg(age) by gender

Expected output:

fetched rows / total rows = 2/2
+--------------------+--------+
| avg(age)           | gender |
|--------------------+--------|
| 28.0               | F      |
| 33.666666666666664 | M      |
+--------------------+--------+

Example 4: Calculate the average, sum and count of a field by group

This example shows calculating the average age, sum age and count of events of all the accounts group by gender.

source=accounts
| stats avg(age), sum(age), count() by gender

Expected output:

fetched rows / total rows = 2/2
+--------------------+----------+---------+--------+
| avg(age)           | sum(age) | count() | gender |
|--------------------+----------+---------+--------|
| 28.0               | 28       | 1       | F      |
| 33.666666666666664 | 101      | 3       | M      |
+--------------------+----------+---------+--------+

Example 5: Calculate the maximum of a field

The example calculates the max age of all the accounts.

source=accounts
| stats max(age)

Expected output:

fetched rows / total rows = 1/1
+----------+
| max(age) |
|----------|
| 36       |
+----------+

Example 6: Calculate the maximum and minimum of a field by group

The example calculates the max and min age values of all the accounts group by gender.

source=accounts
| stats max(age), min(age) by gender

Expected output:

fetched rows / total rows = 2/2
+----------+----------+--------+
| max(age) | min(age) | gender |
|----------+----------+--------|
| 28       | 28       | F      |
| 36       | 32       | M      |
+----------+----------+--------+

Example 7: Calculate the distinct count of a field

To get the count of distinct values of a field, you can use DISTINCT_COUNT (or DC) function instead of COUNT. The example calculates both the count and the distinct count of gender field of all the accounts.

source=accounts
| stats count(gender), distinct_count(gender)

Expected output:

fetched rows / total rows = 1/1
+---------------+------------------------+
| count(gender) | distinct_count(gender) |
|---------------+------------------------|
| 4             | 2                      |
+---------------+------------------------+

Example 8: Calculate the count by a span

The example gets the count of age by the interval of 10 years.

source=accounts
| stats count(age) by span(age, 10) as age_span

Expected output:

fetched rows / total rows = 2/2
+------------+----------+
| count(age) | age_span |
|------------+----------|
| 1          | 20       |
| 3          | 30       |
+------------+----------+

Example 9: Calculate the count by a gender and span

The example gets the count of age by the interval of 10 years and group by gender.

source=accounts
| stats count() as cnt by span(age, 5) as age_span, gender

Expected output:

fetched rows / total rows = 3/3
+-----+----------+--------+
| cnt | age_span | gender |
|-----+----------+--------|
| 1   | 25       | F      |
| 2   | 30       | M      |
| 1   | 35       | M      |
+-----+----------+--------+

Span will always be the first grouping key whatever order you specify.

source=accounts
| stats count() as cnt by gender, span(age, 5) as age_span

Expected output:

fetched rows / total rows = 3/3
+-----+----------+--------+
| cnt | age_span | gender |
|-----+----------+--------|
| 1   | 25       | F      |
| 2   | 30       | M      |
| 1   | 35       | M      |
+-----+----------+--------+

Example 10: Calculate the count and get email list by a gender and span

The example gets the count of age by the interval of 10 years and group by gender, additionally for each row get a list of at most 5 emails.

source=accounts
| stats count() as cnt, take(email, 5) by span(age, 5) as age_span, gender

Expected output:

fetched rows / total rows = 3/3
+-----+--------------------------------------------+----------+--------+
| cnt | take(email, 5)                             | age_span | gender |
|-----+--------------------------------------------+----------+--------|
| 1   | []                                         | 25       | F      |
| 2   | [amberduke@pyrami.com,daleadams@boink.com] | 30       | M      |
| 1   | [hattiebond@netagy.com]                    | 35       | M      |
+-----+--------------------------------------------+----------+--------+

Example 11: Calculate the percentile of a field

This example shows calculating the percentile 90th age of all the accounts.

source=accounts
| stats percentile(age, 90)

Expected output:

fetched rows / total rows = 1/1
+---------------------+
| percentile(age, 90) |
|---------------------|
| 36                  |
+---------------------+

Example 12: Calculate the percentile of a field by group

This example shows calculating the percentile 90th age of all the accounts group by gender.

source=accounts
| stats percentile(age, 90) by gender

Expected output:

fetched rows / total rows = 2/2
+---------------------+--------+
| percentile(age, 90) | gender |
|---------------------+--------|
| 28                  | F      |
| 36                  | M      |
+---------------------+--------+

Example 13: Calculate the percentile by a gender and span

The example gets the percentile 90th age by the interval of 10 years and group by gender.

source=accounts
| stats percentile(age, 90) as p90 by span(age, 10) as age_span, gender

Expected output:

fetched rows / total rows = 2/2
+-----+----------+--------+
| p90 | age_span | gender |
|-----+----------+--------|
| 28  | 20       | F      |
| 36  | 30       | M      |
+-----+----------+--------+

Example 14: Collect all values in a field using LIST

The example shows how to collect all firstname values, preserving duplicates and order.

source=accounts
| stats list(firstname)

Expected output:

fetched rows / total rows = 1/1
+-----------------------------+
| list(firstname)             |
|-----------------------------|
| [Amber,Hattie,Nanette,Dale] |
+-----------------------------+

Example 15: Ignore null bucket

source=accounts
| stats bucket_nullable=false count() as cnt by email

Expected output:

fetched rows / total rows = 3/3
+-----+-----------------------+
| cnt | email                 |
|-----+-----------------------|
| 1   | amberduke@pyrami.com  |
| 1   | daleadams@boink.com   |
| 1   | hattiebond@netagy.com |
+-----+-----------------------+

Example 16: Collect unique values in a field using VALUES

The example shows how to collect all unique firstname values, sorted lexicographically with duplicates removed.

source=accounts
| stats values(firstname)

Expected output:

fetched rows / total rows = 1/1
+-----------------------------+
| values(firstname)           |
|-----------------------------|
| [Amber,Dale,Hattie,Nanette] |
+-----------------------------+

Example 17: Span on date/time field always ignore null bucket

Index example data: +-------+--------+------------+ Name | DEPTNO | birthday | +=======+========+============+ Alice | 1 | 2024-04-21 | +-------+--------+------------+ Bob | 2 | 2025-08-21 | +-------+--------+------------+ Jeff | null | 2025-04-22 | +-------+--------+------------+ Adam | 2 | null | +-------+--------+------------+

source=example
| stats count() as cnt by span(birthday, 1y) as year

Expected output:

fetched rows / total rows = 3/3
+-----+------------+
| cnt | year       |
|-----+------------|
| 1   | 2024-01-01 |
| 2   | 2025-01-01 |
+-----+------------+

source=example
| stats count() as cnt by span(birthday, 1y) as year, DEPTNO

Expected output:

fetched rows / total rows = 3/3
+-----+------------+--------+
| cnt | year       | DEPTNO |
|-----+------------+--------|
| 1   | 2024-01-01 | 1      |
| 1   | 2025-01-01 | 2      |
| 1   | 2025-01-01 | null   |
+-----+------------+--------+

source=example
| stats bucket_nullable=false count() as cnt by span(birthday, 1y) as year, DEPTNO

Expected output:

fetched rows / total rows = 3/3
+-----+------------+--------+
| cnt | year       | DEPTNO |
|-----+------------+--------|
| 1   | 2024-01-01 | 1      |
| 1   | 2025-01-01 | 2      |
+-----+------------+--------+

Example 18: Calculate the count by the implicit @timestamp field

This example demonstrates that if you omit the field parameter in the span function, it will automatically use the implicit @timestamp field.

source=big5
| stats count() by span(1month)

Expected output:

fetched rows / total rows = 1/1
+---------+---------------------+
| count() | span(1month)        |
|---------+---------------------|
| 1       | 2023-01-01 00:00:00 |
+---------+---------------------+

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stats

Description

Syntax

Aggregation Functions

Limitations

Bucket aggregation result may be approximate in large dataset

Sorting by ascending doc_count may produce inaccurate results

Example 1: Calculate the count of events

Example 2: Calculate the average of a field

Example 3: Calculate the average of a field by group

Example 4: Calculate the average, sum and count of a field by group

Example 5: Calculate the maximum of a field

Example 6: Calculate the maximum and minimum of a field by group

Example 7: Calculate the distinct count of a field

Example 8: Calculate the count by a span

Example 9: Calculate the count by a gender and span

Example 10: Calculate the count and get email list by a gender and span

Example 11: Calculate the percentile of a field

Example 12: Calculate the percentile of a field by group

Example 13: Calculate the percentile by a gender and span

Example 14: Collect all values in a field using LIST

Example 15: Ignore null bucket

Example 16: Collect unique values in a field using VALUES

Example 17: Span on date/time field always ignore null bucket

Example 18: Calculate the count by the implicit @timestamp field

FilesExpand file tree

stats.md

Latest commit

History

stats.md

File metadata and controls

stats

Description

Syntax

Aggregation Functions

Limitations

Bucket aggregation result may be approximate in large dataset

Sorting by ascending doc_count may produce inaccurate results

Example 1: Calculate the count of events

Example 2: Calculate the average of a field

Example 3: Calculate the average of a field by group

Example 4: Calculate the average, sum and count of a field by group

Example 5: Calculate the maximum of a field

Example 6: Calculate the maximum and minimum of a field by group

Example 7: Calculate the distinct count of a field

Example 8: Calculate the count by a span

Example 9: Calculate the count by a gender and span

Example 10: Calculate the count and get email list by a gender and span

Example 11: Calculate the percentile of a field

Example 12: Calculate the percentile of a field by group

Example 13: Calculate the percentile by a gender and span

Example 14: Collect all values in a field using LIST

Example 15: Ignore null bucket

Example 16: Collect unique values in a field using VALUES

Example 17: Span on date/time field always ignore null bucket

Example 18: Calculate the count by the implicit @timestamp field