The stats command calculates the aggregation from the search result.
stats [bucket_nullable=bool] <aggregation>... [by-clause]
- aggregation: mandatory. An aggregation function.
- bucket_nullable: optional. Controls whether the stats command includes null buckets in group-by aggregations. When set to
false, the aggregation ignores records where the group-by field is null, resulting in faster performance by excluding null bucket. Default: Determined byplugins.ppl.syntax.legacy.preferred.- When
plugins.ppl.syntax.legacy.preferred=true,bucket_nullabledefaults totrue - When
plugins.ppl.syntax.legacy.preferred=false,bucket_nullabledefaults tofalse
- When
- by-clause: optional. Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... Default: If no by-clause is specified, the stats command returns only one row, which is the aggregation over the entire result set.
- span-expression: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). The unit of the interval expression is the natural unit by default. If the field is a date/time type field, the aggregation results always ignore null bucket. For example,
span(age, 10)creates 10-year age buckets,span(timestamp, 1h)creates hourly buckets.- Available time units
- millisecond (ms)
- second (s)
- minute (m, case sensitive)
- hour (h)
- day (d)
- week (w)
- month (M, case sensitive)
- quarter (q)
- year (y)
- Available time units
The stats command supports the following aggregation functions:
- COUNT/C: Count of values
- SUM: Sum of numeric values
- AVG: Average of numeric values
- MAX: Maximum value
- MIN: Minimum value
- VAR_SAMP: Sample variance
- VAR_POP: Population variance
- STDDEV_SAMP: Sample standard deviation
- STDDEV_POP: Population standard deviation
- DISTINCT_COUNT_APPROX: Approximate distinct count
- TAKE: List of original values
- PERCENTILE/PERCENTILE_APPROX: Percentile calculations
- PERC<percent>/P<percent>: Percentile shortcut functions
- MEDIAN: 50th percentile
- EARLIEST: Earliest value by timestamp
- LATEST: Latest value by timestamp
- FIRST: First non-null value
- LAST: Last non-null value
- LIST: Collect all values into array
- VALUES: Collect unique values into sorted array
For detailed documentation of each function, see Aggregation Functions.
In OpenSearch, doc_count values for a terms bucket aggregation may be approximate. As a result, any aggregations (such as sum and avg) on the terms bucket aggregation may also be approximate.
For example, the following PPL query (find the top 10 URLs) may return an approximate result if the cardinality of URL is high.
source=hits
| stats bucket_nullable=false count() as c by URL
| sort - c
| head 10
This query is pushed down to a terms bucket aggregation DSL query with "order": { "_count": "desc" }. In OpenSearch, this terms aggregation may throw away some buckets.
Similar to above PPL query, the following query (find the rare 10 URLs) often produces inaccurate results.
source=hits
| stats bucket_nullable=false count() as c by URL
| sort + c
| head 10
A term that is globally infrequent might not appear as infrequent on every individual shard or might be entirely absent from the least frequent results returned by some shards. Conversely, a term that appears infrequently on one shard might be common on another. In both scenarios, rare terms can be missed during shard-level aggregation, resulting in incorrect overall results.
This example shows calculating the count of events in the accounts.
source=accounts
| stats count()
Expected output:
fetched rows / total rows = 1/1
+---------+
| count() |
|---------|
| 4 |
+---------+
This example shows calculating the average age of all the accounts.
source=accounts
| stats avg(age)
Expected output:
fetched rows / total rows = 1/1
+----------+
| avg(age) |
|----------|
| 32.25 |
+----------+
This example shows calculating the average age of all the accounts group by gender.
source=accounts
| stats avg(age) by gender
Expected output:
fetched rows / total rows = 2/2
+--------------------+--------+
| avg(age) | gender |
|--------------------+--------|
| 28.0 | F |
| 33.666666666666664 | M |
+--------------------+--------+
This example shows calculating the average age, sum age and count of events of all the accounts group by gender.
source=accounts
| stats avg(age), sum(age), count() by gender
Expected output:
fetched rows / total rows = 2/2
+--------------------+----------+---------+--------+
| avg(age) | sum(age) | count() | gender |
|--------------------+----------+---------+--------|
| 28.0 | 28 | 1 | F |
| 33.666666666666664 | 101 | 3 | M |
+--------------------+----------+---------+--------+
The example calculates the max age of all the accounts.
source=accounts
| stats max(age)
Expected output:
fetched rows / total rows = 1/1
+----------+
| max(age) |
|----------|
| 36 |
+----------+
The example calculates the max and min age values of all the accounts group by gender.
source=accounts
| stats max(age), min(age) by gender
Expected output:
fetched rows / total rows = 2/2
+----------+----------+--------+
| max(age) | min(age) | gender |
|----------+----------+--------|
| 28 | 28 | F |
| 36 | 32 | M |
+----------+----------+--------+
To get the count of distinct values of a field, you can use DISTINCT_COUNT (or DC) function instead of COUNT. The example calculates both the count and the distinct count of gender field of all the accounts.
source=accounts
| stats count(gender), distinct_count(gender)
Expected output:
fetched rows / total rows = 1/1
+---------------+------------------------+
| count(gender) | distinct_count(gender) |
|---------------+------------------------|
| 4 | 2 |
+---------------+------------------------+
The example gets the count of age by the interval of 10 years.
source=accounts
| stats count(age) by span(age, 10) as age_span
Expected output:
fetched rows / total rows = 2/2
+------------+----------+
| count(age) | age_span |
|------------+----------|
| 1 | 20 |
| 3 | 30 |
+------------+----------+
The example gets the count of age by the interval of 10 years and group by gender.
source=accounts
| stats count() as cnt by span(age, 5) as age_span, gender
Expected output:
fetched rows / total rows = 3/3
+-----+----------+--------+
| cnt | age_span | gender |
|-----+----------+--------|
| 1 | 25 | F |
| 2 | 30 | M |
| 1 | 35 | M |
+-----+----------+--------+
Span will always be the first grouping key whatever order you specify.
source=accounts
| stats count() as cnt by gender, span(age, 5) as age_span
Expected output:
fetched rows / total rows = 3/3
+-----+----------+--------+
| cnt | age_span | gender |
|-----+----------+--------|
| 1 | 25 | F |
| 2 | 30 | M |
| 1 | 35 | M |
+-----+----------+--------+
The example gets the count of age by the interval of 10 years and group by gender, additionally for each row get a list of at most 5 emails.
source=accounts
| stats count() as cnt, take(email, 5) by span(age, 5) as age_span, gender
Expected output:
fetched rows / total rows = 3/3
+-----+--------------------------------------------+----------+--------+
| cnt | take(email, 5) | age_span | gender |
|-----+--------------------------------------------+----------+--------|
| 1 | [] | 25 | F |
| 2 | [amberduke@pyrami.com,daleadams@boink.com] | 30 | M |
| 1 | [hattiebond@netagy.com] | 35 | M |
+-----+--------------------------------------------+----------+--------+
This example shows calculating the percentile 90th age of all the accounts.
source=accounts
| stats percentile(age, 90)
Expected output:
fetched rows / total rows = 1/1
+---------------------+
| percentile(age, 90) |
|---------------------|
| 36 |
+---------------------+
This example shows calculating the percentile 90th age of all the accounts group by gender.
source=accounts
| stats percentile(age, 90) by gender
Expected output:
fetched rows / total rows = 2/2
+---------------------+--------+
| percentile(age, 90) | gender |
|---------------------+--------|
| 28 | F |
| 36 | M |
+---------------------+--------+
The example gets the percentile 90th age by the interval of 10 years and group by gender.
source=accounts
| stats percentile(age, 90) as p90 by span(age, 10) as age_span, gender
Expected output:
fetched rows / total rows = 2/2
+-----+----------+--------+
| p90 | age_span | gender |
|-----+----------+--------|
| 28 | 20 | F |
| 36 | 30 | M |
+-----+----------+--------+
The example shows how to collect all firstname values, preserving duplicates and order.
source=accounts
| stats list(firstname)
Expected output:
fetched rows / total rows = 1/1
+-----------------------------+
| list(firstname) |
|-----------------------------|
| [Amber,Hattie,Nanette,Dale] |
+-----------------------------+
source=accounts
| stats bucket_nullable=false count() as cnt by email
Expected output:
fetched rows / total rows = 3/3
+-----+-----------------------+
| cnt | email |
|-----+-----------------------|
| 1 | amberduke@pyrami.com |
| 1 | daleadams@boink.com |
| 1 | hattiebond@netagy.com |
+-----+-----------------------+
The example shows how to collect all unique firstname values, sorted lexicographically with duplicates removed.
source=accounts
| stats values(firstname)
Expected output:
fetched rows / total rows = 1/1
+-----------------------------+
| values(firstname) |
|-----------------------------|
| [Amber,Dale,Hattie,Nanette] |
+-----------------------------+
Index example data: +-------+--------+------------+ Name | DEPTNO | birthday | +=======+========+============+ Alice | 1 | 2024-04-21 | +-------+--------+------------+ Bob | 2 | 2025-08-21 | +-------+--------+------------+ Jeff | null | 2025-04-22 | +-------+--------+------------+ Adam | 2 | null | +-------+--------+------------+
source=example
| stats count() as cnt by span(birthday, 1y) as year
Expected output:
fetched rows / total rows = 3/3
+-----+------------+
| cnt | year |
|-----+------------|
| 1 | 2024-01-01 |
| 2 | 2025-01-01 |
+-----+------------+
source=example
| stats count() as cnt by span(birthday, 1y) as year, DEPTNO
Expected output:
fetched rows / total rows = 3/3
+-----+------------+--------+
| cnt | year | DEPTNO |
|-----+------------+--------|
| 1 | 2024-01-01 | 1 |
| 1 | 2025-01-01 | 2 |
| 1 | 2025-01-01 | null |
+-----+------------+--------+
source=example
| stats bucket_nullable=false count() as cnt by span(birthday, 1y) as year, DEPTNO
Expected output:
fetched rows / total rows = 3/3
+-----+------------+--------+
| cnt | year | DEPTNO |
|-----+------------+--------|
| 1 | 2024-01-01 | 1 |
| 1 | 2025-01-01 | 2 |
+-----+------------+--------+
This example demonstrates that if you omit the field parameter in the span function, it will automatically use the implicit @timestamp field.
source=big5
| stats count() by span(1month)
Expected output:
fetched rows / total rows = 1/1
+---------+---------------------+
| count() | span(1month) |
|---------+---------------------|
| 1 | 2023-01-01 00:00:00 |
+---------+---------------------+