Aggregation functions perform calculations across multiple rows to return a single result value. These functions are used with stats, eventstats and streamstats commands to analyze and summarize data.
The following table shows how NULL/MISSING values are handled by aggregation functions:
| Function | NULL | MISSING |
|---|---|---|
| COUNT | Not counted | Not counted |
| SUM | Ignore | Ignore |
| AVG | Ignore | Ignore |
| MAX | Ignore | Ignore |
| MIN | Ignore | Ignore |
| FIRST | Ignore | Ignore |
| LAST | Ignore | Ignore |
| LIST | Ignore | Ignore |
| VALUES | Ignore | Ignore |
Usage: Returns a count of the number of expr in the rows retrieved. The C() function, c, and count can be used as abbreviations for COUNT(). To perform a filtered counting, wrap the condition to satisfy in an eval expression.
Example
source=accounts
| stats count(), c(), count, c
Expected output:
fetched rows / total rows = 1/1
+---------+-----+-------+---+
| count() | c() | count | c |
|---------+-----+-------+---|
| 4 | 4 | 4 | 4 |
+---------+-----+-------+---+
Example of filtered counting
source=accounts
| stats count(eval(age > 30)) as mature_users
Expected output:
fetched rows / total rows = 1/1
+--------------+
| mature_users |
|--------------|
| 3 |
+--------------+
Usage: SUM(expr). Returns the sum of expr. Example
source=accounts
| stats sum(age) by gender
Expected output:
fetched rows / total rows = 2/2
+----------+--------+
| sum(age) | gender |
|----------+--------|
| 28 | F |
| 101 | M |
+----------+--------+
Usage: AVG(expr). Returns the average value of expr. Example
source=accounts
| stats avg(age) by gender
Expected output:
fetched rows / total rows = 2/2
+--------------------+--------+
| avg(age) | gender |
|--------------------+--------|
| 28.0 | F |
| 33.666666666666664 | M |
+--------------------+--------+
Usage: MAX(expr). Returns the maximum value of expr. For non-numeric fields, values are sorted lexicographically. Example
source=accounts
| stats max(age)
Expected output:
fetched rows / total rows = 1/1
+----------+
| max(age) |
|----------|
| 36 |
+----------+
Example with text field
source=accounts
| stats max(firstname)
Expected output:
fetched rows / total rows = 1/1
+----------------+
| max(firstname) |
|----------------|
| Nanette |
+----------------+
Usage: MIN(expr). Returns the minimum value of expr. For non-numeric fields, values are sorted lexicographically. Example
source=accounts
| stats min(age)
Expected output:
fetched rows / total rows = 1/1
+----------+
| min(age) |
|----------|
| 28 |
+----------+
Example with text field
source=accounts
| stats min(firstname)
Expected output:
fetched rows / total rows = 1/1
+----------------+
| min(firstname) |
|----------------|
| Amber |
+----------------+
Usage: VAR_SAMP(expr). Returns the sample variance of expr. Example
source=accounts
| stats var_samp(age)
Expected output:
fetched rows / total rows = 1/1
+--------------------+
| var_samp(age) |
|--------------------|
| 10.916666666666666 |
+--------------------+
Usage: VAR_POP(expr). Returns the population standard variance of expr. Example
source=accounts
| stats var_pop(age)
Expected output:
fetched rows / total rows = 1/1
+--------------+
| var_pop(age) |
|--------------|
| 8.1875 |
+--------------+
Usage: STDDEV_SAMP(expr). Return the sample standard deviation of expr. Example
source=accounts
| stats stddev_samp(age)
Expected output:
fetched rows / total rows = 1/1
+-------------------+
| stddev_samp(age) |
|-------------------|
| 3.304037933599835 |
+-------------------+
Usage: STDDEV_POP(expr). Return the population standard deviation of expr. Example
source=accounts
| stats stddev_pop(age)
Expected output:
fetched rows / total rows = 1/1
+--------------------+
| stddev_pop(age) |
|--------------------|
| 2.8613807855648994 |
+--------------------+
Usage: DISTINCT_COUNT(expr), DC(expr). Returns the approximate number of distinct values using the HyperLogLog++ algorithm. Both functions are equivalent. For details on algorithm accuracy and precision control, see the OpenSearch Cardinality Aggregation documentation. Example
source=accounts
| stats dc(state) as distinct_states, distinct_count(state) as dc_states_alt by gender
Expected output:
fetched rows / total rows = 2/2
+-----------------+---------------+--------+
| distinct_states | dc_states_alt | gender |
|-----------------+---------------+--------|
| 1 | 1 | F |
| 3 | 3 | M |
+-----------------+---------------+--------+
Usage: DISTINCT_COUNT_APPROX(expr). Return the approximate distinct count value of the expr, using the hyperloglog++ algorithm. Example
source=accounts
| stats distinct_count_approx(gender)
Expected output:
fetched rows / total rows = 1/1
+-------------------------------+
| distinct_count_approx(gender) |
|-------------------------------|
| 2 |
+-------------------------------+
Usage: EARLIEST(field [, time_field]). Return the earliest value of a field based on timestamp ordering.
- field: mandatory. The field to return the earliest value for.
- time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified.
Example
source=events
| stats earliest(message) by host
| sort host
Expected output:
fetched rows / total rows = 2/2
+-------------------+---------+
| earliest(message) | host |
|-------------------+---------|
| Starting up | server1 |
| Initializing | server2 |
+-------------------+---------+
Example with custom time field
source=events
| stats earliest(status, event_time) by category
| sort category
Expected output:
fetched rows / total rows = 2/2
+------------------------------+----------+
| earliest(status, event_time) | category |
|------------------------------+----------|
| pending | orders |
| active | users |
+------------------------------+----------+
Usage: LATEST(field [, time_field]). Return the latest value of a field based on timestamp ordering.
- field: mandatory. The field to return the latest value for.
- time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified.
Example
source=events
| stats latest(message) by host
| sort host
Expected output:
fetched rows / total rows = 2/2
+------------------+---------+
| latest(message) | host |
|------------------+---------|
| Shutting down | server1 |
| Maintenance mode | server2 |
+------------------+---------+
Example with custom time field
source=events
| stats latest(status, event_time) by category
| sort category
Expected output:
fetched rows / total rows = 2/2
+----------------------------+----------+
| latest(status, event_time) | category |
|----------------------------+----------|
| cancelled | orders |
| inactive | users |
+----------------------------+----------+
Usage: TAKE(field [, size]). Return original values of a field. It does not guarantee on the order of values.
- field: mandatory. The field must be a text field.
- size: optional integer. The number of values should be returned. Default is 10.
Example
source=accounts
| stats take(firstname)
Expected output:
fetched rows / total rows = 1/1
+-----------------------------+
| take(firstname) |
|-----------------------------|
| [Amber,Hattie,Nanette,Dale] |
+-----------------------------+
Usage: PERCENTILE(expr, percent) or PERCENTILE_APPROX(expr, percent). Return the approximate percentile value of expr at the specified percentage.
- percent: The number must be a constant between 0 and 100.
Note: From 3.1.0, the percentile implementation is switched to MergingDigest from AVLTreeDigest. Ref issue link. Example
source=accounts
| stats percentile(age, 90) by gender
Expected output:
fetched rows / total rows = 2/2
+---------------------+--------+
| percentile(age, 90) | gender |
|---------------------+--------|
| 28 | F |
| 36 | M |
+---------------------+--------+
For convenience, OpenSearch PPL provides shortcut functions for common percentiles:
PERC<percent>(expr)- Equivalent toPERCENTILE(expr, <percent>)P<percent>(expr)- Equivalent toPERCENTILE(expr, <percent>)
Both integer and decimal percentiles from 0 to 100 are supported (e.g., PERC95, P99.5).
source=accounts
| stats perc99.5(age);
Expected output:
fetched rows / total rows = 1/1
+---------------+
| perc99.5(age) |
|---------------|
| 36 |
+---------------+
source=accounts
| stats p50(age);
Expected output:
fetched rows / total rows = 1/1
+----------+
| p50(age) |
|----------|
| 33 |
+----------+
Usage: MEDIAN(expr). Returns the median (50th percentile) value of expr. This is equivalent to PERCENTILE(expr, 50).
Example
source=accounts
| stats median(age)
Expected output:
fetched rows / total rows = 1/1
+-------------+
| median(age) |
|-------------|
| 33 |
+-------------+
Usage: FIRST(field). Return the first non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field.
- field: mandatory. The field to return the first value for.
Example
source=accounts
| stats first(firstname) by gender
Expected output:
fetched rows / total rows = 2/2
+------------------+--------+
| first(firstname) | gender |
|------------------+--------|
| Nanette | F |
| Amber | M |
+------------------+--------+
Usage: LAST(field). Return the last non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field.
- field: mandatory. The field to return the last value for.
Example
source=accounts
| stats last(firstname) by gender
Expected output:
fetched rows / total rows = 2/2
+-----------------+--------+
| last(firstname) | gender |
|-----------------+--------|
| Nanette | F |
| Dale | M |
+-----------------+--------+
Usage: LIST(expr). Collects all values from the specified expression into an array. Values are converted to strings, nulls are filtered, and duplicates are preserved. The function returns up to 100 values with no guaranteed ordering.
- expr: The field expression to collect values from.
- This aggregation function doesn't support Array, Struct, Object field types.
Example with string fields
source=accounts
| stats list(firstname)
Expected output:
fetched rows / total rows = 1/1
+-----------------------------+
| list(firstname) |
|-----------------------------|
| [Amber,Hattie,Nanette,Dale] |
+-----------------------------+
Usage: VALUES(expr). Collects all unique values from the specified expression into a sorted array. Values are converted to strings, nulls are filtered, and duplicates are removed.
The maximum number of unique values returned is controlled by the plugins.ppl.values.max.limit setting:
- Default value is 0, which means unlimited values are returned
- Can be configured to any positive integer to limit the number of unique values
- See the PPL Settings documentation for more details
Example with string fields
source=accounts
| stats values(firstname)
Expected output:
fetched rows / total rows = 1/1
+-----------------------------+
| values(firstname) |
|-----------------------------|
| [Amber,Dale,Hattie,Nanette] |
+-----------------------------+