Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 9, 2025

This change adds support for the max_over_time aggregation for time_series indices. Similar to the rate aggregation, this aggregation is translated into two stages: the first stage groups by _tsid (and time-bucket), and the second stage groups by the user-specified groups.

For example:

TS my-metrics 
| STATS SUM(max_over_time(memory_usage)) BY cluster, bucket(@timestamp, 1 minute)

is translated into:

TS my-metrics 
| STATS max_memory_usage=max(memory_usage), cluster=VALUES(cluster) BY _tsid, ts=bucket(@timestamp, 1 minute) 
| STATS sum(max_memory_usage) BY cluster, ts

In this case, we don't need to keep the Lucene source emitted in the order of _tsid/timestamp, but I leave this optimization for the future.

Other *_over_time functions will be added later.

@dnhatn dnhatn force-pushed the max_over_time branch 2 times, most recently from 3ae9557 to 8951ed4 Compare April 9, 2025 05:57
@dnhatn dnhatn requested review from kkrik-es and martijnvg April 9, 2025 06:06
@dnhatn dnhatn marked this pull request as ready for review April 9, 2025 06:06
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels Apr 9, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@dnhatn dnhatn requested a review from not-napoleon April 9, 2025 06:22
QUERY_MONITORING;
QUERY_MONITORING,

MAX_OVER_TIME(Build.current().isSnapshot());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add explanatory comment as the rest of the entries.

* | STATS sum(`rate(request)`), `min(memory_used)` = from_partial($p1, min($)) BY pod=`VALUES(pod)`, `bucket(@timestamp, 5m)`
* | KEEP `min(memory_used)`, `sum(rate(request))`, pod, `bucket(@timestamp, 5m)`
*
* _over_time time-series aggregation will be rewritten in the similar way
Copy link
Contributor

@kkrik-es kkrik-es Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: {agg}_over_time

* becomes
*
* TS k8s
* | STATS max(memory_usage), VALUES(host) BY _tsid, bucket(@timestamp, 1minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe: | STATS max_memory_usage=max(memory_usage), VALUES(host) BY _tsid, bucket(@timestamp, 1minute)

to reference it below?

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, it's really nice to see this getting generalized so easily. We can add the rest with minimal effort, too.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nhat, this approach looks very clean 👍

@dnhatn
Copy link
Member Author

dnhatn commented Apr 9, 2025

@kkrik-es @martijnvg Thanks :)

@dnhatn dnhatn merged commit c888b2c into elastic:main Apr 9, 2025
17 checks passed
@dnhatn dnhatn deleted the max_over_time branch April 9, 2025 23:01
@dnhatn dnhatn mentioned this pull request Apr 28, 2025
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants