Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 9, 2025

This change adds the avg_over_time aggregation for time series indices. Similar to other time series aggregations, we need to translate avg_over_time into regular aggregations. There are two options for this translation:

  1. Translate avg_over_time to EVAL div(sum_over_time, count_over_time), then translate sum_over_time and count_over_time to sum and count.
  2. Translate avg_over_time directly to avg, and then to div(sum, count).

This PR chooses the latter approach. Below is an example:

TS k8s 
| STATS sum(avg_over_time(memory_usage)) BY host, bucket(@timestamp, 1minute)

translates to:

TS k8s
| STATS avg_memory_usage = avg(memory_usage), host_values=VALUES(host) BY _tsid, time_bucket=bucket(@timestamp, 1minute)
| STATS sum(avg_memory_usage) BY host_values, time_bucket

and then:

TS k8s
| STATS sum_memory_usage = sum(memory_usage), count_memory_usage = count(memory_usage), host_values=VALUES(host) BY _tsid, time_bucket=bucket(@timestamp, 1minute)
| EVAL avg_memory_usage = sum_memory_usage / count_memory_usage
| STATS sum(avg_memory_usage) BY host_values, time_bucket

Since we need to substitute AVG with SUM and COUNT after translation, we need to call SubstituteSurrogates twice in LogicalPlanOptimizer. If there is a performance impact, we can move this rule to TranslateTimeSeriesAggregate.

@dnhatn dnhatn force-pushed the avg_over_time branch 2 times, most recently from ca9a217 to 38c95b3 Compare April 10, 2025 01:41
@dnhatn dnhatn marked this pull request as ready for review April 10, 2025 04:00
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels Apr 10, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

MAX_OVER_TIME(Build.current().isSnapshot()),

/**
* Support avg_over_time aggregation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add ", that gets evaluated per time-series".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ see 24ee2d3

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So simple, well done.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also played around with it locally. LGTM

@dnhatn dnhatn merged commit 1739049 into elastic:main Apr 11, 2025
17 checks passed
@dnhatn dnhatn deleted the avg_over_time branch April 11, 2025 03:38
@dnhatn
Copy link
Member Author

dnhatn commented Apr 11, 2025

@kkrik-es @martijnvg Thanks for the reviews.

@dnhatn dnhatn mentioned this pull request Apr 28, 2025
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants