Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Sep 6, 2025

Currently, time-series aggregations require two explicit aggregations: an over-time aggregation and an outer aggregation. This means a query like TS metrics | STATS AVG(memory_usage) calculates the average of all memory_usage values, functioning in a document-centric way like the FROM command. This is not the desired metrics-centric behavior for the TS command.

This change proposes to implicitly use last_over_time for time-series aggregations that do not specify an over-time aggregation.

For example:

TS metrics | STATS AVG(memory_usage)

becomes:

FROM metrics
| WHERE memory_usage IS NOT NULL
| STATS mem = last_over_time(memory) BY _tsid
| STATS AVG(mem) as `AVG(memory_usage)`

TS metrics | WHERE TRANGE(5m) | STATS AVG(memory_usage)

becomes:

FROM metrics
| WHERE memory_usage IS NOT NULL AND @timestamp < now() AND @timestamp >= now() - 5m
| STATS mem = last_over_time(memory) BY _tsid
| STATS AVG(mem) as `AVG(memory_usage)`

And:

TS metrics | WHERE TRANGE(5m) | STATS AVG(memory_usage), SUM(rate(requests)) BY cluster, TBUCKET(1m)

becomes:

FROM metrics
| WHERE (memory_usage IS NOT NULL OR requests IS NOT NULL) 
                AND @timestamp < now() AND @timestamp >= now() - 5m
| STATS m = last_over_time(memory), r=rate(requests), cluster=VALUES(cluster) BY _tsid, TBUCKET(1m)
| STATS AVG(m) as `AVG(memory_usage)`, SUM(r) as `SUM(rate(request))` BY cluster

This change does not require a default lookback window. Users can specify the lookback window in the WHERE clause with TRANGE, as shown in the last example.

@dnhatn dnhatn force-pushed the last-over-time branch 3 times, most recently from 8f73fa2 to 6cf27c8 Compare September 7, 2025 16:32
@dnhatn
Copy link
Member Author

dnhatn commented Sep 7, 2025

@kkrik-es @martijnvg One question here is whether we should support this behavior with a warning that recommends users move to explicit OVER_TIME aggregations. While this makes queries more concise, we should consider if users will be okay with this default implicit behavior.

@dnhatn dnhatn requested review from kkrik-es and martijnvg September 7, 2025 16:33
if (attr.name().equals(MetadataAttribute.TIMESTAMP_FIELD)) {
timestamp.set(attr);
// TODO: reject over_time_aggregation only
var tsAgg = new LastOverTime(af.source(), af.field(), timestamp.get());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main change.

@kkrik-es
Copy link
Contributor

kkrik-es commented Sep 8, 2025

@kkrik-es @martijnvg One question here is whether we should support this behavior with a warning that recommends users move to explicit OVER_TIME aggregations. While this makes queries more concise, we should consider if users will be okay with this default implicit behavior.

I think we should document expectations and best practices but have the query return without warnings. This way, we can have may existing queries to work by switching to TS, with the assumptions that also apply to PromQL.

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@kkrik-es
Copy link
Contributor

kkrik-es commented Sep 8, 2025

Thanks Nhat, let's proceed with this. I'm still thinking about this example:

TS metrics | STATS AVG(memory_usage)

This will scan all data so it'll likely timeout or run out of resources. I wonder if we should be including the default lookback window, even if it's just for this case. Then again, there are many ways that this can happen in ES|QL, so we can delay that for PromQL proper.

@dnhatn dnhatn added :StorageEngine/TSDB You know, for Metrics >non-issue labels Sep 9, 2025
@dnhatn dnhatn marked this pull request as ready for review September 9, 2025 01:12
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@dnhatn
Copy link
Member Author

dnhatn commented Sep 9, 2025

This will scan all data so it'll likely timeout or run out of resources. I wonder if we should be including the default lookback window, even if it's just for this case. Then again, there are many ways that this can happen in ES|QL, so we can delay that for PromQL proper

Sure @kkrik-es. I will do this in a follow-up. Thanks for the review.

@dnhatn dnhatn enabled auto-merge (squash) September 9, 2025 01:15
@dnhatn dnhatn merged commit f1899f1 into elastic:main Sep 9, 2025
34 checks passed
@dnhatn dnhatn deleted the last-over-time branch September 9, 2025 02:10
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025
…34260)

Currently, time-series aggregations require two explicit aggregations: 
an over-time aggregation and an outer aggregation. This means a query
like `TS metrics | STATS AVG(memory_usage)` calculates the average of
all `memory_usage` values, functioning in a document-centric way like
the `FROM` command. This is not the desired metrics-centric behavior for
the `TS` command.

This change proposes to implicitly use `last_over_time` for time-series 
aggregations that do not specify an over-time aggregation.

For example:

```
TS metrics | STATS AVG(memory_usage)
```

becomes:

```
FROM metrics
| WHERE memory_usage IS NOT NULL
| STATS mem = last_over_time(memory) BY _tsid
| STATS AVG(mem) as `AVG(memory_usage)`
```

---
```
TS metrics | WHERE TRANGE(5m) | STATS AVG(memory_usage)
```

becomes:

```
FROM metrics
| WHERE memory_usage IS NOT NULL AND @timestamp < now() AND @timestamp >= now() - 5m
| STATS mem = last_over_time(memory) BY _tsid
| STATS AVG(mem) as `AVG(memory_usage)`
```

---
And:
```
TS metrics | WHERE TRANGE(5m) | STATS AVG(memory_usage), SUM(rate(requests)) BY cluster, TBUCKET(1m)
```
becomes:

```
FROM metrics
| WHERE (memory_usage IS NOT NULL OR requests IS NOT NULL)
        AND @timestamp < now() AND @timestamp >= now() - 5m
| STATS m = last_over_time(memory), r=rate(requests), cluster=VALUES(cluster) BY _tsid, TBUCKET(1m)
| STATS AVG(m) as `AVG(memory_usage)`, SUM(r) as `SUM(rate(request))` BY cluster
```

---
This change does not require a default lookback window. Users can
specify the lookback window in the `WHERE` clause with `TRANGE`, as
shown in the last example.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants