Speed up time-series aggregation

Time-series aggregations, such as `{agg}_over_time` and `rate`, are currently slow against time-series indices for two main reasons:

1. They require two phases: 
   - First, grouping by each time-series (by `tsid` and `timebucket`).
   - Then, grouping by user-specified groups.
2. For `rate` aggregations, data must be provided in timestamp order per time-series.
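
To make the two phases concrete, here is a minimal Python sketch of the shape of the computation. It is not the ES|QL implementation: the function names, the `group_of` mapping, and the simplified rate formula (last minus first over elapsed time, with no counter-reset handling) are assumptions made for illustration only.

```python
from collections import defaultdict


def bucket_of(timestamp, width):
    """Round a timestamp down to the start of its time bucket."""
    return timestamp - timestamp % width


def two_phase_rate(samples, group_of, bucket_width):
    """samples: iterable of (tsid, timestamp, counter_value) in any order.
    group_of: hypothetical mapping from tsid to the user-specified group."""
    # Phase 1: group by (tsid, time bucket).
    per_series = defaultdict(list)
    for tsid, ts, value in samples:
        per_series[(tsid, bucket_of(ts, bucket_width))].append((ts, value))

    series_rates = {}
    for key, points in per_series.items():
        # Rate needs the points of each series in timestamp order.
        points.sort()
        (t0, v0), (t1, v1) = points[0], points[-1]
        if t1 > t0:
            # Simplified rate: ignores counter resets (assumption).
            series_rates[key] = (v1 - v0) / (t1 - t0)

    # Phase 2: group the per-series results by the user-specified groups.
    totals = defaultdict(float)
    for (tsid, bucket), rate in series_rates.items():
        totals[(group_of[tsid], bucket)] += rate
    return dict(totals)


samples = [
    ("a", 30, 60.0), ("a", 0, 0.0),      # deliberately out of order
    ("b", 0, 10.0), ("b", 30, 40.0),
    ("a", 60, 100.0), ("a", 90, 130.0),
    ("c", 0, 0.0), ("c", 20, 10.0),
]
group_of = {"a": "h1", "b": "h1", "c": "h2"}
result = two_phase_rate(samples, group_of, bucket_width=60)
```

The sort inside phase 1 is the cost that the second point refers to: if the source already delivers each series in timestamp order, that step can be skipped.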

This issue proposes some ideas and tracks optimizations to improve the performance of time-series aggregations in ES|QL.

#### Source command
- [x] Translate time-series queries without `rate` to `FROM`: https://github.com/elastic/elasticsearch/pull/127033
- [x] Avoid comparing `tsid` when iterating over documents in TS source:  https://github.com/elastic/elasticsearch/pull/127095
- [x] Extract fields directly from the time-series source: https://github.com/elastic/elasticsearch/pull/127445
- [x] Speed up reading dimension fields: #128283
- [x] One segment: #131502
- [ ] Field extraction for single segment (for rate)
- [ ] ~Optimize loading of time-series data using `FROM`~

#### Execution
- [x] Execute time-series source in a separate driver: https://github.com/elastic/elasticsearch/pull/128419
- [x] Execute extract fields in a separate driver: https://github.com/elastic/elasticsearch/pull/128643  
- [x] Constant blocks: https://github.com/elastic/elasticsearch/issues/132379
- [x] Increase parallelism for rate execution
- [ ] Support segment data partitioning for TS for non-overlapping segments (for rate)
- [ ] Run one shard at a time to leverage TimeSeriesBlockHash (for rate)
- [ ] ~Emit final results for non-overlapping buckets (drop tsid for these buckets)~

#### Values aggregation (for dimension fields)
- [x] Emit ordinal output blocks: https://github.com/elastic/elasticsearch/pull/127201
- [x] Handle ordinal input blocks: https://github.com/elastic/elasticsearch/pull/127849
- [x] Optimize for single-value aggregations (dimension fields?)

#### Block hash
- [x] Enable time-series block hash: https://github.com/elastic/elasticsearch/pull/127488
- [x] Leverage ordinal blocks in time-series block hash: https://github.com/elastic/elasticsearch/pull/127488
- [ ] ~Emit ordinal blocks in PackedValuesBlockHash~

#### Planning
- [ ] Skip backing indices with `start_time` and `end_time` outside the `TRANGE` filter.
- [ ] ~Use a single aggregation for the second phase~
- [ ] ~Optimize for a single target index~

#### Misc
- [x] https://github.com/elastic/elasticsearch/pull/127949
- [x] Lossy summation: https://github.com/elastic/elasticsearch/pull/132625
- [ ] Load the first seen value only for `last_over_time`

#### Migrated from 105397 and to be considered 
- [ ] Add support for a sparse index to easily navigate the documents of a time series (https://github.com/elastic/elasticsearch/issues/95701). This is required for determining the last value of a metric and skipping to the last value of the next time series, as well as for other functionality such as interpolation and geofencing. Additionally, a query may be too selective and mask documents that are valid metrics of a time series; a sparse index would allow us to access those metrics even in that case.
- [ ] Enhance the time-series grouping operator to also group by time series and time interval. A typical use case groups by time series and time interval, which is the case when the BUCKET syntax is used.

Speed up time-series aggregation #127444
