Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented May 7, 2025

Currently, time-series aggregations use the values aggregation to collect dimension values. While we might introduce a specialized aggregation for this in the future, for now, we are using values, and the inputs are likely ordinal blocks. This change speeds up the values aggregation when the inputs are ordinal-based.

Execution time reduced from 461ms to 192ms for 1000 groups.

ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  461.938 ± 6.089  ms/op
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  192.898 ± 1.781  ms/op

@dnhatn dnhatn force-pushed the support-ordinals-values-aggs branch from 5a4fbba to 1fad50d Compare May 7, 2025 17:49
@dnhatn dnhatn requested review from ivancea and nik9000 May 7, 2025 18:21
@dnhatn dnhatn marked this pull request as ready for review May 7, 2025 18:21
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels May 7, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

}

$if(BytesRef)$
public static GroupingAggregatorFunction.AddInput wrapAddInput(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it a static method somewhere else and reference it? So we don't need to edit it without the IDE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea :)

@dnhatn dnhatn force-pushed the support-ordinals-values-aggs branch from a8253f1 to 001fe81 Compare May 7, 2025 18:54
@dnhatn
Copy link
Member Author

dnhatn commented May 8, 2025

Thanks Nik!

@dnhatn dnhatn merged commit 7b87266 into elastic:main May 8, 2025
17 checks passed
@dnhatn dnhatn deleted the support-ordinals-values-aggs branch May 8, 2025 01:24
@dnhatn dnhatn mentioned this pull request May 8, 2025
28 tasks
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request May 9, 2025
Currently, time-series aggregations use the `values` aggregation to 
collect dimension values. While we might introduce a specialized
aggregation for this in the future, for now, we are using `values`, and
the inputs are likely ordinal blocks. This change speeds up the `values`
aggregation when the inputs are ordinal-based.

Execution time reduced from 461ms to 192ms for 1000 groups.

```
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  461.938 ± 6.089  ms/op
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  192.898 ± 1.781  ms/op
```
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request May 12, 2025
Currently, time-series aggregations use the `values` aggregation to 
collect dimension values. While we might introduce a specialized
aggregation for this in the future, for now, we are using `values`, and
the inputs are likely ordinal blocks. This change speeds up the `values`
aggregation when the inputs are ordinal-based.

Execution time reduced from 461ms to 192ms for 1000 groups.

```
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  461.938 ± 6.089  ms/op
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  192.898 ± 1.781  ms/op
```
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jun 5, 2025
Currently, time-series aggregations use the `values` aggregation to
collect dimension values. While we might introduce a specialized
aggregation for this in the future, for now, we are using `values`, and
the inputs are likely ordinal blocks. This change speeds up the `values`
aggregation when the inputs are ordinal-based.

Execution time reduced from 461ms to 192ms for 1000 groups.

```
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  461.938 ± 6.089  ms/op
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  192.898 ± 1.781  ms/op
```
@dnhatn dnhatn added the v8.19.0 label Jun 5, 2025
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jun 5, 2025
Currently, time-series aggregations use the `values` aggregation to
collect dimension values. While we might introduce a specialized
aggregation for this in the future, for now, we are using `values`, and
the inputs are likely ordinal blocks. This change speeds up the `values`
aggregation when the inputs are ordinal-based.

Execution time reduced from 461ms to 192ms for 1000 groups.

```
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  461.938 ± 6.089  ms/op
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  192.898 ± 1.781  ms/op
```
dnhatn added a commit that referenced this pull request Jun 6, 2025
Currently, time-series aggregations use the `values` aggregation to
collect dimension values. While we might introduce a specialized
aggregation for this in the future, for now, we are using `values`, and
the inputs are likely ordinal blocks. This change speeds up the `values`
aggregation when the inputs are ordinal-based.

Execution time reduced from 461ms to 192ms for 1000 groups.

```
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  461.938 ± 6.089  ms/op
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  192.898 ± 1.781  ms/op
```
dnhatn added a commit that referenced this pull request Jul 21, 2025
Similar to #127849, this change adds an optimized path for leveraging 
ordinal blocks of intermediate input pages in the Values aggregator.
Below are the micro-benchmark results.

Before:
```
// 1 raw input page + 1000 intermediate input pages
Benchmark                      (dataType)  (groups)  Mode  Cnt       Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    2       0.382          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    2     112.293          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    2  113182.908          ms/op
```

```
After:
// 1 raw input page + 1000 intermediate input pages
Benchmark                      (dataType)  (groups)  Mode  Cnt      Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    2      0.378          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    2     34.410          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    2  64654.830          ms/op
```
1K groups:  112 ms -> 34.4ms
1M groups:     113s -> 64s

More to come with #130510

Relates #127849
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants