Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented May 20, 2025

Backport #127949 to 9.0

Currently, if a field has high cardinality, we may mistakenly disable emitting ordinal blocks. For example, with 10,000 tsid values, we never emit ordinal blocks during reads, even though we could emit blocks for 10 tsid values across 1,000 positions. This bug disables optimizations for value aggregation and block hashing.

This change tracks the minimum and maximum seen ordinals and uses them as an estimate for the number of ordinals. However, if a page contains ord=1 and ord=9999, ordinal blocks still won't be emitted. Allocating a bitset or an array for value_count could track this more accurately but would require additional memory. I need to think about this trade off more before opening another PR to fix this issue completely.

@dnhatn dnhatn added backport auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Analytics/ES|QL AKA ESQL v9.0.2 labels May 20, 2025
Currently, if a field has high cardinality, we may mistakenly disable
emitting ordinal blocks. For example, with 10,000 `tsid` values, we
never emit ordinal blocks during reads, even though we could emit blocks
for 10 `tsid` values across 1,000 positions. This bug disables
optimizations for value aggregation and block hashing.

This change tracks the minimum and maximum seen ordinals and uses them
as an estimate for the number of ordinals. However, if a page contains
`ord=1` and `ord=9999`, ordinal blocks still won't be emitted.
Allocating a bitset or an array for `value_count` could track this more
accurately but would require additional memory. I need to think about
this trade off more before opening another PR to fix this issue
completely.

This is a quick, contained fix that significantly speeds up time-series
aggregation (and other queries too).

The execution time of this query is reduced from 3.4s to 1.9s with 11M documents.

```
POST /_query
{
    "profile": true,
    "query": "TS metrics-hostmetricsreceiver.otel-default
            | STATS cpu = avg(avg_over_time(`metrics.system.cpu.load_average.1m`)) BY host.name, BUCKET(@timestamp, 5 minute)"
}
```

```
"took": 3475,
"is_partial": false,
"documents_found": 11368089,
"values_loaded": 34248167
```

```
"took": 1965,
"is_partial": false,
"documents_found": 11368089,
"values_loaded": 34248167
```
@elasticsearchmachine elasticsearchmachine merged commit 8642180 into elastic:9.0 May 20, 2025
16 checks passed
@dnhatn dnhatn deleted the 9.0-ordinals branch May 20, 2025 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport v9.0.2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants