@dnhatn dnhatn commented Oct 4, 2025

Today, we use a threshold (defaults to 128) to avoid generating too many sub-queries when replacing round_to with sub-queries. However, we do not account for cases where the main query is expensive. In such cases, running many expensive queries is slower and more costly than running a single query and then reading values and rounding. Our benchmark shows that this query takes 800ms with query-and-tags, but only 40ms without it.

TS metric*
| WHERE host.name LIKE "host-*" AND @timestamp >= "2025-07-25T12:55:59.000Z" AND @timestamp <= "2025-07-25T17:25:59.000Z"
| STATS AVG(AVG_OVER_TIME(`metrics.system.cpu.load_average.1m`)) BY host.name, TBUCKET(5 minutes)

And this query:

TS new_metrics* 
| WHERE host.name IN("host-0", "host-1", "host-2") AND @timestamp >= "2025-07-25T12:55:59.000Z" AND @timestamp <= "2025-07-25T17:25:59.000Z" 
| STATS AVG(AVG_OVER_TIME(`metrics.system.cpu.load_average.1m`)) BY host.name, TBUCKET(5 minutes)

reduces from 50ms to 10ms.

This change proposes using the threshold as the number of query clauses and assigning higher weights to expensive queries, such as wildcard or prefix queries. This allows us to disable the rewrite when it is less efficient, while still enabling it if the number of sub-queries is small.

I consider this a bug and will backport it to 9.2.1.
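
A minimal sketch of the weighting described above, not the code from this PR. The `Query` records are hypothetical stand-ins for Elasticsearch's `QueryBuilder` subclasses, and the weights (1 for a term or range, 5 for a wildcard) are assumed values chosen only for illustration. The threshold is divided by the estimated clause count plus one, rounding up; the PR uses `Math.ceilDiv` for that step, and the sketch spells the division out so it does not depend on a recent JDK.

```java
import java.util.List;

final class RoundToThresholdSketch {
    // Hypothetical stand-ins for QueryBuilder subclasses.
    interface Query {}
    record MatchAll() implements Query {}
    record Term(String field, String value) implements Query {}
    record Range(String field) implements Query {}
    record Wildcard(String field, String pattern) implements Query {}
    record Bool(List<Query> clauses) implements Query {}

    // Assumed weights, for illustration only.
    static final int CHEAP_WEIGHT = 1;
    static final int WILDCARD_WEIGHT = 5;

    // Rough cost estimate: sum the weights of all leaf clauses.
    static int estimateQueryClauses(Query q) {
        if (q == null || q instanceof MatchAll) {
            return 0;
        }
        if (q instanceof Wildcard) {
            return WILDCARD_WEIGHT;
        }
        if (q instanceof Bool b) {
            int total = 0;
            for (Query clause : b.clauses()) {
                total += estimateQueryClauses(clause);
            }
            return total;
        }
        return CHEAP_WEIGHT; // Term, Range, and anything else
    }

    // Shrink the sub-query budget as the main query gets more expensive.
    static int adjustedThreshold(int threshold, Query mainQuery) {
        int clauses = estimateQueryClauses(mainQuery) + 1;
        return (threshold + clauses - 1) / clauses; // ceiling division for positive ints
    }

    public static void main(String[] args) {
        Query cheap = new Term("host.name", "host-0");
        Query costly = new Bool(List.of(new Wildcard("host.name", "host-*"), new Range("@timestamp")));
        System.out.println(adjustedThreshold(128, cheap));  // 64
        System.out.println(adjustedThreshold(128, costly)); // 19
    }
}
```

Under these assumed weights, a wildcard filter leaves a much smaller sub-query budget than a single term filter, so the rewrite is more likely to be skipped when many buckets are requested.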

        return Math.ceilDiv(threshold, clauses);
    }

    static int estimateQueryClauses(QueryBuilder q) {
Member Author

This is a rough estimate - any suggestions are welcome.

Member

This looks good to me.

Would we also want to handle leaf query builders that target doc-value-only fields differently than those that target an indexed field? I guess if that is the case, then that should be for another change.

Member Author

@dnhatn dnhatn Oct 4, 2025

+1. We might also need to convert to queries, rewrite them, then estimate.

@dnhatn dnhatn added the v9.2.1, :Analytics/ES|QL (AKA ESQL), >bug, and auto-backport (Automatically create backport pull requests when merged) labels Oct 4, 2025
@dnhatn dnhatn marked this pull request as ready for review October 4, 2025 04:08
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (meta label for the analytical engine team: ESQL/Aggs/Geo) Oct 4, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

Member

@martijnvg martijnvg left a comment

I left a few questions, but this LGTM!

        if (q == null || q instanceof MatchAllQueryBuilder || q instanceof MatchNoneQueryBuilder) {
            return 0;
        }
        if (q instanceof WildcardQueryBuilder || q instanceof RegexpQueryBuilder || q instanceof PrefixQueryBuilder) {
Member

I think we want to add FuzzyQueryBuilder here as well?
Or maybe we should check for MultiTermQueryBuilder? (But that also includes the range query builder, which, if the field is indexed, should count as one, I think?)

Member Author

Sure, I added it in b055ae0.

Member Author

I added the range query too, in 2f7fd82.

        }
        if (q instanceof MultiTermQueryBuilder) {
            return 3;
        }
Member

Should we also score phrase queries differently?

Member Author

dnhatn commented Oct 4, 2025

@martijnvg Thank you so much for the quick review!

@dnhatn dnhatn enabled auto-merge (squash) October 4, 2025 08:25
        int clauses = estimateQueryClauses(stats, query) + 1;
        if (indexMode == IndexMode.TIME_SERIES) {
            // No doc partitioning for time_series sources; increase the threshold to trade overhead for parallelism.
            threshold *= 2;
Contributor

Super nit: consider adding constants for these numbers (2, 5 etc).
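
One way the nit could be addressed, sketched under assumptions: the constant names and the `IndexMode` enum below are hypothetical, and only the values 128 (the default threshold) and 2 (the time-series multiplier) come from this PR. The clause estimate is passed in as a plain number to keep the example self-contained.

```java
final class ThresholdConstantsSketch {
    // Values taken from the PR; the names are made up for this sketch.
    static final int DEFAULT_THRESHOLD = 128;
    static final int TIME_SERIES_THRESHOLD_FACTOR = 2;

    // Hypothetical stand-in for Elasticsearch's IndexMode.
    enum IndexMode { STANDARD, TIME_SERIES }

    static int adjustedThreshold(int threshold, int estimatedClauses, IndexMode indexMode) {
        int clauses = estimatedClauses + 1;
        if (indexMode == IndexMode.TIME_SERIES) {
            // No doc partitioning for time_series sources, so allow more sub-queries.
            threshold *= TIME_SERIES_THRESHOLD_FACTOR;
        }
        return (threshold + clauses - 1) / clauses; // ceiling division for positive ints
    }

    public static void main(String[] args) {
        System.out.println(adjustedThreshold(DEFAULT_THRESHOLD, 5, IndexMode.STANDARD));    // 22
        System.out.println(adjustedThreshold(DEFAULT_THRESHOLD, 5, IndexMode.TIME_SERIES)); // 43
    }
}
```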

Contributor

@kkrik-es kkrik-es left a comment

:)

@dnhatn dnhatn merged commit d5ad51a into elastic:main Oct 4, 2025
34 checks passed
@dnhatn dnhatn deleted the round_to_expensive_queries branch October 4, 2025 15:17
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Oct 4, 2025
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
9.2

elasticsearchmachine pushed a commit that referenced this pull request Oct 4, 2025
Labels
:Analytics/ES|QL (AKA ESQL), auto-backport (Automatically create backport pull requests when merged), >bug, Team:Analytics (meta label for the analytical engine team: ESQL/Aggs/Geo), v9.2.1, v9.3.0