-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[ES|QL] Rewrite RoundTo to QueryAndTags #132512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] Rewrite RoundTo to QueryAndTags #132512
Conversation
|
Hi @fang-xing-esql, I've created a changelog YAML for you. |
|
I think the perf numbers look great! It'd be worth picking at the cases that are slower. If we can detect those cases or fix them that'd be great, but it's only the very fast ones that are very fast already that get slower - and they don't get slow. Question - what are the counts? My instinct is that if there aren't many results the overhead of doing query things is higher than just scanning. |
I took a closer look at the 12 days data, with daily interval, 12 buckets, here are the counts @nik9000 . And the comparison of its profile between main and this branch shows main - 501,751 rows, 11,422,916 nanos, there are 5 instances of branch - 393,765 rows, 166,409,495 nanos, there are 25 instances of |
...in/esql/src/main/java/org/elasticsearch/xpack/esql/planner/EsPhysicalOperationProviders.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsQueryExec.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsQueryExec.java
Outdated
Show resolved
Hide resolved
cd5760b to
5976fbc
Compare
I remeasured this branch after refactoring |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Thanks a lot for reviewing @nik9000 ! The sort of rounding points has been refactored into |
...esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/math/RoundTo.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Worth having a QL expert looking too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since time-series queries without rate are opted into this optimization, I benchmarked the change using the metrics benchmark. It sped up the following query by 10-15%:
TS my*
| WHERE `metrics.system.memory.utilization` IS NOT NULL AND @timestamp >= "2025-07-25T14:55:59.000Z" AND @timestamp <= "2025-07-25T16:25:59.000Z"
| STATS AVG(AVG_OVER_TIME(`metrics.system.memory.utilization`)) BY host.name, BUCKET(@timestamp, 1h)
Great work. Thank you, Fang!
| return null; | ||
| } | ||
| List<Expression> roundingPoints = roundTo.points(); | ||
| int count = roundingPoints.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we avoid pushdown when there are many buckets? Reading values and then bucketing them seems safer and more efficient than executing many small sub-queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we avoid pushdown when there are many buckets? Reading values and then bucketing them seems safer and more efficient than executing many small sub-queries.
I set the upper limit of replacing RoundTo with range queries to 127, the same as the one used in Rounding. The 127 buckets query that I tested run to completion, and it performs better than not pushing down RoundTo. The performance measurements on different number of buckets are below, the performance of range queries look very promising, each bucket has about 400K documents in the queries below, and the number of buckets range from 20 to 127.
The query is like below, we get different number of buckets by adjusting the upper bound of dropoff_datetime
20 buckets
RoundTo without pushdown: 209 ms
RangeQueries: 163 ms 22% faster
40 buckets
RoundTo without pushdown: 404 ms
RangeQueries: 247 ms 39% faster
60 buckets
RoundTo without pushdown: 600 ms
RangeQueries: 323 ms 46% faster
80 buckets
RoundTo without pushdown: 671 ms
RangeQueries: 417 ms. 38% faster
100 buckets
RoundTo without pushdown: 825 ms
RangeQueries: 521 ms 37% faster
127 buckets
RoundTo without pushdown: 1068 ms
RangeQueries: 686 ms 36% faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Fang! I think we can settle on 128 buckets. We could make this a cluster setting to allow lowering it if there is any performance issue, but that's optional and can be done in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dnhatn ! I'd like to make this a cluster setting in this PR, just in case there is performance issue we have a way to protect ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ROUNDTO_PUSHDOWN_THRESHOLD is added in QueryPragmas and EsqlFlags to control the pushdown behavior at query and cluster level. Query level pragma overrides cluster settings if it is not set to default(-1).
I added an entry in EsqlFlags instead of PysicalSettings , as EsqlFlags is already in LocalLogicalPlanOptimizer's context, PhysicalSettings is not accessible to LocalLogicalPlanOptimizer's yet.
@dnhatn Could you help take a look if the added knobs make sense? Thank you!
Thanks for reviewing and measuring time-series queries @dnhatn! I didn't realize that it applies to time-series queries without rate. It is also a good point to put a threshold on the number of buckets. When |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...in/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
| .filter(RoundTo.class::isInstance) | ||
| .map(RoundTo.class::cast) | ||
| .toList(); | ||
| // It is not clear how to push down multiple RoundTos, dealing with multiple RoundTos is out of the scope of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is appropriate for the PR itself, not necessarily for the code.
| PhysicalPlan plan = evalExec; | ||
| // TimeSeriesSourceOperator and LuceneTopNSourceOperator do not support QueryAndTags, skip them | ||
| // Lookup join is not supported yet | ||
| if (evalExec.child() instanceof EsQueryExec queryExec && queryExec.canSubstituteRoundToWithQueryBuilderAndTags()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check here EVAL .... -> EsQueryExec is very strict. Is this really the only edge case scenario that this PR is targeting?
What if I have two evals or an eval -> project -> esqueryexec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are limitations, I’d prefer to start with a smaller scope and extend it later as we see new patterns.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsQueryExec.java
Show resolved
Hide resolved
|
@fang-xing-esql I also benchmarked this change using LAST_OVER_TIME, which needs both the actual and the rounded timestamp. Unfortunately, this optimization doesn't speed up the query and sometimes makes it slower. Unlike other queries, LAST_OVER_TIME requires the actual timestamp, so the timestamp field still has to be read. Have we considered only pushing down when the input value of the eval not used after the eval? You can check this by changing As I said before, this PR is great and has significantly sped up many important queries. We love it and are eagerly waiting for it to be merged to help with the time-series use cases :). My feedback was just to ensure the optimization doesn't accidentally slow down some specific cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@dnhatn Do you happen to save the profile of this query before and after this change? I'd like to look into them further. Thank you! After regular |
…rol the pushdown behavior at query and cluster level
This PR pushes the
RoundTofunction down to Lucene by rewriting theQueryBuilderinEsQueryExecto multipleQueryBuilders with tags. EachQueryBuilderis responsible for oneRoundTobucket.EsPhysicalOperationProviders.sourcePhysicalOperationbuildsLuceneSliceQueueforLuceneSourceOperatorto process the list ofQueryAndTags.Some major changes to
EsQueryExec:EsQueryExecis removed, as it is created and used by local data nodes only.EsQueryExecthat takeQueryBuilderas input have been removed and replaced by the new one that takes a list ofQueryBuilderAndTags.Examples
This is one of the examples from the description of
ReplaceRoundToWithQueryAndTagsLimitations
RoundTo won't be rewritten to QueryAndTags in the following situations:
LuceneTopNSourceOperatordoes not supportQueryAndTagsTimeSeriesSourceOperatordoes not supportQueryAndTagsFORKin the queryLOOKUP JOINin the queryRoundTos referenced by the queryThere is a cluster setting and a pragma added to disable the rewrite of RoundTo to QueryAndTags
The cluster setting defaults to 127, which means if the number of rounding points is greater than 127,
RoundTois not rewritten toQueryAndTags.Pragma defaults to -1 which means cluster setting takes effects. If the pragma is set to a value that's greater than -1, it overrides cluster setting.
Performance Observations:
The
nyc_taxisdataset is used in the ES|QL performance regression tests related todate_histogramanddate_trunc. The queries discussed below are built on this dataset and vary by calendar interval, date range, and data volume. An example query is shown below:The elapsed times reported are the averages of 10 runs for each query, measured on both the main and this branch. The observation is that for larger data volumes, rewriting (pushing down)
RoundTointoQueryAndTagstends to improve performance. However, for smaller datasets,QueryAndTagscan perform worse thanRoundTo. This is likely becauseQueryAndTagsmay query the same index multiple times. That said, when data volume is small, all queries complete within subsecond range, and the performance difference is generally not quite noticeable.