Conversation

not-napoleon
Member

@not-napoleon not-napoleon commented Aug 18, 2025

Resolves #129524

This is meant to add a rewrite rule to filter out null metrics. I'm opening this PR early to collect feedback from the Analytics Engine team on the approach.

This rule scans the query plan to collect all of the metric attributes, creates an IsNotNull expression for each, and then combines them into a single filter. For this initial version, we want to process any document that has a value for any of the metrics in question, so we OR the filters together.
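To make the combining step concrete, here is a hypothetical, self-contained sketch of the fold: one IS NOT NULL check per metric, OR'd together. This is not the actual IgnoreNullMetrics code; the class and method names are invented, and plain strings stand in for real plan expression nodes.

```java
import java.util.List;

// Hypothetical sketch of the combining step: one IS NOT NULL check per metric,
// folded together with OR. Strings stand in for real plan expression nodes.
public class NotNullFilterSketch {
    public static String buildNotNullFilter(List<String> metrics) {
        String conditional = null;
        for (String metric : metrics) {
            String check = metric + " IS NOT NULL";
            // First metric starts the predicate; later ones are OR'd on.
            conditional = (conditional == null) ? check : conditional + " OR " + check;
        }
        return conditional;
    }

    public static void main(String[] args) {
        // Prints: cpu.usage IS NOT NULL OR memory.used IS NOT NULL
        System.out.println(buildNotNullFilter(List.of("cpu.usage", "memory.used")));
    }
}
```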

Feature Design Questions:

  • Should we apply this filter even if the user has other filters or logic dealing with the given metric field? e.g. if they have a COALESCE for that field already in the query?
    • At this point, the rule only collects metrics from STATS commands. So if a field is coalesced and then we compute a statistic on that result, no metric will be collected for it. This is a little fragile as written here, but it is working and it has tests.
  • Is it correct to be OR'ing the filters together?
    • This seems correct. We want all documents that have a metric value for any of the metrics involved in the query.

Implementation Questions:

  • Where is the correct place in the query planning process to apply this rule? My instinct is that it should run in the "Finish Analysis" phase of the "Analyzer" step. As written it should only run once, and it seems like it should run after references and union types have been resolved.
    • Resolution: I discussed this with Fang, and we decided it was best placed in the substitutions phase of the logical plan optimizer.

@not-napoleon not-napoleon added >enhancement WIP :Analytics/ES|QL AKA ESQL v9.2.0 :StorageEngine/ES|QL Timeseries / metrics / logsdb capabilities in ES|QL labels Aug 18, 2025
@not-napoleon
Member Author

Where is the correct place in the query planning process to apply this rule? My instinct is that it should run in the "Finish Analysis" phase of the "Analyzer" step. As written it should only run once, and it seems like it should run after references and union types have been resolved.

After discussing this with Fang, we agreed that the correct place for this is in the substitutions step of the logical optimizer phase. Other rules that should only be run once and add nodes happen in that phase, so it seems like a good fit.

@kkrik-es
Contributor

@dnhatn fyi

@not-napoleon not-napoleon marked this pull request as ready for review August 25, 2025 13:02
Contributor

@alex-spies alex-spies left a comment

I took a look at IgnoreNullMetrics. I added some thoughts on weird situations that may arise, but I don't see a risk of IgnoreNullMetrics affecting existing non-TS queries, because it requires a time-series EsRelation, which I think only ever occurs when the TS command is explicitly used.

I can't comment on whether it's correct/desired to filter out null metrics like this - that would require deeper thought and review before merging. My review is limited to "does this maybe affect other queries?", and I don't see that being the case here.


private Set<Attribute> collectMetrics(LogicalPlan logicalPlan) {
Set<Attribute> metrics = new HashSet<>();
logicalPlan.forEachDown(p -> {
Contributor

Wouldn't it suffice to look specifically for EsRelations, in a single pass, and just iterate over their metric fields?

I guess you need to first confirm that the metric field is used in a STATS; maybe it's better to look for STATS first, and then traverse only that specific STATS's children.

I'd be a little afraid of multiple STATS in a row; should this only apply to the first STATS?

Also, there can be multiple EsRelations in a single plan. This happens in the case of joins, and in the case of forks. Collecting metrics from all the plans and then placing a filter involving all the metric fields ahead of every relation sounds brittle; I'd go and look at some LOOKUP JOIN and FORK queries with the debugger on to double-check this.

Member Author

I'd be a little afraid of multiple STATS in a row; should this only apply to the first STATS?

Subsequent STATS would have to be operating on the output of the first STATS. Since the first STATS's output will be ReferenceAttributes, which always return false for isMetric(), none of those values should be caught in this filter. I will add some tests for this to be sure, and to guard against future changes to ReferenceAttribute.
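To illustrate that point, here is a hypothetical toy model (invented names, not the real Elasticsearch attribute classes) in which only concrete field attributes can report isMetric() == true, so outputs of an earlier STATS are never re-collected:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: "field" attributes stand in for FieldAttribute, and
// non-field attributes stand in for ReferenceAttributes produced by a STATS.
public class CollectMetricsSketch {
    public record Attr(String name, boolean isField, boolean markedMetric) {
        // References (isField == false) always report false, mirroring the
        // claim that ReferenceAttribute.isMetric() always returns false.
        public boolean isMetric() {
            return isField && markedMetric;
        }
    }

    public static Set<Attr> collectMetrics(List<Attr> aggregateInputs) {
        Set<Attr> metrics = new LinkedHashSet<>();
        for (Attr a : aggregateInputs) {
            if (a.isMetric()) {
                metrics.add(a);
            }
        }
        return metrics;
    }

    public static void main(String[] args) {
        Attr cpu = new Attr("cpu", true, true);          // metric field from the relation
        Attr avgCpu = new Attr("avg_cpu", false, true);  // output of a previous STATS
        // Only the cpu attribute survives collection.
        System.out.println(collectMetrics(List.of(cpu, avgCpu)).size());
    }
}
```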

Member Author

Okay, I've added a test that includes a second STATS command. Please let me know if there's anything else you'd like to see there.

Contributor

Since the first STATS output will be ReferenceAttributes

Caution: I think we pass through the FieldAttribute if the field is used in the BY clause (maybe only in the case without renaming). I see that none of the tests have a BY clause; is that intentional, i.e., do we not expect to group by any dimensions? Or non-dimensions, actually?

Otherwise, I'd add a couple more tests using the BY clause for good measure, and I'd double-check that something like STATS ... BY field | ... | STATS ... BY field or STATS ... BY field | ... | STATS min(field) still works as expected.

private Set<Attribute> collectMetrics(LogicalPlan logicalPlan) {
Set<Attribute> metrics = new HashSet<>();
logicalPlan.forEachDown(p -> {
if (p instanceof Aggregate) {
Contributor

Do we know how this will affect INLINE STATS?

Member Author

I do not know how this would affect INLINE STATS. Do we have any plan for how INLINE STATS will interact with the TS command in general?

Contributor

INLINE_STATS shouldn't be placed between TS and the first STATS, which is polymorphic. @dnhatn to correct me as needed.

Member

@dnhatn dnhatn Sep 3, 2025

I think in the future we might consider making INLINE STATS work with TS, or forbidding it. You can check for TimeSeriesAggregate instead of Aggregate here to narrow the scope of these changes.

Member

@dnhatn dnhatn left a comment

I've left some comments, but looks good overall. Thank you, Mark!

* metrics involved in the query. In the case that there are multiple metrics, the not null checks are OR'd together, so we accept rows
* where any of the metrics have values.
*/
public final class IgnoreNullMetrics extends Rule<LogicalPlan, LogicalPlan> {
Member

Can we move this to a local logical rule? I'm asking because we might switch to using HashJoin instead of this, for semantic and performance reasons. With local logical rules, we don't need to worry much about BWC when deciding to change the execution method.

Contributor

I think placing this in the local logical optimizer's Local rewrite batch would be consistent. We'd also have search stats available if that helps.

Member Author

Can we move this to a logical local rule?

Yes, I will work on this today.

With local logical rules, we don't need to worry much about BWC when deciding to change the execution method.

Can you elaborate on this? I don't see why one or the other place would impact backwards compatibility.

Contributor

With global rules, the problem is that if data nodes later rely on a specific optimization having already taken place, removing or changing that optimization on the coordinator is hard, because the coordinator would have to send a different plan to old nodes. We haven't solved this problem yet, even though we'll have to sometime soon.

Contributor

Actually, there may be a small BWC issue here: if we move this to a local rule, then old nodes will not know about it and will continue sending unfiltered data. That is generally fine, except for the edge case where there are groups with no metrics - those will be completely removed by new nodes, but will still be sent by old nodes.

I guess it's fine because we don't really care about these groups, but wanted to highlight it in case this has some consequences for anything you folks are building.

Member Author

There are no "old nodes" yet. TS is unreleased, so the first released version will include this rule. Any nodes that do not support this rule will also not support the TS command, and the query will fail anyway.

conditional = new IsNotNull(logicalPlan.source(), metric);
} else {
// Join the is not null checks with OR nodes
conditional = new Or(logicalPlan.source(), conditional, new IsNotNull(Source.EMPTY, metric));
Member

I think we should use either Source.EMPTY or logicalPlan.source() consistently for all clauses.

public final class IgnoreNullMetrics extends Rule<LogicalPlan, LogicalPlan> {
@Override
public LogicalPlan apply(LogicalPlan logicalPlan) {
return logicalPlan.transformUp(TimeSeriesAggregate.class, agg -> {
Member

We can type check here instead of in transformUp, which executes another loop.

if (logicalPlan instanceof TimeSeriesAggregate) {
    // ...
} else {
    return logicalPlan;
}



Comment on lines 50 to 51
private LogicalPlan analyze(String query) {
EsqlParser parser = new EsqlParser();
Contributor

Note: in the future, if you want to save some time when adding new tests, you can inherit from AbstractLogicalPlanOptimizerTests. You'll have all the test helpers that the logical optimizer tests already have.

Member Author

Yeah, I'm using that in another PR. I still need to load a schema that fits the tests. I can rework this test to inherit from that as well though.

Member Author

Actually, looking at it, LocalLogicalPlanOptimizerTests extends directly from ESTestCase. If I'm switching this to be a local optimization (as per https://github.com/elastic/elasticsearch/pull/133087/files#r2320394569), should I follow that pattern instead?

Contributor

Gah, of course the local optimizer tests are structured inconsistently. Sorry about that! Use whatever base class works best for you.

Contributor

Hey, I noticed that the local logical optimizer rules include InferNonNullAggConstraint, which prepends a WHERE field IS NOT NULL OR other_field IS NOT NULL to a STATS min(field), max(other_field).

This overlaps a bit with what we're implementing here. The limitation is that InferNonNullAggConstraint is not used with a BY clause. It also places the filter directly ahead of the STATS, relying on filter pushdown to make the filter Lucene-pushable.

I'm not trying to imply that we should make InferNonNullAggConstraint handle TS cases, too. I'm fine with having a more specific rule in place. I'm saying this because the two rules may interfere with one another, and you may want to add local logical plan optimizer tests and/or local physical plan optimizer tests to ensure that the filter is properly pushed down to Lucene. That's not really visible from the tests added in IgnoreNullMetricsTests. In fact, the added test cases may already be covered by InferNonNullAggConstraint, but we don't see it because that requires local tests to run.
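For reference, the rewrite InferNonNullAggConstraint performs looks roughly like this (an illustrative query pair with made-up field names, not actual optimizer output):

```esql
FROM idx | STATS min(field), max(other_field)

// becomes, approximately:
FROM idx
| WHERE field IS NOT NULL OR other_field IS NOT NULL
| STATS min(field), max(other_field)
```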

Contributor

Another thing: InferNonNullAggConstraint doesn't apply when there is a BY clause. That's because it would filter out some groups that have only null values.

I think the same can happen here. If we filter out documents where all the metrics are missing but group by a non-metric field, we might remove a whole group from the output. This would be inconsistent with how STATS normally works. I don't know if that's semantically correct for TS.
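To make the group-removal concern concrete, here is a small hypothetical model (invented names, plain Java collections rather than real plan execution) showing how pre-filtering rows whose only metric is null removes an entire group, not just the null values inside groups:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: pre-filtering rows where every metric is null can drop
// a whole group from a grouped aggregation, not just the null values.
public class GroupDropSketch {
    public record Row(String host, Double cpu) {}

    // Count rows per host, optionally applying "cpu IS NOT NULL" first.
    public static Map<String, Long> countByHost(List<Row> rows, boolean filterNullMetrics) {
        Map<String, Long> counts = new TreeMap<>();
        for (Row r : rows) {
            if (filterNullMetrics && r.cpu() == null) {
                continue; // the filter removes this row before grouping
            }
            counts.merge(r.host(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("a", 0.5), new Row("b", null));
        System.out.println(countByHost(rows, false)); // {a=1, b=1}
        System.out.println(countByHost(rows, true));  // {a=1} - group "b" vanished
    }
}
```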

Member Author

I believe this is intended, but @kkrik-es can confirm.

Contributor

@kkrik-es kkrik-es Sep 4, 2025

Removing groups containing only null values makes sense for time-series, indeed, as grouping attributes (dimensions) are included in documents along with metric values.

This is very interesting, @alex-spies. I wonder if we should be piggy-backing on InferNonNullAggConstraint and apply it under TS even in the presence of grouping attributes, instead of introducing a new rule. You definitely know better which one is cleaner.

Member Author

If we're going to have two rules, I think it makes sense to modify the InferNonNullAggConstraint rule to not apply to TimeSeriesAggregate nodes. We probably still want it to apply to any later aggregations.

Member

This is a known issue with this option. Groups without values will be omitted, having two STATS may return different groups than having one, and even a single stat can return different groups than another. However, unlike FROM, which is document-centric, TS is metric-centric, and we are okay with this semantic. We should document this behavior in TS.

Contributor

This is very interesting, @alex-spies. I wonder if we should be piggy-backing on InferNonNullAggConstraint and apply it under TS even in the presence of grouping attributes, instead of introducing a new rule. You definitely know better which one is cleaner.

I think piggy-backing and having our own rule both work. Even when piggy-backing, the separate logic between TS and non-TS queries can be made very clear in the code, so I have no issues with either approach.

If we go with two rules, I agree with @not-napoleon that we'd better adjust InferNonNullAggConstraint to not apply to TS queries; otherwise we'll have two rules doing similar work at the same time, which cannot be good when evolving and/or debugging TS.


Member

@dnhatn dnhatn left a comment

Looks great, thanks Mark! Sorry for the delay - we needed to discuss the semantic issues.

/**
* Scans the given {@link LogicalPlan} to see if it is a "metrics mode" query
*/
private static boolean isMetricsQuery(LogicalPlan logicalPlan) {
Member

nit: I think it's unused.

Contributor

@alex-spies alex-spies left a comment

Looks good!

@not-napoleon, as you suggested, I think it's better to adjust InferNonNullAggConstraint in a follow-up PR, so that you don't end up with both rules doing similar things and maybe messing up your planning later down the line.

I still recommend adding physical plan optimizer tests too, or some other way to see the filter pushdown that I assume this PR is going for actually happen. Executing the added filter at runtime will likely not do much for performance, I think.

@not-napoleon
Member Author

I've added the constraint as discussed. In point of fact, I think it is redundant, because the time series aggregation is always grouped, and InferNonNullAggConstraint does not apply to grouped aggs (which is why we didn't see any problems in the tests). That said, explicit is better than implicit, and the check is trivial.

I did not add an explicit test for the pushdown. I think that's worth doing, but I want to do it in a follow-up (I did look at doing it here, and unfortunately writing such a test is non-trivial). There are a lot of other people working on this code, and I would like to get this functionality to them ASAP; we can reasonably trust that the IsNotNull filters this adds will be pushed down as they normally are. Prior manual testing indicated that adding these filters dramatically sped up TS queries, which gives a lot of confidence that the pushdown is working correctly. I will open a follow-up ticket to add the tests, but I do not think it's high priority.

Thank you @alex-spies and @dnhatn for your help getting this important optimization merged.

@not-napoleon not-napoleon enabled auto-merge (squash) September 5, 2025 14:12
@not-napoleon not-napoleon merged commit 2e8fbe9 into elastic:main Sep 8, 2025
33 checks passed
not-napoleon added a commit that referenced this pull request Sep 16, 2025
Relates to #133087

@dnhatn reported that we weren't seeing the performance improvement we expected from the null filters. I added tests and investigated, and it turns out I had forgotten to load the TS metadata from a single field caps response. This PR includes the tests that helped find the problem, and the fix.
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Sep 17, 2025
gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 18, 2025
Labels

>enhancement :StorageEngine/ES|QL Timeseries / metrics / logsdb capabilities in ES|QL Team:StorageEngine v9.2.0

Development

Successfully merging this pull request may close these issues.

Filter out null metric values

5 participants