Esql skip null metrics #133087
Changes from 36 commits.
A new 6-line changelog entry:

```yaml
pr: 133087
summary: Esql skip null metrics
area: ES|QL
type: enhancement
issues:
 - 129524
```
Review discussion (some inline code references were lost in the page capture, so a few sentences end abruptly):

> Hey, I noticed that the local logical optimizer rules have … This is overlapping a bit with what we're implementing here. The limitation is that … I'm not trying to imply that we should make …

> Another thing: I think the same can happen here. If we filter out documents where all the metrics are missing but group by a non-metric field, we might remove a whole group from the output. This would be inconsistent with how …

> I believe this is intended, but @kkrik-es can confirm.

> Removing groups containing only null values makes sense for time-series, indeed, as grouping attributes (dimensions) are included in documents along with metric values. This is very interesting, @alex-spies. I wonder if we should be piggy-backing on …

> If we're going to have two rules, I think it makes sense to modify the …

> This is a known issue with this option. Groups without values will be omitted, and having two stats may return different groups than having one stat; even a single stat can return different groups than another. However, unlike FROM, which is document-centric, TS is metric-centric, and we are okay with this semantic. We should document this behavior in TS.

> I think piggy-backing and having our own rule both work. Even when piggy-backing, the separate logic between TS and non-TS queries can be made very clear in the code, so I have no issues with either approach. If we go with two rules, I agree with @not-napoleon that we had better adjust …
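To illustrate the group-dropping concern discussed above, here is a hedged sketch with a hypothetical index (`k8s`) and hypothetical field names (`memory_usage`, `cpu`, `pod`) that are not from this PR:

```esql
-- A pod whose documents contain only null memory_usage values contributes no rows
-- once the not-null filter is pushed down, so its group disappears from the output:
TS k8s | STATS avg(avg_over_time(memory_usage)) BY pod

-- Referencing a second metric widens the pushed-down filter to
-- `memory_usage IS NOT NULL OR cpu IS NOT NULL`, so the same pod can reappear
-- as a group (with a null average) whenever any of its cpu values are non-null:
TS k8s | STATS avg(avg_over_time(memory_usage)), max(max_over_time(cpu)) BY pod
```

This is why the same grouping key can yield different group sets depending on which metrics a query mentions, the behavior the reviewers agree is acceptable for metric-centric TS but worth documenting.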
The new optimizer rule, `IgnoreNullMetrics` (a new 72-line file):

```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

package org.elasticsearch.xpack.esql.optimizer.rules.logical;

import org.elasticsearch.index.IndexMode;
import org.elasticsearch.xpack.esql.core.expression.Attribute;
import org.elasticsearch.xpack.esql.core.expression.Expression;
import org.elasticsearch.xpack.esql.core.tree.Source;
import org.elasticsearch.xpack.esql.expression.predicate.logical.Or;
import org.elasticsearch.xpack.esql.expression.predicate.nulls.IsNotNull;
import org.elasticsearch.xpack.esql.plan.logical.EsRelation;
import org.elasticsearch.xpack.esql.plan.logical.Filter;
import org.elasticsearch.xpack.esql.plan.logical.LogicalPlan;
import org.elasticsearch.xpack.esql.plan.logical.TimeSeriesAggregate;
import org.elasticsearch.xpack.esql.plan.logical.UnresolvedRelation;
import org.elasticsearch.xpack.esql.rule.Rule;

import java.util.HashSet;
import java.util.Set;

/**
 * TSDB often ends up storing many null values for metrics columns (since not every time series contains every metric). However, loading
 * many null values can negatively impact query performance. To reduce that, this rule applies filters to remove null values on all
 * metrics involved in the query. In the case that there are multiple metrics, the not-null checks are OR'd together, so we accept rows
 * where any of the metrics have values.
 */
public final class IgnoreNullMetrics extends Rule<LogicalPlan, LogicalPlan> {

    @Override
    public LogicalPlan apply(LogicalPlan logicalPlan) {
        return logicalPlan.transformUp(TimeSeriesAggregate.class, agg -> {
            Set<Attribute> metrics = new HashSet<>();
            agg.forEachExpression(Attribute.class, attr -> {
                if (attr.isMetric()) {
                    metrics.add(attr);
                }
            });
            if (metrics.isEmpty()) {
                return agg;
            }
            Expression conditional = null;
            for (Attribute metric : metrics) {
                // Create an is-not-null check for each metric
                if (conditional == null) {
                    conditional = new IsNotNull(logicalPlan.source(), metric);
                } else {
                    // Join the is-not-null checks with OR nodes
                    conditional = new Or(logicalPlan.source(), conditional, new IsNotNull(Source.EMPTY, metric));
                }
            }
            Expression finalConditional = conditional;
            return agg.transformUp(p -> isMetricsQuery((LogicalPlan) p), p -> new Filter(p.source(), p, finalConditional));
        });
    }

    /**
     * Scans the given {@link LogicalPlan} to see if it is a "metrics mode" query
     */
    private static boolean isMetricsQuery(LogicalPlan logicalPlan) {
        if (logicalPlan instanceof EsRelation r) {
            return r.indexMode() == IndexMode.TIME_SERIES;
        }
        if (logicalPlan instanceof UnresolvedRelation r) {
            return r.indexMode() == IndexMode.TIME_SERIES;
        }
        return false;
    }
}
```
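As a sketch of what the rule does, using the `metric_1`/`metric_2` field names from the tests below: the rule inserts a `Filter` node directly above the time-series relation, which is conceptually equivalent to adding a WHERE clause to the query text (the rule operates on the logical plan, not on query text):

```esql
-- Before: a TS query touching two metric fields
TS test | STATS max(max_over_time(metric_1)), min(min_over_time(metric_2))

-- After (conceptually): rows where every referenced metric is null are skipped
TS test
| WHERE metric_1 IS NOT NULL OR metric_2 IS NOT NULL
| STATS max(max_over_time(metric_1)), min(min_over_time(metric_2))
```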
The new test class, `IgnoreNullMetricsTests` (a new 206-line file):

```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

package org.elasticsearch.xpack.esql.optimizer.rules.logical;

import org.elasticsearch.index.IndexMode;
import org.elasticsearch.test.ESTestCase;
import org.elasticsearch.xpack.esql.EsqlTestUtils;
import org.elasticsearch.xpack.esql.analysis.Analyzer;
import org.elasticsearch.xpack.esql.analysis.AnalyzerContext;
import org.elasticsearch.xpack.esql.analysis.AnalyzerTestUtils;
import org.elasticsearch.xpack.esql.analysis.EnrichResolution;
import org.elasticsearch.xpack.esql.core.expression.FieldAttribute;
import org.elasticsearch.xpack.esql.core.type.DataType;
import org.elasticsearch.xpack.esql.core.type.EsField;
import org.elasticsearch.xpack.esql.expression.function.EsqlFunctionRegistry;
import org.elasticsearch.xpack.esql.expression.predicate.logical.Or;
import org.elasticsearch.xpack.esql.expression.predicate.nulls.IsNotNull;
import org.elasticsearch.xpack.esql.index.EsIndex;
import org.elasticsearch.xpack.esql.index.IndexResolution;
import org.elasticsearch.xpack.esql.optimizer.LogicalOptimizerContext;
import org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizer;
import org.elasticsearch.xpack.esql.parser.EsqlParser;
import org.elasticsearch.xpack.esql.plan.logical.Aggregate;
import org.elasticsearch.xpack.esql.plan.logical.EsRelation;
import org.elasticsearch.xpack.esql.plan.logical.Eval;
import org.elasticsearch.xpack.esql.plan.logical.Filter;
import org.elasticsearch.xpack.esql.plan.logical.Limit;
import org.elasticsearch.xpack.esql.plan.logical.LogicalPlan;

import java.util.Map;

import static org.elasticsearch.xpack.esql.EsqlTestUtils.TEST_VERIFIER;
import static org.elasticsearch.xpack.esql.EsqlTestUtils.as;
import static org.elasticsearch.xpack.esql.EsqlTestUtils.emptyInferenceResolution;
import static org.elasticsearch.xpack.esql.EsqlTestUtils.unboundLogicalOptimizerContext;
import static org.elasticsearch.xpack.esql.analysis.AnalyzerTestUtils.defaultLookupResolution;

/**
 * Tests for the {@link IgnoreNullMetrics} transformation rule. Like most rule tests, this runs the entire analysis chain.
 */
public class IgnoreNullMetricsTests extends ESTestCase {

    private Analyzer analyzer;

    private LogicalPlan analyze(String query) {
        EsqlParser parser = new EsqlParser();

        EnrichResolution enrichResolution = new EnrichResolution();
        AnalyzerTestUtils.loadEnrichPolicyResolution(enrichResolution, "languages_idx", "id", "languages_idx", "mapping-languages.json");
        LogicalOptimizerContext logicalOptimizerCtx = unboundLogicalOptimizerContext();
        LogicalPlanOptimizer logicalOptimizer = new LogicalPlanOptimizer(logicalOptimizerCtx);

        Map<String, EsField> mapping = Map.of(
            "dimension_1",
            new EsField("dimension_1", DataType.KEYWORD, Map.of(), true, EsField.TimeSeriesFieldType.DIMENSION),
            "dimension_2",
            new EsField("dimension_2", DataType.KEYWORD, Map.of(), true, EsField.TimeSeriesFieldType.DIMENSION),
            "metric_1",
            new EsField("metric_1", DataType.LONG, Map.of(), true, EsField.TimeSeriesFieldType.METRIC),
            "metric_2",
            new EsField("metric_2", DataType.LONG, Map.of(), true, EsField.TimeSeriesFieldType.METRIC),
            "@timestamp",
            new EsField("@timestamp", DataType.DATETIME, Map.of(), true, EsField.TimeSeriesFieldType.NONE),
            "_tsid",
            new EsField("_tsid", DataType.TSID_DATA_TYPE, Map.of(), true, EsField.TimeSeriesFieldType.NONE)
        );
        EsIndex test = new EsIndex("test", mapping, Map.of("test", IndexMode.TIME_SERIES));
        IndexResolution getIndexResult = IndexResolution.valid(test);
        analyzer = new Analyzer(
            new AnalyzerContext(
                EsqlTestUtils.TEST_CFG,
                new EsqlFunctionRegistry(),
                getIndexResult,
                defaultLookupResolution(),
                enrichResolution,
                emptyInferenceResolution()
            ),
            TEST_VERIFIER
        );

        return logicalOptimizer.optimize(analyzer.analyze(parser.createStatement(query, EsqlTestUtils.TEST_CFG)));
    }

    public void testSimple() {
        LogicalPlan actual = analyze("""
            TS test
            | STATS max(max_over_time(metric_1))
            | LIMIT 10
            """);
        Limit limit = as(actual, Limit.class);
        Aggregate agg = as(limit.child(), Aggregate.class);
        // The optimizer expands the STATS out into two STATS steps
        Aggregate tsAgg = as(agg.child(), Aggregate.class);
        Filter filter = as(tsAgg.child(), Filter.class);
        IsNotNull condition = as(filter.condition(), IsNotNull.class);
        FieldAttribute attribute = as(condition.field(), FieldAttribute.class);
        assertEquals("metric_1", attribute.fieldName().string());
    }

    public void testRuleDoesNotApplyInNonTSMode() {
        LogicalPlan actual = analyze("""
            FROM test
            | STATS max(metric_1)
            | LIMIT 10
            """);
        Limit limit = as(actual, Limit.class);
        Aggregate agg = as(limit.child(), Aggregate.class);
        EsRelation relation = as(agg.child(), EsRelation.class);
    }

    public void testDimensionsAreNotFiltered() {
        LogicalPlan actual = analyze("""
            TS test
            | STATS max(max_over_time(metric_1)) BY dimension_1
            | LIMIT 10
            """);
        Limit limit = as(actual, Limit.class);
        Aggregate agg = as(limit.child(), Aggregate.class);
        // The optimizer expands the STATS out into two STATS steps
        Aggregate tsAgg = as(agg.child(), Aggregate.class);
        Filter filter = as(tsAgg.child(), Filter.class);
        IsNotNull condition = as(filter.condition(), IsNotNull.class);
        FieldAttribute attribute = as(condition.field(), FieldAttribute.class);
        assertEquals("metric_1", attribute.fieldName().string());
    }

    public void testFiltersAreJoinedWithOr() {
        LogicalPlan actual = analyze("""
            TS test
            | STATS max(max_over_time(metric_1)), min(min_over_time(metric_2))
            | LIMIT 10
            """);
        Limit limit = as(actual, Limit.class);
        Aggregate agg = as(limit.child(), Aggregate.class);
        // The optimizer expands the STATS out into two STATS steps
        Aggregate tsAgg = as(agg.child(), Aggregate.class);
        Filter filter = as(tsAgg.child(), Filter.class);
        Or or = as(filter.expressions().getFirst(), Or.class);

        // For reasons beyond my comprehension, the ordering of the conditionals inside the OR is nondeterministic.
        IsNotNull condition;
        FieldAttribute attribute;

        condition = as(or.left(), IsNotNull.class);
        attribute = as(condition.field(), FieldAttribute.class);
        if (attribute.fieldName().string().equals("metric_1")) {
            condition = as(or.right(), IsNotNull.class);
            attribute = as(condition.field(), FieldAttribute.class);
            assertEquals("metric_2", attribute.fieldName().string());
        } else if (attribute.fieldName().string().equals("metric_2")) {
            condition = as(or.right(), IsNotNull.class);
            attribute = as(condition.field(), FieldAttribute.class);
            assertEquals("metric_1", attribute.fieldName().string());
        } else {
            // something weird happened
            assert false;
        }
    }

    public void testSkipCoalescedMetrics() {
        // Note: this test is passing because the reference attribute metric_2 in the stats block does not inherit the
        // metric property from the original field.
        LogicalPlan actual = analyze("""
            TS test
            | EVAL metric_2 = coalesce(metric_2, 0)
            | STATS max(max_over_time(metric_1)), min(min_over_time(metric_2))
            | LIMIT 10
            """);
        Limit limit = as(actual, Limit.class);
        Aggregate agg = as(limit.child(), Aggregate.class);
        // The optimizer expands the STATS out into two STATS steps
        Aggregate tsAgg = as(agg.child(), Aggregate.class);
        Eval eval = as(tsAgg.child(), Eval.class);
        Filter filter = as(eval.child(), Filter.class);
        IsNotNull condition = as(filter.condition(), IsNotNull.class);
        FieldAttribute attribute = as(condition.field(), FieldAttribute.class);
        assertEquals("metric_1", attribute.fieldName().string());
    }

    /**
     * Check that STATS blocks after the first are not scanned for metrics to add to the filter
     */
    public void testMultipleStats() {
        LogicalPlan actual = analyze("""
            TS test
            | STATS m = max(max_over_time(metric_1))
            | STATS sum(m)
            | LIMIT 10
            """);
        Limit limit = as(actual, Limit.class);
        Aggregate sumAgg = as(limit.child(), Aggregate.class);
        Aggregate outerAgg = as(sumAgg.child(), Aggregate.class);
        Aggregate tsAgg = as(outerAgg.child(), Aggregate.class);
        Filter filter = as(tsAgg.child(), Filter.class);
        IsNotNull condition = as(filter.condition(), IsNotNull.class);
        FieldAttribute attribute = as(condition.field(), FieldAttribute.class);
        assertEquals("metric_1", attribute.fieldName().string());
    }
}
```
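The `testFiltersAreJoinedWithOr` test above copes with the nondeterministic operand order by branching on which field name appears on the left of the `Or`. An order-independent alternative is to collect both field names and compare them as a set. A minimal, self-contained sketch (plain strings stand in for the `FieldAttribute` names; `OrOrderCheck` and `matchesMetrics` are hypothetical helpers, not part of the Elasticsearch test API):

```java
import java.util.HashSet;
import java.util.Set;

public class OrOrderCheck {
    // Returns true when the two OR operands cover exactly metric_1 and metric_2,
    // regardless of which side of the OR each one landed on.
    static boolean matchesMetrics(String left, String right) {
        Set<String> seen = new HashSet<>();
        seen.add(left);
        seen.add(right);
        return seen.equals(Set.of("metric_1", "metric_2"));
    }

    public static void main(String[] args) {
        System.out.println(matchesMetrics("metric_2", "metric_1")); // true: order does not matter
        System.out.println(matchesMetrics("metric_1", "metric_1")); // false: metric_2 is missing
    }
}
```

This collapses the if/else chain and the `assert false` fallback into a single assertion on the set of observed field names.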