Use query source to detect non-null docs #133287

Conversation
dnhatn left a comment:
To the reviewers: aside from the comments, most of the changes just add a new parameter to the column reader.
```java
/**
 * Returns the set of fields that are guaranteed to be dense after the source query.
 */
static Set<String> nullsFilteredFieldsAfterSourceQuery(QueryBuilder sourceQuery) {
```
Here, we extract the fields from the query that can be considered nullsFiltered when reading values. The nullsFiltered flag is then passed to each FieldInfo.
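A minimal sketch of what that extraction could look like, written as plain Java over the query-builder tree. The Terms/Range/ConstantScore cases mirror the excerpt shown further down; the ExistsQueryBuilder and BoolQueryBuilder cases are assumptions about how other conjunctive shapes could be handled, not code from this PR:

```java
import java.util.HashSet;
import java.util.Set;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.ConstantScoreQueryBuilder;
import org.elasticsearch.index.query.ExistsQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.index.query.TermsQueryBuilder;

static Set<String> nullsFilteredFields(QueryBuilder query) {
    return switch (query) {
        case ExistsQueryBuilder q -> Set.of(q.fieldName());   // assumed case
        case TermsQueryBuilder q -> Set.of(q.fieldName());
        case RangeQueryBuilder q -> Set.of(q.fieldName());
        case ConstantScoreQueryBuilder q -> nullsFilteredFields(q.innerQuery());
        case BoolQueryBuilder q -> {
            // Assumption: only clauses that every matching doc must satisfy
            // can guarantee density; `should` (OR) clauses contribute nothing.
            Set<String> fields = new HashSet<>();
            q.must().forEach(c -> fields.addAll(nullsFilteredFields(c)));
            q.filter().forEach(c -> fields.addAll(nullsFilteredFields(c)));
            yield fields;
        }
        default -> Set.of();
    };
}
```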
Is this done only for TS or for FROM too? For the latter, I wonder if ignoring nulls is desired.
It should work for both, though it may be a no-op: the optimizations are enabled at the codec level, and for now only the TSDB codec supports them.
```java
    );
}
if (fields[field].info.nullsFiltered() && block.mayHaveNulls()) {
    assert IntStream.range(0, block.getPositionCount()).noneMatch(block::isNull)
```
This check can be expensive, so it only runs when assertions are enabled.
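For context, a self-contained sketch of the same style of check; the `Block` interface here is a simplified stand-in for the Elasticsearch one. Because the whole scan lives inside the `assert` expression, it is evaluated only when the JVM runs with `-ea`:

```java
import java.util.stream.IntStream;

// Simplified stand-in for the real Block interface.
interface Block {
    int getPositionCount();
    boolean isNull(int position);
}

static void checkDense(Block block) {
    // O(positionCount) scan; with assertions disabled (the default),
    // neither the stream nor the scan is ever evaluated.
    assert IntStream.range(0, block.getPositionCount()).noneMatch(block::isNull)
        : "nullsFiltered field produced a block containing nulls";
}
```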
```diff
 }
 for (ColumnAtATimeWork r : columnAtATimeReaders) {
-    target[r.idx] = (Block) r.reader.read(loaderBlockFactory, docs, offset);
+    target[r.idx] = (Block) r.reader.read(loaderBlockFactory, docs, offset, operator.fields[r.idx].info.nullsFiltered());
```
We pass the nullsFiltered flag from FieldInfo to the column reader.
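A hedged sketch of what the flag buys inside a reader, written against Lucene's NumericDocValues rather than the actual BlockLoader API (the signature, the `docs` array, and the use of `0L` as a null stand-in are all assumptions for illustration):

```java
import java.io.IOException;
import org.apache.lucene.index.NumericDocValues;

// `docs` is an ascending array of matching doc IDs.
static long[] readColumn(NumericDocValues values, int[] docs, boolean nullsFiltered) throws IOException {
    long[] out = new long[docs.length];
    for (int i = 0; i < docs.length; i++) {
        if (nullsFiltered) {
            // Dense path: the pushed-down IS NOT NULL filter guarantees a value,
            // so there is no null branch to take. A codec-aware reader could go
            // further and bulk-copy the value range instead of advancing per doc.
            boolean hasValue = values.advanceExact(docs[i]);
            assert hasValue;
            out[i] = values.longValue();
        } else if (values.advanceExact(docs[i])) {
            out[i] = values.longValue(); // sparse path: DISI lookup per doc
        } else {
            out[i] = 0L;                 // doc has no value for this field
        }
    }
    return out;
}
```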
Pinging @elastic/es-analytical-engine (Team:Analytics)

Pinging @elastic/es-storage-engine (Team:StorageEngine)
```java
case TermsQueryBuilder q -> Set.of(q.fieldName());
case RangeQueryBuilder q -> Set.of(q.fieldName());
case ConstantScoreQueryBuilder q -> nullsFilteredFieldsAfterSourceQuery(q.innerQuery());
// TODO: support SingleValueQuery
```
Should we also find and ignore COALESCE? Or is that covered by the default case?
Looks good!

One note: what happens when we have multiple metric fields? For instance:

```esql
TS metrics
| WHERE cpu_usage IS NOT NULL OR memory_usage IS NOT NULL
| STATS max(avg_over_time(cpu_usage)), max(avg_over_time(memory_usage)) BY tbucket(1 hour)
```

I'd think we don't necessarily get dense values here? If so, is this covered?
We won't be able to infer dense fields from a disjunction like this: with OR, a matching document may carry a value for only one of the two fields, so neither field can be marked nullsFiltered and the fast path simply won't apply. Only conjunctive (must/filter) clauses can guarantee density.
Thanks Kostas!

One of the slowest parts of time-series queries is reading metric values: it accounts for 30% of the profiler time for the following query:

This is because the `metrics.system.memory.utilization` field is sparse, requiring iteration over its DISI to find value indices when reading values. This change adds a flag named `nullsFiltered` to the column reader, signaling that all target docs have values for the field. This enables optimizations such as skipping value-index lookups with the DISI and performing bulk copying.

We can safely do this because if the filter `WHERE metrics.system.memory.utilization IS NOT NULL` is pushed down to Lucene, then every document returned from the Lucene operator will have a value for the `metrics.system.memory.utilization` field.

I was able to make changes that reduce the execution time for reading sparse metric values to be comparable with reading dense fields (like `timestamp`). To keep this PR small, I will open the codec-related changes in a separate PR.
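To make "bulk copying" concrete, here is a hedged sketch of the kind of dense read the codec-level follow-up could do; the helper, the fixed-width encoding, and the doc-to-offset mapping are all assumptions, not code from this PR. The idea: when a field is dense within a segment, the value ordinal of doc `d` is `d` itself, so values can be read from a contiguous range without a DISI lookup per document.

```java
import java.io.IOException;
import org.apache.lucene.store.RandomAccessInput;

// Assumes a fixed-width (8-byte) on-disk encoding, which keeps the
// doc-to-offset mapping a simple multiplication.
static void bulkCopyDense(RandomAccessInput values, long[] out, int firstDoc, int count) throws IOException {
    for (int i = 0; i < count; i++) {
        // For a dense field, value ordinal == docID: no DISI advance needed.
        out[i] = values.readLong((long) (firstDoc + i) * Long.BYTES);
    }
}
```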