Speed up loading keyword fields with index sorts #132950

dnhatn · 2025-08-14T19:28:59Z

Reading a keyword field that is the primary sort field of the index can be sped up by skipping reads, because identical values are stored together. Ideally, we would use values from the doc_values skipper instead of loading values from doc_values, but the doc_values skipper is not enabled yet. Instead, this change uses two buffers when reading ordinals: one for the beginning of the block and one for the end. If both return the same value, we can skip the middle. There is a follow-up step where we fill the values in the middle until we reach the last value. This optimization should speed up time-series queries.
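To make the two-buffer idea concrete, here is a minimal sketch in plain Java. It is not the code from this PR: the class `SortedOrdinalRunReader`, the `BLOCK_SIZE` of 4, and the in-memory `long[]` standing in for encoded doc values are all assumptions made for illustration. It reads the first and last ordinal of each block and, because ordinals of the primary sort field are monotonic across documents, fills the whole block without touching the middle when the two match.

```java
import java.util.Arrays;

// Illustrative sketch only; names and block size are hypothetical, not taken from the PR.
final class SortedOrdinalRunReader {
    // Hypothetical block size; the codec's real block size is not stated in this thread.
    static final int BLOCK_SIZE = 4;

    // Counts individual ordinal reads so the saving can be observed.
    static int reads = 0;

    // Stand-in for decoding one ordinal from doc values.
    static long readOrd(long[] ords, int doc) {
        reads++;
        return ords[doc];
    }

    // Loads ordinals for docs [from, from + count), assuming sortedOrds is monotonic
    // because the field is the primary index sort.
    static long[] load(long[] sortedOrds, int from, int count) {
        long[] out = new long[count];
        int i = 0;
        while (i < count) {
            int blockEnd = Math.min(i + BLOCK_SIZE, count) - 1;
            long first = readOrd(sortedOrds, from + i);
            long last = blockEnd > i ? readOrd(sortedOrds, from + blockEnd) : first;
            if (first == last) {
                // Same value at both ends of a monotonic run: the middle must match too.
                Arrays.fill(out, i, blockEnd + 1, first);
            } else {
                // Values change inside the block: read the middle one by one.
                out[i] = first;
                out[blockEnd] = last;
                for (int j = i + 1; j < blockEnd; j++) {
                    out[j] = readOrd(sortedOrds, from + j);
                }
            }
            i = blockEnd + 1;
        }
        return out;
    }
}
```

The shortcut is only valid when the values are monotonic, which is why it is tied to the primary sort field; for any other field, equal endpoints say nothing about the values in between.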

martijnvg

Thanks, this looks great - LGTM.

martijnvg · 2025-08-15T02:40:25Z

...er/src/test/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesFormatTests.java

            try (var reader = DirectoryReader.open(iw)) {
                int gaugeIndex = numDocs;
                for (var leaf : reader.leaves()) {
-                    var timestampDV = getBulkNumericDocValues(leaf.reader(), timestampField);


As we discussed, we need to adjust these tests to also use sorted(set) doc values.

martijnvg · 2025-08-15T02:42:49Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java

                readFields(in, state.fieldInfos);
-
+                final var indexSort = state.segmentInfo.getIndexSort();
+                if (indexSort != null && indexSort.getSort().length > 0) {


👍 - good idea to extract this information from the segment info; no need to check the index setting.

martijnvg · 2025-08-15T02:48:37Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java

+                        if (lookaheadBlockIndex + 1 != blockIndex) {
+                            lookaheadData.seek(indexReader.get(blockIndex));
+                        }
+                        if (maxOrd >= 0) {


Currently this is only invoked from getSorted(...) and in that case maxOrd is always >= 0.
I think we should keep this branch; otherwise we can't reuse lookAheadValueAt(...) for numeric doc values.

Maybe replace this with a switch statement:

    switch (maxOrdAsInt) {
        case -1:
            decoder.decode(lookaheadData, lookaheadBlock);
            break;
        default:
            decoder.decodeOrdinals(lookaheadData, lookaheadBlock, bitsPerOrd);
    }

I think this is a little bit more efficient?

I tried, but IntelliJ suggested this instead: 91d32d1

👍 - I suspect that is also easier for the JVM to optimize at runtime.

martijnvg · 2025-08-15T02:49:40Z

server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java

+     * to load the requested values, for example due to unsupported underlying data.
+     * This allows callers to optimistically try optimized loading strategies first, and fall back if necessary.
+     */
+    interface OptionalColumnAtATimeReader {


👍 This is better than the previous abstraction (BulkNumericDocValues).
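The "try the optimized path first, fall back if necessary" contract described in that Javadoc can be sketched roughly as follows. The real signature of `OptionalColumnAtATimeReader` is not shown in this thread, so the interfaces `OptionalBulkReader` and `RowReader`, the method `tryRead`, and the use of `null` to signal "unsupported" are all assumptions made for illustration.

```java
// Illustrative sketch only; these interfaces are hypothetical, not the BlockLoader API.
final class OptionalReaderSketch {
    // Hypothetical optional reader: returns null when it cannot serve the request.
    interface OptionalBulkReader {
        long[] tryRead(int[] docs);
    }

    // Hypothetical row-by-row reader that always works.
    interface RowReader {
        long read(int doc);
    }

    // Tries the optimized bulk path first and falls back to per-document reads.
    static long[] load(OptionalBulkReader optional, RowReader fallback, int[] docs) {
        if (optional != null) {
            long[] bulk = optional.tryRead(docs);
            if (bulk != null) {
                return bulk; // optimized path succeeded
            }
        }
        long[] values = new long[docs.length];
        for (int i = 0; i < docs.length; i++) {
            values[i] = fallback.read(docs[i]);
        }
        return values;
    }
}
```

With this shape the caller does not need to know in advance whether the underlying data supports the fast path; the optional reader decides per request.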

martijnvg · 2025-08-15T02:51:38Z

...ompute/src/main/java/org/elasticsearch/compute/lucene/read/DelegatingBlockLoaderFactory.java

    }

+    @Override
+    public BytesRefBlock constantBytes(BytesRef value, int count) {


Could this also be used in SingletonOrdinalsBuilder#tryBuildConstantBlock(...)?

Great point. Unfortunately, we don't have DelegatingBlockLoaderFactory here. I will move this to BlockFactory in a follow-up.

martijnvg · 2025-08-15T06:02:39Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java

 final class ES819TSDBDocValuesProducer extends DocValuesProducer {
    final IntObjectHashMap<NumericEntry> numerics;
+    private int primarySortFieldNumber = -1;
+    private boolean primarySortFieldReversed = false;


Note that this field can be converted to a variable.

Thanks, I removed it in a0ce374.

martijnvg · 2025-08-15T06:08:38Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java

+
+                    @Override
+                    long lookAheadValueAt(int targetDoc) throws IOException {
+                        return 0L;  // Only one ordinal!


easy one :)

the test found it :)

elasticsearchmachine · 2025-08-15T06:20:43Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-08-15T06:20:43Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

elasticsearchmachine · 2025-08-15T06:27:01Z

Hi @dnhatn, I've created a changelog YAML for you.

elasticsearchmachine added the v9.2.0 label Aug 14, 2025

dnhatn force-pushed the values-reader-with-skipper branch from ce26e27 to b2ae4bf Compare August 14, 2025 21:50

dnhatn changed the title ~~Load constant blocks for keyword fields with index sorts~~ Speed up loading keyword fields with index sorts Aug 14, 2025

dnhatn force-pushed the values-reader-with-skipper branch from b2ae4bf to 503b524 Compare August 14, 2025 22:02

Speed up loading keyword fields with index sorts

8639922

dnhatn force-pushed the values-reader-with-skipper branch from 3081cfa to 8639922 Compare August 14, 2025 22:24

dnhatn requested a review from martijnvg August 15, 2025 00:13

Speed up loading keyword fields with index sorts

e24a8e0

dnhatn force-pushed the values-reader-with-skipper branch from 165df8b to e24a8e0 Compare August 15, 2025 00:44

martijnvg approved these changes Aug 15, 2025

martijnvg reviewed Aug 15, 2025

dnhatn added 2 commits August 14, 2025 23:02

Add tests

ecb306d

remove var

a0ce374

martijnvg reviewed Aug 15, 2025

dnhatn added the :Analytics/ES|QL (AKA ESQL) and :StorageEngine/TSDB (You know, for Metrics) labels Aug 15, 2025

if-else

91d32d1

dnhatn marked this pull request as ready for review August 15, 2025 06:20

elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) and Team:StorageEngine labels Aug 15, 2025

dnhatn added the >enhancement label Aug 15, 2025

Update docs/changelog/132950.yaml

0c4edb6

dnhatn and others added 2 commits August 14, 2025 23:33

fix changelog

4207b63

Merge remote-tracking branch 'es/main' into values-reader-with-skipper

4c05703

martijnvg merged commit 64f8209 into elastic:main Aug 15, 2025
34 checks passed

dnhatn deleted the values-reader-with-skipper branch September 9, 2025 16:14
