Benchmark date field range query with doc values sparse index #123251

Conversation
Benchmarking

Note: use a nightly build of the Async Profiler; the stable release crashes. Capture CPU events:

```
./gradlew -p benchmarks run \
  --args 'DateFieldMapperDocValuesSkipperBenchmark -prof "async:libPath=/Users/salvatore.campagna/async-profiler-3.0-f71c31a-macos/lib/libasyncProfiler.dylib;dir=/Users/salvatore.campagna/workspace/elasticsearch/flamegraph;event=cpu;output=flamegraph"'
```
```java
indexWriter.addDocument(doc);
}

indexWriter.commit();
```
I will change this to commit every n documents, to make sure we run the query on multiple segments per index. This should make the benchmark a bit more realistic.
Previously, the index was a single segment, making it hard to see CPU differences between skipper and non-skipper. By adding `commitEvery`, the index is now split into multiple segments.
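A minimal, self-contained sketch of the periodic-commit idea described above; the sizes, field setup, and class name are illustrative assumptions, not the PR's exact code:

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;

class CommitEveryExample {
    public static void main(String[] args) throws IOException {
        int nDocs = 10_000;      // assumed size, for illustration only
        int commitEvery = 1_000; // commit every n documents
        try (IndexWriter indexWriter = new IndexWriter(new ByteBuffersDirectory(), new IndexWriterConfig())) {
            for (int i = 0; i < nDocs; i++) {
                Document doc = new Document();
                doc.add(new LongPoint("@timestamp", i));
                indexWriter.addDocument(doc);
                // Each commit flushes a segment, so a later query runs
                // against multiple segments instead of a single one.
                if ((i + 1) % commitEvery == 0) {
                    indexWriter.commit();
                }
            }
            indexWriter.commit(); // final commit for the tail of documents
        }
    }
}
```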
```java
    rangeEndTimestamp,
    rangeQuery
);
return searcher.search(query, nDocs, QUERY_SORT).totalHits.value();
```
I will use `count` here instead of `search` to try to rule out the work we do when collecting documents. The expectation is that, whether we use doc values sparse indices or not, the fetch phase needs to fetch the same set of documents, and those documents are laid out on disk in the same way in both scenarios. Hopefully, using `count` will let us isolate the search phase in the flame graph.
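For illustration, a minimal sketch of the two options against Lucene's `IndexSearcher` API; `query`, `nDocs`, and `QUERY_SORT` refer to the benchmark's own fields:

```java
// Full search: collects and sorts the top hits, so per-document collection
// work is included in the measurement.
long viaSearch = searcher.search(query, nDocs, QUERY_SORT).totalHits.value();

// Count only: no per-document collection or sorting, which isolates the
// matching work done during the search phase.
long viaCount = searcher.count(query);
```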
```java
new Runner(options).run();
}

@Param("1343120")
```
We use a large number of documents here. Anyway, we can't really prevent a scenario where everything ends up in memory, short of generating a very large set of documents, which is probably not ideal for a JMH benchmark.
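For context, a sketch of how a JMH parameter like this feeds the benchmark state; the field name `nDocs` is an assumption, only the class name and value come from the PR:

```java
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class DateFieldMapperDocValuesSkipperBenchmark {
    // JMH injects this value before setup; a single large value keeps the
    // index big enough to be interesting without exploding benchmark time.
    @Param("1343120")
    private int nDocs; // field name assumed for illustration
}
```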
Pinging @elastic/es-storage-engine (Team:StorageEngine)
martijnvg left a comment:
LGTM 👍
This JMH benchmark measures performance for range queries on the `@timestamp` field under two indexing strategies: doc values with a sparse index on `host.name` and `@timestamp`, and an inverted index for the `host.name` field and a KDB tree for the `@timestamp` field. It mirrors LogsDB queries from our Rally nightly benchmarks, where latency regressions appeared after introducing sparse doc value indices. By isolating this scenario, we can identify regression causes, capture detailed profiling (e.g., flame graphs), and guide optimizations for range queries in LogsDB.
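A hedged sketch of how the two strategies differ at the Lucene query level; the field name matches the description above, but `rangeStartTimestamp`/`rangeEndTimestamp` and the exact field configuration in the benchmark are assumptions:

```java
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.search.Query;

// Strategy 1 (doc values): the range is evaluated against doc values, where
// a sparse index ("skipper") can skip over non-matching blocks.
Query docValuesRange = SortedNumericDocValuesField.newSlowRangeQuery(
    "@timestamp", rangeStartTimestamp, rangeEndTimestamp);

// Strategy 2 (points): the range is evaluated against the KDB (points) tree.
Query pointsRange = LongPoint.newRangeQuery(
    "@timestamp", rangeStartTimestamp, rangeEndTimestamp);
```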
The parameter values may look arbitrary, but they're chosen deliberately to avoid "alignment" side effects. Varying `batchSize` models different log batch sizes per host, while `commitEvery` ensures Lucene segments flush at intervals that don't neatly align with batch boundaries. This prevents artificially favorable (or unfavorable) conditions from perfect overlaps. Finally, `queryRange` covers narrow, medium, and wide time spans to capture different levels of selectivity. Together, these choices recreate a range of realistic scenarios while avoiding misleading alignments between data batches and segment boundaries.
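A sketch of what that parameter matrix could look like as JMH declarations; the parameter names come from the description above, but the concrete values and units are illustrative assumptions, not the PR's exact choices:

```java
@Param({"100", "1000", "10000"})
private int batchSize;   // documents indexed per host.name batch

@Param({"500", "2500", "12500"})
private int commitEvery; // segment flush interval, deliberately misaligned with batchSize

@Param({"3600000", "86400000", "604800000"})
private long queryRange; // query span in millis (1h, 1d, 7d): narrow, medium, wide
```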
NOTE: the Rally queries in nightly benchmarks also require the result to be sorted on the `@timestamp` field (instead of using the data as it is sorted in the index). Anyway, we use `count` when running the query instead of a full `search` operation returning all data, because in both benchmarking scenarios we expect exactly the same set of documents to be fetched (with the same fetch pattern). Again, we would like to focus on "what is different here" and rule out what is expected to stay the same. This way we can focus our investigation on the actual work done during the search phase rather than on the work we do to re-sort documents on `@timestamp` and fetch them from disk.