Benchmark date field range query with doc values sparse index #123251

Conversation
Benchmarking

Note: use a nightly build of the Async Profiler; the stable release crashes. Capture CPU events:

```
./gradlew -p benchmarks run \
  --args 'DateFieldMapperDocValuesSkipperBenchmark -prof "async:libPath=/Users/salvatore.campagna/async-profiler-3.0-f71c31a-macos/lib/libasyncProfiler.dylib;dir=/Users/salvatore.campagna/workspace/elasticsearch/flamegraph;event=cpu;output=flamegraph"'
```
```java
indexWriter.addDocument(doc);
}

indexWriter.commit();
```
I will change this to commit every n documents, to make sure we run the query on multiple segments per index. This should make the benchmark a bit more realistic.
Previously, the index was a single segment, making it hard to see CPU differences between skipper and non-skipper. By adding `commitEvery`, the index is now split into multiple segments.
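A minimal, self-contained sketch of the periodic-commit idea described above; the sizes, field setup, and class name are illustrative assumptions, not the PR's exact code:

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;

class CommitEveryExample {
    public static void main(String[] args) throws IOException {
        int nDocs = 10_000;      // assumed size, for illustration only
        int commitEvery = 1_000; // commit every n documents
        try (IndexWriter indexWriter = new IndexWriter(new ByteBuffersDirectory(), new IndexWriterConfig())) {
            for (int i = 0; i < nDocs; i++) {
                Document doc = new Document();
                doc.add(new LongPoint("@timestamp", i));
                indexWriter.addDocument(doc);
                // Each commit flushes a segment, so a later query runs
                // against multiple segments instead of a single one.
                if ((i + 1) % commitEvery == 0) {
                    indexWriter.commit();
                }
            }
            indexWriter.commit(); // final commit for the tail of documents
        }
    }
}
```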
```java
    rangeEndTimestamp,
    rangeQuery
);
return searcher.search(query, nDocs, QUERY_SORT).totalHits.value();
```
I will use `count` here instead of `search` to try to rule out the work we do when collecting documents. The expectation is that, whether we use doc values sparse indices or not, the fetch phase needs to fetch the same set of documents, and those documents are laid out on disk in the same way in both scenarios. Hopefully, using `count` will let us isolate the search phase in the flame graph.
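For illustration, a minimal sketch of the two options against Lucene's `IndexSearcher` API; `query`, `nDocs`, and `QUERY_SORT` refer to the benchmark's own fields:

```java
// Full search: collects and sorts the top hits, so per-document collection
// work is included in the measurement.
long viaSearch = searcher.search(query, nDocs, QUERY_SORT).totalHits.value();

// Count only: no per-document collection or sorting, which isolates the
// matching work done during the search phase.
long viaCount = searcher.count(query);
```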
```java
new Runner(options).run();
}

@Param("1343120")
```
We use a large number of documents here. Anyway, we can't really prevent a scenario where everything ends up in memory, short of generating a very large set of documents, which is probably not ideal for a JMH benchmark.
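For context, a sketch of how a JMH parameter like this feeds the benchmark state; the field name `nDocs` is an assumption, only the class name and value come from the PR:

```java
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class DateFieldMapperDocValuesSkipperBenchmark {
    // JMH injects this value before setup; a single large value keeps the
    // index big enough to be interesting without exploding benchmark time.
    @Param("1343120")
    private int nDocs; // field name assumed for illustration
}
```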
Pinging @elastic/es-storage-engine (Team:StorageEngine)
martijnvg left a comment:
LGTM 👍
This JMH benchmark measures performance for range queries on the `@timestamp` field under two indexing strategies: doc values with a sparse index on `host.name` and `@timestamp`, and an inverted index for the `host.name` field and a KDB tree for the `@timestamp` field. It mirrors LogsDB queries from our Rally nightly benchmarks, where latency regressions appeared after introducing sparse doc value indices. By isolating this scenario, we can identify regression causes, capture detailed profiling (e.g., flame graphs), and guide optimizations for range queries in LogsDB.
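A hedged sketch of how the two strategies differ at the Lucene query level; the field name matches the description above, but `rangeStartTimestamp`/`rangeEndTimestamp` and the exact field configuration in the benchmark are assumptions:

```java
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.search.Query;

// Strategy 1 (doc values): the range is evaluated against doc values, where
// a sparse index ("skipper") can skip over non-matching blocks.
Query docValuesRange = SortedNumericDocValuesField.newSlowRangeQuery(
    "@timestamp", rangeStartTimestamp, rangeEndTimestamp);

// Strategy 2 (points): the range is evaluated against the KDB (points) tree.
Query pointsRange = LongPoint.newRangeQuery(
    "@timestamp", rangeStartTimestamp, rangeEndTimestamp);
```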
The parameter values may look arbitrary, but they're chosen deliberately to avoid "alignment" side effects. Varying `batchSize` models different log batch sizes per host, while `commitEvery` ensures Lucene segments flush at intervals that don't neatly align with batch boundaries. This prevents artificially favorable (or unfavorable) conditions from perfect overlaps. Finally, `queryRange` covers narrow, medium, and wide time spans to capture different levels of selectivity. Together, these choices recreate a range of realistic scenarios while avoiding misleading alignments between data batches and segment boundaries.
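A sketch of what that parameter matrix could look like as JMH declarations; the parameter names come from the description above, but the concrete values and units are illustrative assumptions, not the PR's exact choices:

```java
@Param({"100", "1000", "10000"})
private int batchSize;   // documents indexed per host.name batch

@Param({"500", "2500", "12500"})
private int commitEvery; // segment flush interval, deliberately misaligned with batchSize

@Param({"3600000", "86400000", "604800000"})
private long queryRange; // query span in millis (1h, 1d, 7d): narrow, medium, wide
```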
NOTE: the Rally queries in nightly benchmarks also require the result to be sorted on the `@timestamp` field (instead of using the data as it is sorted in the index). Anyway, we use `count` when running the query instead of a full `search` operation returning all data, because in both benchmarking scenarios we expect exactly the same set of documents to be fetched (with the same fetch pattern). Again, we would like to focus on "what is different here" and rule out what is expected to stay the same. This way we can focus our investigation on the actual work done during the search phase rather than on the work we do to re-sort documents on `@timestamp` and fetch them from disk.