DFS search phase per shard duration APM metric #135652

chrisparrinello · 2025-09-29T20:02:12Z

For https://elasticco.atlassian.net/browse/ES-12391, splitting the DFS metrics out of #135285 per @javanna's suggestion.

…hards.phases.query.duration.histogram

elasticsearchmachine · 2025-09-29T20:03:03Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine · 2025-09-29T20:03:58Z

Hi @chrisparrinello, I've created a changelog YAML for you.

chrisparrinello · 2025-09-29T21:09:45Z

@javanna I implemented your suggestion not to pull out any of the search execution attributes for the DFS phase metrics, as we discussed on #135285.

javanna · 2025-09-30T13:46:19Z

...src/test/java/org/elasticsearch/search/TelemetryMetrics/ShardSearchPhaseAPMMetricsTests.java

            "1"
        );
+        final List<Measurement> dfsMeasurements = getTestTelemetryPlugin().getLongHistogramMeasurement(DFS_SEARCH_PHASE_METRIC);
+        assertEquals(num_primaries, dfsMeasurements.size());


Can you check that the measurements make sense? For instance, are they always greater than 0? Are they always lower than the total took time?

chrisparrinello · 2025-09-30 (edited)

Unfortunately, they're not always greater than zero: we convert nanoseconds to milliseconds before storing them in the histogram, so anything that took less than a millisecond is recorded as zero. This definitely happens in the unit tests. I took a stab at checking against the took time, but that means I need to pull apart all of the asserts to get the SearchResponse, for example:

    public void testMetricsDfsQueryThenFetch() {
        SearchRequestBuilder requestBuilder = client().prepareSearch(indexName)
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setQuery(simpleQueryStringQuery("doc1"));
        SearchResponse searchResponse = requestBuilder.get();
        try {
            assertNoFailures(searchResponse);
            assertHitCount(searchResponse, 1);
            assertSearchHits(searchResponse, "1");
            final List<Measurement> dfsMeasurements = getTestTelemetryPlugin().getLongHistogramMeasurement(DFS_SEARCH_PHASE_METRIC);
            assertMeasurements(dfsMeasurements, num_primaries, searchResponse.getTook().millis());
            final List<Measurement> queryMeasurements = getTestTelemetryPlugin().getLongHistogramMeasurement(QUERY_SEARCH_PHASE_METRIC);
            assertEquals(num_primaries, queryMeasurements.size());
            final List<Measurement> fetchMeasurements = getTestTelemetryPlugin().getLongHistogramMeasurement(FETCH_SEARCH_PHASE_METRIC);
            assertEquals(1, fetchMeasurements.size());
            assertAttributes(fetchMeasurements, false, false);
        } finally {
            searchResponse.decRef();
        }
    }

where assertMeasurements checks that we have the right number of measurements and that each one is less than or equal to the took time from the response. Let me know if you want to take this approach and I'll modify all of the tests to make sure we're asserting valid measurements.
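A minimal sketch of the `assertMeasurements` helper described above, assuming it only needs the number of measurements and the long value of each one. `MeasurementChecks` and its `Measurement` record are hypothetical stand-ins, not the Elasticsearch test framework's types, and the signature is a guess based on the call in the test above. Zero is deliberately accepted because sub-millisecond durations are recorded as 0.

```java
import java.util.List;

public class MeasurementChecks {

    // Hypothetical stand-in for the test framework's Measurement type:
    // only the recorded long value is modelled here.
    public record Measurement(long value) {}

    // Hypothetical helper: checks the number of per-shard durations and that
    // each one lies in [0, tookMillis].
    public static void assertMeasurements(List<Measurement> measurements, int expectedCount, long tookMillis) {
        if (measurements.size() != expectedCount) {
            throw new AssertionError("expected " + expectedCount + " measurements but got " + measurements.size());
        }
        for (Measurement m : measurements) {
            // 0 is allowed: durations under 1 ms are truncated to 0 ms before recording.
            if (m.value() < 0 || m.value() > tookMillis) {
                throw new AssertionError("measurement " + m.value() + " outside [0, " + tookMillis + "]");
            }
        }
    }

    public static void main(String[] args) {
        // Two shards, one sub-millisecond (0) and one 3 ms, within a 5 ms took time.
        assertMeasurements(List.of(new Measurement(0), new Measurement(3)), 2, 5);
        System.out.println("ok");
    }
}
```

As pointed out later in the thread, the took time is only known at the coordinating node level, so an upper bound like this may not be available where the per-shard metrics are asserted.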

About the nanoseconds getting converted to 0 milliseconds: one thought I had was to change the unit from milliseconds to microseconds or nanoseconds. The issue is that the underlying OpenTelemetry histogram implementation buckets the measurements before reporting them to the APM server, and there is an upper bound on the buckets (something like 110k), so if you choose the wrong scale you lose precision for measurements greater than 110k. There is a way to control the bucketing, but it is deep in the OpenTelemetry meter code.
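A toy sketch of the two effects discussed above. `java.util.concurrent.TimeUnit` conversions truncate, so a 0.75 ms duration becomes 0 in milliseconds but 750 in microseconds. The `UPPER_BOUNDS` array and `bucketIndex` helper are a hypothetical explicit-bucket histogram, not OpenTelemetry's actual bucketing; the top bound of 110,000 only mirrors the rough figure quoted in the thread. It shows how, once durations are recorded in a finer unit, larger values collapse into a single overflow bucket.

```java
import java.util.concurrent.TimeUnit;

public class DurationUnits {

    // Hypothetical bucket upper bounds (inclusive); the last one mirrors "something like 110k".
    static final long[] UPPER_BOUNDS = {10, 100, 1_000, 10_000, 110_000};

    // Returns the index of the first bucket whose upper bound is >= value;
    // anything above the last bound lands in one shared overflow bucket.
    static int bucketIndex(long value) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (value <= UPPER_BOUNDS[i]) {
                return i;
            }
        }
        return UPPER_BOUNDS.length; // overflow bucket
    }

    public static void main(String[] args) {
        long tookNanos = 750_000L; // 0.75 ms
        System.out.println(TimeUnit.NANOSECONDS.toMillis(tookNanos)); // 0 (truncated)
        System.out.println(TimeUnit.NANOSECONDS.toMicros(tookNanos)); // 750
        // Recorded in microseconds, 200 ms and 5 s fall into the same overflow bucket.
        long twoHundredMs = TimeUnit.MILLISECONDS.toMicros(200);
        long fiveSeconds = TimeUnit.SECONDS.toMicros(5);
        System.out.println(bucketIndex(twoHundredMs) == bucketIndex(fiveSeconds)); // true
    }
}
```

The trade-off is the one described in the comment: milliseconds lose sub-millisecond durations, while finer units push long durations past the top bucket.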

javanna · 2025-09-30

Right, sorry, took time is measured at the coordinating node level, so it's not possible to get it here. I see, and thanks for the explanation about the rounding and for checking further on precision. I think we are good here!

javanna

LGTM

Adds a per shard duration for the DFS search phase called es.search.s…

5fce894

…hards.phases.query.duration.histogram

chrisparrinello requested a review from javanna September 29, 2025 20:02

elasticsearchmachine added needs:triage and v9.2.0 labels Sep 29, 2025

chrisparrinello added >enhancement and :Search Foundations/Search labels and removed needs:triage and v9.2.0 labels Sep 29, 2025

elasticsearchmachine added the Team:Search Foundations label Sep 29, 2025

chrisparrinello added the v9.2.0 label Sep 29, 2025

Update docs/changelog/135652.yaml

e457482

chrisparrinello requested a review from smalyshev September 29, 2025 20:04

chrisparrinello and others added 3 commits September 29, 2025 15:05

Merge branch 'main' into dfs_shard_metrics

b7a53b0

do not pull out search execution context for attributes

4241695

Merge branch 'main' into dfs_shard_metrics

8af8182

javanna reviewed Sep 30, 2025


javanna approved these changes Sep 30, 2025


chrisparrinello merged commit cb2907a into elastic:main Sep 30, 2025
33 checks passed
