
[RFC] Comprehensive Search Latency Breakdown #20693

@manojfaria

Description


Is your feature request related to a problem? Please describe

Related to: [RFC] Search Performance and Resiliency #8879, and [META] Query Insights Framework #11522

Current State

Query Insights (#11522) successfully identifies which queries are slow.

What's tracked today: Query phase, Fetch phase, and Can_match phase durations
The Problem: Operators at times observe that total search latency significantly exceeds the sum of reported phases.

Production Example:

Total search latency: 150ms
├─ query_phase: 40ms (27%)
├─ fetch_phase: 15ms (10%)
├─ can_match_phase: 5ms (3%)
└─ UNACCOUNTED: 90ms (60%)
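The gap in the example above is simple arithmetic over the response fields; a minimal sketch, using the figures and field names from the production example:

```python
# Figures from the production example above; field names mirror the
# current Query Insights response.
took = 150  # total search latency in ms
phase_latency_breakdown = {"query": 40, "fetch": 15, "can_match": 5}

accounted = sum(phase_latency_breakdown.values())
unaccounted = took - accounted
print(f"accounted: {accounted}ms ({accounted / took:.0%})")    # 60ms (40%)
print(f"unaccounted: {unaccounted}ms ({unaccounted / took:.0%})")  # 90ms (60%)
```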

The Pain Point

In the production example above, 60% of the latency is invisible. Operators cannot answer:

  • Why is my query taking 150ms when phases only account for 60ms?
  • Is it queue saturation? Network delays? Coordination overhead?
  • Where should I optimize first?

Current workflow (without breakdown):

  1. See 150ms query in Query Insights Top N
  2. Collect and analyze resource utilization metrics, thread dumps, flame graphs, or JFR profiles (hours of work requiring deep expertise)
  3. Experiment with config/sharding/setup/query changes (trial and error)
  4. Hope the change helps

Impact: Performance optimization becomes guesswork without clarity about which segment or component of the request workflow contributes most to search request latency. MTTR for latency issues is measured in days or weeks, and at times requires tenured search developers to identify and address them.

Describe the solution you'd like

Proposed Changes

Add comprehensive latency breakdown to Query Insights showing exactly where time is spent:

What We'll Track

Phases (enhanced existing): Query, Fetch, Can_match, DFS, Suggest, Rewrite, Expand

Aggregations (new): Aggregation execution (per-shard), Aggregation reduce (coordinator), Aggregation type breakdown (optional)

Coordination (new): Request parsing, shard routing, response aggregation, reduce phase, result processing

Transport (new): Network round-trip time, request serialization/deserialization, response serialization/deserialization

Queuing (new): Search queue wait, get/fetch queue wait, thread scheduling

Cache (new): Query cache lookup/write, request cache lookup/write, shard request cache, fielddata operations

Concurrency (new): Lock acquisition, shard lock contention, circuit breaker checks

Concurrent Segment Search (new, if enabled): Slice creation and distribution, Slice execution scheduling, Per-slice execution time (min, max, avg, variance), Slice result aggregation, Thread pool coordination overhead

Cluster State & Metadata (new): Cluster state lookup, index metadata access, mapping lookup

Plugins & Extensions (new): Security plugin overhead, custom plugin processing, extension processing

Prioritization for Phased Implementation

When implementing in phases, the priority order can be as follows:

  • Phase 1 (Must-Have): Coordination, Queuing, Concurrent Segment Search
  • Phase 2 (High Value): Cache operations, Concurrency, Cluster state & metadata
  • Phase 3 (Nice-to-Have): Plugins & extensions, Transport, Phases (enhanced tracking)

Query Insights API - Before and After Examples

API Request

GET /_insights/top_queries?type=latency&size=10&from=2026-02-12T00:00:00Z&to=2026-02-12T23:59:59Z

Example 1: Query WITHOUT Aggregations

BEFORE: Current Query Insights Response

What you see today (limited visibility):

{
  "top_queries": [
    {
      "timestamp": 1707764400000,
      "node_id": "node-1",
      "indices": ["content"],
      "search_type": "query_then_fetch",
      "total_shards": 220,
      "source": {
        "query": {
          "bool": {
            "must": [
              {"match": {"content": "search term"}},
              {"exists": {"field": "category"}}
            ]
          }
        },
        "size": 100,
        "from": 0
      },
      "took": 150,
      "phase_latency_breakdown": {
        "query": 40,
        "fetch": 15,
        "can_match": 5
      },
      "resource_usages": {
        "cpu": 45,
        "memory": 1024000
      },
      "user": "admin",
      "labels": {
        "team": "analytics",
        "priority": "high"
      }
    }
  ]
}

Problem: You see 150ms total latency with only 60ms accounted for in phases (query: 40ms, fetch: 15ms, can_match: 5ms). 90ms (60%) is unaccounted for. You know the query is slow, but not why.
The Gap: Where is the missing 90ms?

  • Queue wait times?
  • Network delays?
  • Coordination overhead?
  • Cache operations?
  • Concurrent segment search overhead?

AFTER: Enhanced Query Insights Response

What you'll see (improved visibility):

{
  "top_queries": [
    {
      "timestamp": 1707764400000,
      "node_id": "node-1",
      "indices": ["content"],
      "search_type": "query_then_fetch",
      "total_shards": 220,
      "source": {
        "query": {
          "bool": {
            "must": [
              {"match": {"content": "search term"}},
              {"exists": {"field": "category"}}
            ]
          }
        },
        "size": 100,
        "from": 0
      },
      "took": 150,
      "resource_usages": {
        "cpu": 45,
        "memory": 1024000
      },
      "user": "admin",
      "labels": {
        "team": "analytics",
        "priority": "high"
      },
      "latency_breakdown": {
        "phases": {
          "query": 40,
          "fetch": 15,
          "can_match": 5,
          "dfs": 0,
          "suggest": 0,
          "rewrite": 2,
          "expand": 1
        },
        "aggregations": {
          "enabled": false,
          "aggregation_execution": 0,
          "aggregation_reduce": 0
        },
        "coordination": {
          "request_parsing": 2,
          "shard_routing": 3,
          "response_aggregation": 6,
          "reduce_phase": 4,
          "result_processing": 3
        },
        "transport": {
          "network_roundtrip": 25,
          "request_serialization": 2,
          "request_deserialization": 1,
          "response_serialization": 2,
          "response_deserialization": 2
        },
        "queuing": {
          "search_queue_wait": 35,
          "get_fetch_queue_wait": 8,
          "thread_scheduling": 2
        },
        "cache": {
          "query_cache_lookup": 1,
          "query_cache_write": 0,
          "request_cache_lookup": 2,
          "request_cache_write": 0,
          "shard_request_cache": 1,
          "fielddata_operations": 1
        },
        "concurrency": {
          "lock_acquisition": 1,
          "shard_lock_contention": 0,
          "circuit_breaker_checks": 0
        },
        "concurrent_segment_search": {
          "enabled": true,
          "slice_creation_and_distribution": 2,
          "slice_execution_scheduling": 3,
          "per_slice_execution": {
            "slice_count": 2,
            "min_execution_time": 35,
            "max_execution_time": 42,
            "avg_execution_time": 38.5,
            "variance": 7,
            "individual_slices": [
              {
                "slice_id": 0,
                "execution_time": 35,
                "segments_processed": 8
              },
              {
                "slice_id": 1,
                "execution_time": 42,
                "segments_processed": 10
              }
            ]
          },
          "slice_result_aggregation": 5,
          "thread_pool_coordination": 2
        },
        "cluster_state_metadata": {
          "cluster_state_lookup": 1,
          "index_metadata_access": 1,
          "mapping_lookup": 1
        },
        "plugins_extensions": {
          "security_plugin_overhead": 2,
          "custom_plugin_processing": 0,
          "extension_processing": 0
        },
        "other": 1
      }
    }
  ]
}

Actionable Insights:

  1. Queue Saturation (35ms wait, 23% of total) → Increase search thread pool size, add data nodes, or optimize queries
  2. Network Overhead (25ms roundtrip, 17% of total) → Investigate network topology or capacity saturation
  3. Concurrent Segment Search (slice count = 2) → Low slice result aggregation overhead (5ms) confirms the optimization is effective
  4. No Aggregations → Confirms aggregations not contributing to latency
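Insights like these can be derived mechanically from the enhanced response. A sketch of a hypothetical helper (`rank_breakdown` is not part of the proposal) that ranks `latency_breakdown` categories by total milliseconds:

```python
def rank_breakdown(latency_breakdown: dict, took: int) -> list:
    """Rank breakdown categories by summed timer values (ms)."""
    totals = {}
    for category, timers in latency_breakdown.items():
        if isinstance(timers, dict):
            # Sum only numeric leaf timers; skip flags such as "enabled"
            # (bool is an int subclass in Python) and nested detail like
            # "per_slice_execution".
            totals[category] = sum(
                v for v in timers.values()
                if isinstance(v, (int, float)) and not isinstance(v, bool)
            )
        elif isinstance(timers, (int, float)):
            totals[category] = timers  # flat entries such as "other"
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(cat, ms, round(100 * ms / took)) for cat, ms in ranked]

# Subset of the Example 1 response above:
breakdown = {
    "queuing": {"search_queue_wait": 35, "get_fetch_queue_wait": 8,
                "thread_scheduling": 2},
    "transport": {"network_roundtrip": 25, "request_serialization": 2},
    "other": 1,
}
print(rank_breakdown(breakdown, took=150))
# [('queuing', 45, 30), ('transport', 27, 18), ('other', 1, 1)]
```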

Example 2: Query WITH Aggregations

BEFORE: Current Query Insights Response

What you see today (limited visibility):

{
  "top_queries": [
    {
      "timestamp": 1707764500000,
      "node_id": "node-2",
      "indices": ["logs-2026.02.*"],
      "search_type": "query_then_fetch",
      "total_shards": 120,
      "source": {
        "query": {
          "bool": {
            "filter": [
              {"range": {"@timestamp": {"gte": "2026-02-12T00:00:00", "lte": "2026-02-12T23:59:59"}}},
              {"term": {"status": "error"}}
            ]
          }
        },
        "aggs": {
          "error_types": {
            "terms": {
              "field": "error_type.keyword",
              "size": 100
            },
            "aggs": {
              "hourly_distribution": {
                "date_histogram": {
                  "field": "@timestamp",
                  "fixed_interval": "1h"
                },
                "aggs": {
                  "avg_response_time": {
                    "avg": {"field": "response_time"}
                  }
                }
              }
            }
          }
        },
        "size": 0
      },
      "took": 320,
      "phase_latency_breakdown": {
        "query": 120,
        "fetch": 25,
        "can_match": 8
      },
      "resource_usages": {
        "cpu": 180,
        "memory": 2048000
      },
      "user": "analytics_team",
      "labels": {
        "team": "observability",
        "priority": "medium"
      }
    }
  ]
}

Problem: You see 320ms total latency with only 153ms accounted for in phases (query: 120ms, fetch: 25ms, can_match: 8ms). 167ms (52%) is unaccounted for. The query phase shows 120ms, but you don't know how much of that is aggregation execution vs. query/filter execution.
The Gap: Where is the missing 167ms? And within the 120ms query phase, how much is aggregation overhead?

AFTER: Enhanced Query Insights Response

What you'll see (complete visibility with aggregation breakdown):

{
  "top_queries": [
    {
      "timestamp": 1707764500000,
      "node_id": "node-2",
      "indices": ["logs-2026.02.*"],
      "search_type": "query_then_fetch",
      "total_shards": 120,
      "source": {
        "query": {
          "bool": {
            "filter": [
              {"range": {"@timestamp": {"gte": "2026-02-12T00:00:00", "lte": "2026-02-12T23:59:59"}}},
              {"term": {"status": "error"}}
            ]
          }
        },
        "aggs": {
          "error_types": {
            "terms": {
              "field": "error_type.keyword",
              "size": 100
            },
            "aggs": {
              "hourly_distribution": {
                "date_histogram": {
                  "field": "@timestamp",
                  "fixed_interval": "1h"
                },
                "aggs": {
                  "avg_response_time": {
                    "avg": {"field": "response_time"}
                  }
                }
              }
            }
          }
        },
        "size": 0
      },
      "took": 320,
      "resource_usages": {
        "cpu": 180,
        "memory": 2048000
      },
      "user": "analytics_team",
      "labels": {
        "team": "observability",
        "priority": "medium"
      },
      "latency_breakdown": {
        "phases": {
          "query": 25,
          "fetch": 25,
          "can_match": 8,
          "dfs": 0,
          "suggest": 0,
          "rewrite": 3,
          "expand": 0
        },
        "aggregations": {
          "enabled": true,
          "aggregation_execution": 180,
          "aggregation_reduce": 35,
          "type_breakdown": {
            "terms": 120,
            "date_histogram": 45,
            "metrics": 15
          }
        },
        "coordination": {
          "request_parsing": 3,
          "shard_routing": 4,
          "response_aggregation": 8,
          "reduce_phase": 35,
          "result_processing": 5
        },
        "transport": {
          "network_roundtrip": 28,
          "request_serialization": 3,
          "request_deserialization": 2,
          "response_serialization": 4,
          "response_deserialization": 3
        },
        "queuing": {
          "search_queue_wait": 15,
          "get_fetch_queue_wait": 5,
          "thread_scheduling": 2
        },
        "cache": {
          "query_cache_lookup": 2,
          "query_cache_write": 0,
          "request_cache_lookup": 0,
          "request_cache_write": 0,
          "shard_request_cache": 1,
          "fielddata_operations": 8
        },
        "concurrency": {
          "lock_acquisition": 1,
          "shard_lock_contention": 0,
          "circuit_breaker_checks": 1
        },
        "concurrent_segment_search": {
          "enabled": true,
          "slice_creation_and_distribution": 2,
          "slice_execution_scheduling": 3,
          "per_slice_execution": {
            "slice_count": 2,
            "min_execution_time": 22,
            "max_execution_time": 28,
            "avg_execution_time": 25,
            "variance": 6,
            "individual_slices": [
              {
                "slice_id": 0,
                "execution_time": 22,
                "segments_processed": 12
              },
              {
                "slice_id": 1,
                "execution_time": 28,
                "segments_processed": 15
              }
            ]
          },
          "slice_result_aggregation": 4,
          "thread_pool_coordination": 2
        },
        "cluster_state_metadata": {
          "cluster_state_lookup": 1,
          "index_metadata_access": 2,
          "mapping_lookup": 2
        },
        "plugins_extensions": {
          "security_plugin_overhead": 3,
          "custom_plugin_processing": 0,
          "extension_processing": 0
        },
        "other": 2
      }
    }
  ]
}

Actionable Insights:

  1. Aggregations Dominate (215ms, 67% of total took time) → Primary optimization target
  2. Terms Aggregation (120ms, 38% of total) → Action: Reduce cardinality (evaluate composite aggregation, add filters, or reduce size from 100)
  3. Date Histogram (45ms, 14% of total) → Action: Consider larger fixed_interval (e.g., 4h instead of 1h)
  4. Aggregation Reduce (35ms, 11% of total) → Action: Reduce shard count or optimize aggregation structure
  5. Fielddata Operations (8ms) → Status: Acceptable for terms aggregation on keyword field

Before vs. After:

  • Before: "Query is slow at 320ms, query phase shows 120ms but unclear why"
  • After: "Aggregations account for 215ms (67%); terms aggregation is the bottleneck at 120ms"
  • Time to Resolution: Minutes instead of hours with targeted aggregation optimization
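The `per_slice_execution` summary fields in the example payloads can be derived from the individual slice timings. A sketch of that derivation (the helper name is illustrative; note that in both example responses the `variance` field holds the max-min spread, not a statistical variance, so this sketch follows that convention):

```python
def summarize_slices(slices: list[dict]) -> dict:
    """Summarize per-slice timings, mirroring per_slice_execution above."""
    times = [s["execution_time"] for s in slices]
    return {
        "slice_count": len(times),
        "min_execution_time": min(times),
        "max_execution_time": max(times),
        "avg_execution_time": sum(times) / len(times),
        # "variance" as the max-min spread, matching the example payloads
        "variance": max(times) - min(times),
    }

# individual_slices from the Example 2 response:
print(summarize_slices([
    {"slice_id": 0, "execution_time": 22, "segments_processed": 12},
    {"slice_id": 1, "execution_time": 28, "segments_processed": 15},
]))
# {'slice_count': 2, 'min_execution_time': 22, 'max_execution_time': 28,
#  'avg_execution_time': 25.0, 'variance': 6}
```

A large spread between slices signals skewed segment distribution; here the 6ms spread over a 25ms average suggests reasonably balanced slices.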

Configuration to Enable Breakdown


# Enable breakdown tracking
plugins.query_insights.latency_breakdown.enabled: true

# Detail level: basic (8 timers) | detailed (16 timers) | debug (24 timers)
plugins.query_insights.latency_breakdown.level: detailed

# Only track queries exceeding threshold
plugins.query_insights.latency_breakdown.threshold: 100  # ms
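The threshold setting gates breakdown collection per query. A sketch of the intended semantics (the `should_record_breakdown` helper is hypothetical; only the setting names come from the proposal):

```python
def should_record_breakdown(took_ms: int, settings: dict) -> bool:
    """Record breakdown timers only for queries exceeding the threshold."""
    return (
        settings.get("plugins.query_insights.latency_breakdown.enabled", False)
        and took_ms > settings.get(
            "plugins.query_insights.latency_breakdown.threshold", 100)
    )

settings = {
    "plugins.query_insights.latency_breakdown.enabled": True,
    "plugins.query_insights.latency_breakdown.threshold": 100,
}
print(should_record_breakdown(150, settings))  # True  (150ms > 100ms)
print(should_record_breakdown(80, settings))   # False (below threshold)
```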

Dynamic Update:

PUT /_cluster/settings
{
  "persistent": {
    "plugins.query_insights.latency_breakdown.enabled": true,
    "plugins.query_insights.latency_breakdown.level": "detailed",
    "plugins.query_insights.latency_breakdown.threshold": 100
  }
}

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

Impact and Benefits

This framework transforms root cause analysis from a days- or weeks-long process of guesswork and complex diagnostics (flame graphs, thread dumps) into a data-driven operation completed in minutes or hours. By providing precise visibility, it enables teams to meet stringent performance requirements (e.g., p99 SLAs) and to optimize costs by identifying configuration-based fixes, like cache tuning and query optimization, over expensive cluster scaling. It directly implements the Visibility theme of #8879.

Integration and Future Work

Built on the existing Query Insights (#11522) and Top N (#11904) infrastructure, this backward-compatible update complements current resource tracking and exporters without breaking API changes. It serves as the foundation for automated recommendations (#12292), such as thread pool adjustments, and for upcoming Dashboard visualizations, ensuring that deep latency breakdowns are accessible to both exporters and end users.

Proposed Next Steps

  1. Community Feedback: Validate approach and components
  2. Detailed Design: HLD with instrumentation strategy
  3. POC: Implement subset to validate overhead
  4. Phased Implementation: Deliver in the priority order outlined above
