
[RFC] Comprehensive Search Latency Breakdown #20693

@manojfaria

Description


Is your feature request related to a problem? Please describe

Related to: [RFC] Search Performance and Resiliency #8879, and [META] Query Insights Framework #11522

Current State

Query Insights (#11522) successfully identifies which queries are slow.

What's tracked today: Query phase, Fetch phase, and Can_match phase durations
The Problem: Operators at times observe that total search latency significantly exceeds the sum of reported phases.

Production Example:

Total search latency: 150ms
├─ query_phase: 40ms (27%)
├─ fetch_phase: 15ms (10%)
├─ can_match_phase: 5ms (3%)
└─ UNACCOUNTED: 90ms (60%)
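The gap in the example above is simple arithmetic over the response fields; a minimal sketch, using the figures and field names from the production example:

```python
# Figures from the production example above; field names mirror the
# current Query Insights response.
took = 150  # total search latency in ms
phase_latency_breakdown = {"query": 40, "fetch": 15, "can_match": 5}

accounted = sum(phase_latency_breakdown.values())
unaccounted = took - accounted
print(f"accounted: {accounted}ms ({accounted / took:.0%})")    # 60ms (40%)
print(f"unaccounted: {unaccounted}ms ({unaccounted / took:.0%})")  # 90ms (60%)
```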

The Pain Point

In the production example above, 60% of the latency is invisible. Operators cannot answer:

  • Why is my query taking 150ms when phases only account for 60ms?
  • Is it queue saturation? Network delays? Coordination overhead?
  • Where should I optimize first?

Current workflow (without breakdown):

  1. See 150ms query in Query Insights Top N
  2. Collect and analyze resource utilization metrics, thread dumps, flame graphs, or JFR profiles (hours of work requiring deep expertise)
  3. Experiment with config/sharding/setup/query changes (trial and error)
  4. Hope the change helps

Impact: Performance optimization becomes guesswork without clarity about which segment or component of the request workflow contributes most to search request latency. MTTR for latency issues is measured in days or weeks, and at times requires tenured search developers to identify and address them.

Describe the solution you'd like

Proposed Changes

Add comprehensive latency breakdown to Query Insights showing exactly where time is spent:

What We'll Track

Phases (enhanced existing): Query, Fetch, Can_match, DFS, Suggest, Rewrite, Expand

Aggregations (new): Aggregation execution (per-shard), Aggregation reduce (coordinator), Aggregation type breakdown (optional)

Coordination (new): Request parsing, shard routing, response aggregation, reduce phase, result processing

Transport (new): Network round-trip time, request serialization/deserialization, response serialization/deserialization

Queuing (new): Search queue wait, get/fetch queue wait, thread scheduling

Cache (new): Query cache lookup/write, request cache lookup/write, shard request cache, fielddata operations

Concurrency (new): Lock acquisition, shard lock contention, circuit breaker checks

Concurrent Segment Search (new, if enabled): Slice creation and distribution, Slice execution scheduling, Per-slice execution time (min, max, avg, variance), Slice result aggregation, Thread pool coordination overhead

Cluster State & Metadata (new): Cluster state lookup, index metadata access, mapping lookup

Plugins & Extensions (new): Security plugin overhead, custom plugin processing, extension processing

Prioritization for Phased Implementation

When implementing in phases, the priority order can be as follows:

  • Phase 1 (Must-Have): Coordination, Queuing, Concurrent Segment Search
  • Phase 2 (High Value): Cache operations, Concurrency, Cluster state & metadata
  • Phase 3 (Nice-to-Have): Plugins & extensions, Transport, Phases (enhanced tracking)

Query Insights API - Before and After Examples

API Request

GET /_insights/top_queries?type=latency&size=10&from=2026-02-12T00:00:00Z&to=2026-02-12T23:59:59Z

Example 1: Query WITHOUT Aggregations

BEFORE: Current Query Insights Response

What you see today (limited visibility):

{
  "top_queries": [
    {
      "timestamp": 1707764400000,
      "node_id": "node-1",
      "indices": ["content"],
      "search_type": "query_then_fetch",
      "total_shards": 220,
      "source": {
        "query": {
          "bool": {
            "must": [
              {"match": {"content": "search term"}},
              {"exists": {"field": "category"}}
            ]
          }
        },
        "size": 100,
        "from": 0
      },
      "took": 150,
      "phase_latency_breakdown": {
        "query": 40,
        "fetch": 15,
        "can_match": 5
      },
      "resource_usages": {
        "cpu": 45,
        "memory": 1024000
      },
      "user": "admin",
      "labels": {
        "team": "analytics",
        "priority": "high"
      }
    }
  ]
}

Problem: You see 150ms total latency with only 60ms accounted for in phases (query: 40ms, fetch: 15ms, can_match: 5ms). 90ms (60%) is unaccounted for. You know the query is slow, but not why.
The Gap: Where is the missing 90ms?

  • Queue wait times?
  • Network delays?
  • Coordination overhead?
  • Cache operations?
  • Concurrent segment search overhead?

AFTER: Enhanced Query Insights Response

What you'll see (improved visibility):

{
  "top_queries": [
    {
      "timestamp": 1707764400000,
      "node_id": "node-1",
      "indices": ["content"],
      "search_type": "query_then_fetch",
      "total_shards": 220,
      "source": {
        "query": {
          "bool": {
            "must": [
              {"match": {"content": "search term"}},
              {"exists": {"field": "category"}}
            ]
          }
        },
        "size": 100,
        "from": 0
      },
      "took": 150,
      "resource_usages": {
        "cpu": 45,
        "memory": 1024000
      },
      "user": "admin",
      "labels": {
        "team": "analytics",
        "priority": "high"
      },
      "latency_breakdown": {
        "phases": {
          "query": 40,
          "fetch": 15,
          "can_match": 5,
          "dfs": 0,
          "suggest": 0,
          "rewrite": 2,
          "expand": 1
        },
        "aggregations": {
          "enabled": false,
          "aggregation_execution": 0,
          "aggregation_reduce": 0
        },
        "coordination": {
          "request_parsing": 2,
          "shard_routing": 3,
          "response_aggregation": 6,
          "reduce_phase": 4,
          "result_processing": 3
        },
        "transport": {
          "network_roundtrip": 25,
          "request_serialization": 2,
          "request_deserialization": 1,
          "response_serialization": 2,
          "response_deserialization": 2
        },
        "queuing": {
          "search_queue_wait": 35,
          "get_fetch_queue_wait": 8,
          "thread_scheduling": 2
        },
        "cache": {
          "query_cache_lookup": 1,
          "query_cache_write": 0,
          "request_cache_lookup": 2,
          "request_cache_write": 0,
          "shard_request_cache": 1,
          "fielddata_operations": 1
        },
        "concurrency": {
          "lock_acquisition": 1,
          "shard_lock_contention": 0,
          "circuit_breaker_checks": 0
        },
        "concurrent_segment_search": {
          "enabled": true,
          "slice_creation_and_distribution": 2,
          "slice_execution_scheduling": 3,
          "per_slice_execution": {
            "slice_count": 2,
            "min_execution_time": 35,
            "max_execution_time": 42,
            "avg_execution_time": 38.5,
            "variance": 7,
            "individual_slices": [
              {
                "slice_id": 0,
                "execution_time": 35,
                "segments_processed": 8
              },
              {
                "slice_id": 1,
                "execution_time": 42,
                "segments_processed": 10
              }
            ]
          },
          "slice_result_aggregation": 5,
          "thread_pool_coordination": 2
        },
        "cluster_state_metadata": {
          "cluster_state_lookup": 1,
          "index_metadata_access": 1,
          "mapping_lookup": 1
        },
        "plugins_extensions": {
          "security_plugin_overhead": 2,
          "custom_plugin_processing": 0,
          "extension_processing": 0
        },
        "other": 1
      }
    }
  ]
}

Actionable Insights:

  1. Queue Saturation (35ms wait, 23% of total) → Increase search thread pool size, add data nodes, or optimize queries
  2. Network Overhead (25ms roundtrip, 17% of total) → Investigate network topology or capacity saturation
  3. Concurrent Segment Search (slice count = 2) → Low slice result aggregation overhead (5ms) confirms the optimization is effective
  4. No Aggregations → Confirms aggregations not contributing to latency
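Insights like these can be derived mechanically from the enhanced response. A sketch of a hypothetical helper (`rank_breakdown` is not part of the proposal) that ranks `latency_breakdown` categories by total milliseconds:

```python
def rank_breakdown(latency_breakdown: dict, took: int) -> list:
    """Rank breakdown categories by summed timer values (ms)."""
    totals = {}
    for category, timers in latency_breakdown.items():
        if isinstance(timers, dict):
            # Sum only numeric leaf timers; skip flags such as "enabled"
            # (bool is an int subclass in Python) and nested detail like
            # "per_slice_execution".
            totals[category] = sum(
                v for v in timers.values()
                if isinstance(v, (int, float)) and not isinstance(v, bool)
            )
        elif isinstance(timers, (int, float)):
            totals[category] = timers  # flat entries such as "other"
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(cat, ms, round(100 * ms / took)) for cat, ms in ranked]

# Subset of the Example 1 response above:
breakdown = {
    "queuing": {"search_queue_wait": 35, "get_fetch_queue_wait": 8,
                "thread_scheduling": 2},
    "transport": {"network_roundtrip": 25, "request_serialization": 2},
    "other": 1,
}
print(rank_breakdown(breakdown, took=150))
# [('queuing', 45, 30), ('transport', 27, 18), ('other', 1, 1)]
```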

Example 2: Query WITH Aggregations

BEFORE: Current Query Insights Response

What you see today (limited visibility):

{
  "top_queries": [
    {
      "timestamp": 1707764500000,
      "node_id": "node-2",
      "indices": ["logs-2026.02.*"],
      "search_type": "query_then_fetch",
      "total_shards": 120,
      "source": {
        "query": {
          "bool": {
            "filter": [
              {"range": {"@timestamp": {"gte": "2026-02-12T00:00:00", "lte": "2026-02-12T23:59:59"}}},
              {"term": {"status": "error"}}
            ]
          }
        },
        "aggs": {
          "error_types": {
            "terms": {
              "field": "error_type.keyword",
              "size": 100
            },
            "aggs": {
              "hourly_distribution": {
                "date_histogram": {
                  "field": "@timestamp",
                  "fixed_interval": "1h"
                },
                "aggs": {
                  "avg_response_time": {
                    "avg": {"field": "response_time"}
                  }
                }
              }
            }
          }
        },
        "size": 0
      },
      "took": 320,
      "phase_latency_breakdown": {
        "query": 120,
        "fetch": 25,
        "can_match": 8
      },
      "resource_usages": {
        "cpu": 180,
        "memory": 2048000
      },
      "user": "analytics_team",
      "labels": {
        "team": "observability",
        "priority": "medium"
      }
    }
  ]
}

Problem: You see 320ms total latency with only 153ms accounted for in phases (query: 120ms, fetch: 25ms, can_match: 8ms). 167ms (52%) is unaccounted for. The query phase shows 120ms, but you don't know how much of that is aggregation execution vs. query/filter execution.
The Gap: Where is the missing 167ms? And within the 120ms query phase, how much is aggregation overhead?

AFTER: Enhanced Query Insights Response

What you'll see (complete visibility with aggregation breakdown):

{
  "top_queries": [
    {
      "timestamp": 1707764500000,
      "node_id": "node-2",
      "indices": ["logs-2026.02.*"],
      "search_type": "query_then_fetch",
      "total_shards": 120,
      "source": {
        "query": {
          "bool": {
            "filter": [
              {"range": {"@timestamp": {"gte": "2026-02-12T00:00:00", "lte": "2026-02-12T23:59:59"}}},
              {"term": {"status": "error"}}
            ]
          }
        },
        "aggs": {
          "error_types": {
            "terms": {
              "field": "error_type.keyword",
              "size": 100
            },
            "aggs": {
              "hourly_distribution": {
                "date_histogram": {
                  "field": "@timestamp",
                  "fixed_interval": "1h"
                },
                "aggs": {
                  "avg_response_time": {
                    "avg": {"field": "response_time"}
                  }
                }
              }
            }
          }
        },
        "size": 0
      },
      "took": 320,
      "resource_usages": {
        "cpu": 180,
        "memory": 2048000
      },
      "user": "analytics_team",
      "labels": {
        "team": "observability",
        "priority": "medium"
      },
      "latency_breakdown": {
        "phases": {
          "query": 25,
          "fetch": 25,
          "can_match": 8,
          "dfs": 0,
          "suggest": 0,
          "rewrite": 3,
          "expand": 0
        },
        "aggregations": {
          "enabled": true,
          "aggregation_execution": 180,
          "aggregation_reduce": 35,
          "type_breakdown": {
            "terms": 120,
            "date_histogram": 45,
            "metrics": 15
          }
        },
        "coordination": {
          "request_parsing": 3,
          "shard_routing": 4,
          "response_aggregation": 8,
          "reduce_phase": 35,
          "result_processing": 5
        },
        "transport": {
          "network_roundtrip": 28,
          "request_serialization": 3,
          "request_deserialization": 2,
          "response_serialization": 4,
          "response_deserialization": 3
        },
        "queuing": {
          "search_queue_wait": 15,
          "get_fetch_queue_wait": 5,
          "thread_scheduling": 2
        },
        "cache": {
          "query_cache_lookup": 2,
          "query_cache_write": 0,
          "request_cache_lookup": 0,
          "request_cache_write": 0,
          "shard_request_cache": 1,
          "fielddata_operations": 8
        },
        "concurrency": {
          "lock_acquisition": 1,
          "shard_lock_contention": 0,
          "circuit_breaker_checks": 1
        },
        "concurrent_segment_search": {
          "enabled": true,
          "slice_creation_and_distribution": 2,
          "slice_execution_scheduling": 3,
          "per_slice_execution": {
            "slice_count": 2,
            "min_execution_time": 22,
            "max_execution_time": 28,
            "avg_execution_time": 25,
            "variance": 6,
            "individual_slices": [
              {
                "slice_id": 0,
                "execution_time": 22,
                "segments_processed": 12
              },
              {
                "slice_id": 1,
                "execution_time": 28,
                "segments_processed": 15
              }
            ]
          },
          "slice_result_aggregation": 4,
          "thread_pool_coordination": 2
        },
        "cluster_state_metadata": {
          "cluster_state_lookup": 1,
          "index_metadata_access": 2,
          "mapping_lookup": 2
        },
        "plugins_extensions": {
          "security_plugin_overhead": 3,
          "custom_plugin_processing": 0,
          "extension_processing": 0
        },
        "other": 2
      }
    }
  ]
}

Actionable Insights:

  1. Aggregations Dominate (215ms, 67% of total took time) → Primary optimization target
  2. Terms Aggregation (120ms, 38% of total) → Action: Reduce cardinality (evaluate composite aggregation, add filters, or reduce size from 100)
  3. Date Histogram (45ms, 14% of total) → Action: Consider larger fixed_interval (e.g., 4h instead of 1h)
  4. Aggregation Reduce (35ms, 11% of total) → Action: Reduce shard count or optimize aggregation structure
  5. Fielddata Operations (8ms) → Status: Acceptable for terms aggregation on keyword field

Before vs. After:

  • Before: "Query is slow at 320ms, query phase shows 120ms but unclear why"
  • After: "Aggregations account for 215ms (67%); terms aggregation is the bottleneck at 120ms"
  • Time to Resolution: Minutes instead of hours with targeted aggregation optimization
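The `per_slice_execution` summary fields in the example payloads can be derived from the individual slice timings. A sketch of that derivation (the helper name is illustrative; note that in both example responses the `variance` field holds the max-min spread, not a statistical variance, so this sketch follows that convention):

```python
def summarize_slices(slices: list[dict]) -> dict:
    """Summarize per-slice timings, mirroring per_slice_execution above."""
    times = [s["execution_time"] for s in slices]
    return {
        "slice_count": len(times),
        "min_execution_time": min(times),
        "max_execution_time": max(times),
        "avg_execution_time": sum(times) / len(times),
        # "variance" as the max-min spread, matching the example payloads
        "variance": max(times) - min(times),
    }

# individual_slices from the Example 2 response:
print(summarize_slices([
    {"slice_id": 0, "execution_time": 22, "segments_processed": 12},
    {"slice_id": 1, "execution_time": 28, "segments_processed": 15},
]))
# {'slice_count': 2, 'min_execution_time': 22, 'max_execution_time': 28,
#  'avg_execution_time': 25.0, 'variance': 6}
```

A large spread between slices signals skewed segment distribution; here the 6ms spread over a 25ms average suggests reasonably balanced slices.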

Configuration to Enable Breakdown


# Enable breakdown tracking
plugins.query_insights.latency_breakdown.enabled: true

# Detail level: basic (8 timers) | detailed (16 timers) | debug (24 timers)
plugins.query_insights.latency_breakdown.level: detailed

# Only track queries exceeding threshold
plugins.query_insights.latency_breakdown.threshold: 100  # ms
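The threshold setting gates breakdown collection per query. A sketch of the intended semantics (the `should_record_breakdown` helper is hypothetical; only the setting names come from the proposal):

```python
def should_record_breakdown(took_ms: int, settings: dict) -> bool:
    """Record breakdown timers only for queries exceeding the threshold."""
    return (
        settings.get("plugins.query_insights.latency_breakdown.enabled", False)
        and took_ms > settings.get(
            "plugins.query_insights.latency_breakdown.threshold", 100)
    )

settings = {
    "plugins.query_insights.latency_breakdown.enabled": True,
    "plugins.query_insights.latency_breakdown.threshold": 100,
}
print(should_record_breakdown(150, settings))  # True  (150ms > 100ms)
print(should_record_breakdown(80, settings))   # False (below threshold)
```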

Dynamic Update:

PUT /_cluster/settings
{
  "persistent": {
    "plugins.query_insights.latency_breakdown.enabled": true,
    "plugins.query_insights.latency_breakdown.level": "detailed",
    "plugins.query_insights.latency_breakdown.threshold": 100
  }
}

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

Impact and Benefits

This framework transforms root cause analysis from a days- or weeks-long process of guesswork and complex diagnostics (flame graphs, thread dumps) into a data-driven operation completed in minutes or hours. By providing precise visibility, it enables teams to meet stringent performance requirements (e.g., p99 SLAs) and to optimize costs by identifying configuration-based fixes, like cache tuning and query optimization, over expensive cluster scaling. It directly implements the Visibility theme of #8879.

Integration and Future Work

Built on the existing Query Insights (#11522) and Top N (#11904) infrastructure, this backward-compatible update complements current resource tracking and exporters without breaking API changes. It serves as the foundation for automated recommendations (#12292), such as thread pool adjustments, and for upcoming Dashboard visualizations, ensuring that deep latency breakdowns are accessible to both exporters and end users.

Proposed Next Steps

  1. Community Feedback: Validate approach and components
  2. Detailed Design: HLD with instrumentation strategy
  3. POC: Implement subset to validate overhead
  4. Phased Implementation: Deliver in the priority order outlined above
