[Dashboard] Istio based on Prometheus #6025

@therealpandey

Description

Dashboard Name

Istio Monitoring Dashboard

Expected Dashboard Sections and Panels

(Panels and sections can be added or removed depending on the metrics that are available.)

General Overview

This section provides a high-level overview of the Istio service mesh's health and performance metrics, enabling quick assessment of the overall system state.

Panels

  • Total Requests

    • Description: Displays the total number of requests processed by the service mesh.
  • Request Rate

    • Description: Shows the rate of incoming and outgoing requests per second.
  • Average Latency

    • Description: Illustrates the average latency of requests within the mesh.
  • Error Rate

    • Description: Displays the percentage of failed requests compared to total requests.
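As a starting point, the four panels above could map onto Istio's standard Prometheus metrics roughly as follows. This is a sketch, assuming `istio_requests_total` and `istio_request_duration_milliseconds` are scraped with their default `reporter` and `response_code` labels. The 5-minute window, destination-side reporting, and counting only 5xx responses as failures are illustrative choices, not requirements of this issue.

```python
# Sketch only: PromQL strings for the General Overview panels.
# WINDOW, SELECTOR and ERRORS are illustrative assumptions.
WINDOW = "5m"
SELECTOR = '{reporter="destination"}'
ERRORS = '{reporter="destination",response_code=~"5.."}'

OVERVIEW_QUERIES = {
    # Requests seen during the window (not an all-time total).
    "Total Requests": f"sum(increase(istio_requests_total{SELECTOR}[{WINDOW}]))",
    # Requests per second across the mesh.
    "Request Rate": f"sum(rate(istio_requests_total{SELECTOR}[{WINDOW}]))",
    # Mean latency in milliseconds: summed duration divided by request count.
    "Average Latency": (
        f"sum(rate(istio_request_duration_milliseconds_sum{SELECTOR}[{WINDOW}])) / "
        f"sum(rate(istio_request_duration_milliseconds_count{SELECTOR}[{WINDOW}]))"
    ),
    # Percentage of requests that returned a 5xx response code.
    "Error Rate": (
        f"100 * sum(rate(istio_requests_total{ERRORS}[{WINDOW}])) / "
        f"sum(rate(istio_requests_total{SELECTOR}[{WINDOW}]))"
    ),
}
```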

Traffic Management

This section focuses on traffic management metrics, providing insights into how traffic is routed, load-balanced, and managed across services.

Panels

  • Request Distribution

    • Description: Shows the distribution of requests across different services and versions.
  • Load Balancing Efficiency

    • Description: Displays metrics on load balancing effectiveness, such as request distribution fairness.
  • Circuit Breaker Events

    • Description: Monitors the number of circuit breaker events triggered, indicating potential service issues.
  • Retries and Timeouts

    • Description: Illustrates the number of retry attempts and timeout occurrences in request handling.
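"Request distribution fairness" is not defined in this issue, so the sketch below picks one possible measure: the coefficient of variation of per-pod request rates, where 0.0 means traffic is spread perfectly evenly. The input is assumed to come from a query such as `sum by (pod) (rate(istio_requests_total{reporter="destination"}[5m]))`, which in turn assumes the scrape configuration attaches a `pod` label. The function name is hypothetical.

```python
from statistics import mean, pstdev


def load_balance_cv(per_pod_rates: list[float]) -> float:
    """Coefficient of variation of per-pod request rates (0.0 = perfectly even)."""
    if not per_pod_rates:
        raise ValueError("need at least one pod")
    avg = mean(per_pod_rates)
    if avg == 0:
        return 0.0  # no traffic at all: treat as trivially balanced
    # Population standard deviation relative to the mean rate.
    return pstdev(per_pod_rates) / avg
```

Circuit breaker, retry, and timeout panels would likely need Envoy cluster statistics (for example `upstream_rq_pending_overflow`, `upstream_rq_retry`, and `upstream_rq_timeout`), which Istio sidecars may not export unless the proxy stats configuration is extended.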

Performance Metrics

This section provides detailed performance metrics to monitor the responsiveness and efficiency of services managed by Istio.

Panels

  • Request Latency Percentiles

    • Description: Displays latency percentiles (e.g., p50, p95, p99) for requests, highlighting performance bottlenecks.
  • Throughput

    • Description: Shows the number of requests processed per unit of time, indicating service capacity.
  • Service Response Times

    • Description: Illustrates the response times of individual services within the mesh.
  • Resource Utilization per Service

    • Description: Monitors CPU and memory usage for each service, identifying resource constraints.
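The latency percentile panel could be driven by PromQL's `histogram_quantile` over Istio's request duration histogram. The helper below is a minimal sketch, assuming the `istio_request_duration_milliseconds_bucket` metric is available. The function name, the per-service grouping, and the default window are illustrative.

```python
def latency_quantile_query(quantile: float, window: str = "5m") -> str:
    """Build a per-service latency quantile query (result is in milliseconds)."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    return (
        f"histogram_quantile({quantile}, "
        "sum by (le, destination_service_name) "
        f'(rate(istio_request_duration_milliseconds_bucket{{reporter="destination"}}[{window}])))'
    )


# One query per percentile named in the panel description.
PERCENTILE_QUERIES = {f"p{int(q * 100)}": latency_quantile_query(q) for q in (0.5, 0.95, 0.99)}
```

Keeping `le` in the `by` clause matters: `histogram_quantile` needs the bucket boundaries to interpolate.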

Error Metrics

This section monitors errors and failures within the Istio service mesh, aiding in the troubleshooting and resolution of issues.

Panels

  • HTTP Error Rates

    • Description: Displays the rate of HTTP errors (e.g., 4xx, 5xx) across services.
  • gRPC Error Rates

    • Description: Shows the rate of gRPC errors encountered in service communications.
  • Failed Requests by Service

    • Description: Monitors the number of failed requests attributed to each service.
  • TLS Handshake Failures

    • Description: Illustrates the number of TLS handshake failures, indicating potential security or configuration issues.
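For the HTTP, gRPC, and per-service failure panels, one option is to filter `istio_requests_total` by protocol and status. The sketch below assumes the standard `request_protocol`, `response_code`, and `grpc_response_status` labels. Treating every 4xx/5xx response and every non-zero gRPC status as an error is an assumption that may be too broad for some services. TLS handshake failures are left out because they depend on which Envoy statistics are exported.

```python
def error_rate_query(protocol: str, window: str = "5m") -> str:
    """Per-service error rate (requests/second) for 'http' or 'grpc' traffic."""
    failed = {
        "http": 'request_protocol="http",response_code=~"[45].."',
        "grpc": 'request_protocol="grpc",grpc_response_status!="0"',
    }
    if protocol not in failed:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return (
        "sum by (destination_service_name) "
        f'(rate(istio_requests_total{{reporter="destination",{failed[protocol]}}}[{window}]))'
    )
```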

Resource Usage

This section provides insights into the resource consumption of Istio components, ensuring efficient operation within the Kubernetes cluster.

Panels

  • CPU Usage

    • Description: Displays CPU usage for Istio control plane and data plane components.
  • Memory Usage

    • Description: Shows memory consumption of Istio components, helping identify potential memory leaks or inefficiencies.
  • Pod Restarts

    • Description: Monitors the number of restarts for Istio pods, indicating stability issues.
  • Disk I/O

    • Description: Illustrates disk input/output metrics for Istio components, highlighting storage performance.
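Istio itself does not report container resource usage, so these panels would most likely rely on cAdvisor and kube-state-metrics. The sketch below assumes both are scraped by the same Prometheus and that the Istio components run in an `istio-system` namespace. The function name and the one-hour restart window are illustrative.

```python
def resource_queries(namespace: str = "istio-system", window: str = "5m") -> dict[str, str]:
    """PromQL for the Resource Usage panels, grouped per pod."""
    sel = f'{{namespace="{namespace}"}}'
    return {
        # CPU cores consumed, from cAdvisor.
        "CPU Usage": f"sum by (pod) (rate(container_cpu_usage_seconds_total{sel}[{window}]))",
        # Working-set memory in bytes, from cAdvisor.
        "Memory Usage": f"sum by (pod) (container_memory_working_set_bytes{sel})",
        # Container restarts over the last hour, from kube-state-metrics.
        "Pod Restarts": f"sum by (pod) (increase(kube_pod_container_status_restarts_total{sel}[1h]))",
        # Combined filesystem read and write throughput in bytes/second.
        "Disk I/O": (
            f"sum by (pod) (rate(container_fs_reads_bytes_total{sel}[{window}])) + "
            f"sum by (pod) (rate(container_fs_writes_bytes_total{sel}[{window}]))"
        ),
    }
```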

Control Plane Metrics

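This section tracks metrics specific to the Istio control plane, ensuring the configuration and management layers are functioning correctly. Note that since Istio 1.5, Pilot, Galley, and Citadel have been consolidated into istiod, and Mixer was removed in Istio 1.8, so on current releases the panels below would draw on istiod metrics.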

Panels

  • Pilot Configuration Syncs

    • Description: Displays the number of configuration synchronizations performed by Pilot.
  • Mixer Requests

    • Description: Shows the number of requests handled by Mixer for policy and telemetry.
  • Galley Operations

    • Description: Monitors the operations and performance of Galley, Istio's configuration validation component.
  • Citadel Certificate Issuance

    • Description: Illustrates the number of certificates issued by Citadel for mutual TLS.
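On Istio releases where the control plane runs as a single istiod binary, the panels above might map onto istiod metrics that keep the legacy component prefixes. The mapping below is a sketch: the metric names are given as I understand them and should be confirmed against the `/metrics` endpoint of the istiod version in use. Mixer has no counterpart on releases where it has been removed, so it is marked as unavailable.

```python
# Sketch only: panel name -> PromQL, or None when no istiod metric applies.
# Metric names are assumptions to verify against your istiod version.
CONTROL_PLANE_QUERIES = {
    "Pilot Configuration Syncs": "sum by (type) (rate(pilot_xds_pushes[5m]))",
    "Mixer Requests": None,  # Mixer is not part of istiod
    "Galley Operations": "sum(rate(galley_validation_passed[5m])) + sum(rate(galley_validation_failed[5m]))",
    "Citadel Certificate Issuance": "sum(rate(citadel_server_success_cert_issuance_count[5m]))",
}


def available_panels() -> list[str]:
    """Panels that have a query and can therefore be rendered."""
    return [name for name, query in CONTROL_PLANE_QUERIES.items() if query is not None]
```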

Data Plane Metrics

This section provides metrics related to the data plane, focusing on the performance and reliability of sidecar proxies handling service traffic.

Panels

  • Envoy Proxy Metrics

    • Description: Displays key metrics from Envoy proxies, such as active connections and request rates.
  • Inbound and Outbound Traffic

    • Description: Shows the volume of inbound and outbound traffic handled by data plane proxies.
  • Proxy CPU and Memory Usage

    • Description: Monitors resource consumption of Envoy proxies to identify performance issues.
  • Connection Errors

    • Description: Illustrates the number of connection errors encountered by data plane proxies.
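For the inbound and outbound traffic panel, Istio's TCP byte counters can be split by the `reporter` label: the destination-side proxy reports traffic it receives, the source-side proxy reports traffic it sends out. The helper below is a minimal sketch under that assumption. The function name is hypothetical, and HTTP traffic volume would need the request and response size metrics instead.

```python
def tcp_bytes_query(direction: str, window: str = "5m") -> str:
    """Total TCP bytes/second as seen by inbound or outbound sidecars."""
    reporters = {"inbound": "destination", "outbound": "source"}
    if direction not in reporters:
        raise ValueError(f"direction must be 'inbound' or 'outbound', got {direction!r}")
    sel = f'{{reporter="{reporters[direction]}"}}'
    return (
        f"sum(rate(istio_tcp_received_bytes_total{sel}[{window}])) + "
        f"sum(rate(istio_tcp_sent_bytes_total{sel}[{window}]))"
    )
```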

Security Metrics

This section tracks security-related metrics, ensuring that Istio's security features are effectively protecting the service mesh.

Panels

  • Mutual TLS Usage

    • Description: Displays the number of connections secured with mutual TLS.
  • Authorization Policy Enforcement

    • Description: Shows metrics related to the enforcement of authorization policies, including allowed and denied requests.
  • Certificate Expirations

    • Description: Monitors the expiration status of certificates managed by Istio, ensuring timely renewals.
  • Security Policy Violations

    • Description: Illustrates the number of security policy violations detected within the mesh.
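Two of the security panels can be approximated from `istio_requests_total` alone. The sketch below assumes the `connection_security_policy` label (populated on destination-side reports) for the mutual TLS share, and uses HTTP 403 responses as a rough proxy for authorization denials. That proxy is an assumption: applications can also return 403 themselves. Certificate expiry is not covered here because it depends on which certificate metrics the deployment exposes.

```python
# Sketch only: approximate queries for two Security Metrics panels.
_DEST = 'reporter="destination"'

SECURITY_QUERIES = {
    # Percentage of requests that arrived over mutual TLS.
    "Mutual TLS Usage": (
        f'100 * sum(rate(istio_requests_total{{{_DEST},connection_security_policy="mutual_tls"}}[5m])) / '
        f"sum(rate(istio_requests_total{{{_DEST}}}[5m]))"
    ),
    # Requests/second answered with 403, per destination service.
    "Authorization Policy Enforcement": (
        "sum by (destination_service_name) "
        f'(rate(istio_requests_total{{{_DEST},response_code="403"}}[5m]))'
    ),
}
```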

Expected Dashboard Variables

  • namespace – Filter metrics based on the Kubernetes namespace where Istio is deployed.
  • deployment.environment – Environment of the application (configured at the OpenTelemetry agent level).
  • service.name – Select specific services within the mesh to filter metrics.
  • cluster – For multi-cluster setups, filter metrics based on the Kubernetes cluster.
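To show how these variables might feed into panel queries, the sketch below turns the selected values into a PromQL label selector. The mapping from variable names to metric labels is hypothetical: `destination_workload_namespace`, `destination_service_name`, and `destination_cluster` are standard Istio labels, while `deployment_environment` assumes the OpenTelemetry attribute is exported with dots converted to underscores.

```python
# Hypothetical mapping from dashboard variable to Prometheus label.
VARIABLE_TO_LABEL = {
    "namespace": "destination_workload_namespace",
    "deployment.environment": "deployment_environment",
    "service.name": "destination_service_name",
    "cluster": "destination_cluster",
}


def label_selector(selected: dict[str, str]) -> str:
    """Build a PromQL selector such as {label="value",...} from chosen variables."""
    parts = []
    for variable, label in VARIABLE_TO_LABEL.items():
        value = selected.get(variable)
        if value:  # unset or empty means "do not filter on this variable"
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{label}="{escaped}"')
    return "{" + ",".join(parts) + "}"
```

For example, `"istio_requests_total" + label_selector({"namespace": "prod"})` yields a selector restricted to workloads in the `prod` namespace.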

References or Screenshots

📋 Notes

Please review the CONTRIBUTING.md for guidelines on dashboard structure, naming conventions, and how to submit a pull request.
