Description
Dashboard Name
Istio Monitoring Dashboard
Expected Dashboard Sections and Panels
(Panels and sections can be added or removed depending on the metrics available.)
General Overview
This section provides a high-level overview of the Istio service mesh's health and performance metrics, enabling quick assessment of the overall system state.
Panels
- Total Requests
  - Description: Displays the total number of requests processed by the service mesh.
- Request Rate
  - Description: Shows the rate of incoming and outgoing requests per second.
- Average Latency
  - Description: Illustrates the average latency of requests within the mesh.
- Error Rate
  - Description: Displays the percentage of failed requests compared to total requests.
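
As a rough illustration of the arithmetic behind the Request Rate and Error Rate panels, the sketch below derives both values from raw counter readings. The `(timestamp, value)` sample shape and the function names are assumptions made for this example, not part of Istio or of any dashboard tooling; a real query engine would also handle counter resets.

```python
from typing import Sequence, Tuple

# Hypothetical sample shape: (unix timestamp in seconds, cumulative counter value).
Sample = Tuple[float, float]


def per_second_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase of a cumulative counter over the window."""
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        return 0.0
    # Counter resets are ignored here for brevity; a negative delta is clamped to 0.
    return max(v1 - v0, 0.0) / elapsed


def error_percentage(failed: float, total: float) -> float:
    """Failed requests as a percentage of total requests (0 when there is no traffic)."""
    return 0.0 if total <= 0 else 100.0 * failed / total
```

For example, a counter that grows from 100 to 400 over 60 seconds corresponds to 5 requests per second, and 5 failures out of 200 requests corresponds to an error rate of 2.5%.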
Traffic Management
This section focuses on traffic management metrics, providing insights into how traffic is routed, load-balanced, and managed across services.
Panels
- Request Distribution
  - Description: Shows the distribution of requests across different services and versions.
- Load Balancing Efficiency
  - Description: Displays metrics on load balancing effectiveness, such as request distribution fairness.
- Circuit Breaker Events
  - Description: Monitors the number of circuit breaker events triggered, indicating potential service issues.
- Retries and Timeouts
  - Description: Illustrates the number of retry attempts and timeout occurrences in request handling.
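
The request does not say how "request distribution fairness" should be measured. One possible choice, shown here purely as an assumption, is Jain's fairness index over the per-backend request counts: it is 1.0 when every backend receives the same share and approaches 1/n when a single backend out of n takes all the traffic. The function name is hypothetical.

```python
from typing import Iterable


def jain_fairness_index(requests_per_backend: Iterable[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    values = [float(v) for v in requests_per_backend]
    if not values:
        raise ValueError("at least one backend is required")
    sum_of_squares = sum(v * v for v in values)
    if sum_of_squares == 0:
        # No traffic at all: treat the idle backends as evenly loaded.
        return 1.0
    total = sum(values)
    return (total * total) / (len(values) * sum_of_squares)
```

A single-value panel could display this index per destination service, with lower values pointing at uneven load balancing.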
Performance Metrics
This section provides detailed performance metrics to monitor the responsiveness and efficiency of services managed by Istio.
Panels
- Request Latency Percentiles
  - Description: Displays latency percentiles (e.g., p50, p95, p99) for requests, highlighting performance bottlenecks.
- Throughput
  - Description: Shows the number of requests processed per unit of time, indicating service capacity.
- Service Response Times
  - Description: Illustrates the response times of individual services within the mesh.
- Resource Utilization per Service
  - Description: Monitors CPU and memory usage for each service, identifying resource constraints.
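
Latency percentiles are usually derived from histogram buckets rather than from individual request timings. The sketch below shows one common way to estimate a percentile from cumulative buckets using linear interpolation inside the matching bucket. The bucket layout and the function name are assumptions for illustration; the query engine behind the dashboard may compute percentiles differently.

```python
import math
from typing import Sequence, Tuple


def estimate_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound.
    The last upper bound may be math.inf (overflow bucket).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if not buckets or buckets[-1][1] <= 0:
        return math.nan
    rank = q * buckets[-1][1]
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the overflow bucket: report the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket <= 0:
                return upper_bound
            # Assume observations are spread evenly inside the bucket.
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

Calling `estimate_quantile(0.95, buckets)` and `estimate_quantile(0.99, buckets)` on the same buckets would give the p95 and p99 series for the panel. The estimate is only as precise as the bucket boundaries allow.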
Error Metrics
This section monitors errors and failures within the Istio service mesh, aiding in the troubleshooting and resolution of issues.
Panels
- HTTP Error Rates
  - Description: Displays the rate of HTTP errors (e.g., 4xx, 5xx) across services.
- gRPC Error Rates
  - Description: Shows the rate of gRPC errors encountered in service communications.
- Failed Requests by Service
  - Description: Monitors the number of failed requests attributed to each service.
- TLS Handshake Failures
  - Description: Illustrates the number of TLS handshake failures, indicating potential security or configuration issues.
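
To make the grouping behind the HTTP Error Rates and Failed Requests by Service panels concrete, the sketch below buckets response codes into 4xx and 5xx classes per service. The `(service, status_code)` input shape and the function names are assumptions for this example, not a representation of how Istio exports its telemetry.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def status_class(code: int) -> str:
    """Map an HTTP status code to its class, e.g. 503 -> '5xx'."""
    return f"{code // 100}xx" if 100 <= code <= 599 else "unknown"


def error_counts_by_service(requests: Iterable[Tuple[str, int]]) -> Dict[str, Counter]:
    """Count 4xx and 5xx responses per service; other responses are ignored."""
    counts: Dict[str, Counter] = {}
    for service, code in requests:
        cls = status_class(code)
        if cls in ("4xx", "5xx"):
            counts.setdefault(service, Counter())[cls] += 1
    return counts
```

Keeping 4xx and 5xx separate is a deliberate choice in this sketch: client errors and server errors usually call for different follow-up, so a single combined count would hide useful detail.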
Resource Usage
This section provides insights into the resource consumption of Istio components, ensuring efficient operation within the Kubernetes cluster.
Panels
- CPU Usage
  - Description: Displays CPU usage for Istio control plane and data plane components.
- Memory Usage
  - Description: Shows memory consumption of Istio components, helping identify potential memory leaks or inefficiencies.
- Pod Restarts
  - Description: Monitors the number of restarts for Istio pods, indicating stability issues.
- Disk I/O
  - Description: Illustrates disk input/output metrics for Istio components, highlighting storage performance.
Control Plane Metrics
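This section tracks metrics specific to the Istio control plane, ensuring the configuration and management layers are functioning correctly. Note that Pilot, Galley, and Citadel were consolidated into istiod in Istio 1.5 and Mixer was later removed, so on current Istio versions these panels would need to map to whatever istiod metrics are available.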
Panels
- Pilot Configuration Syncs
  - Description: Displays the number of configuration synchronizations performed by Pilot.
- Mixer Requests
  - Description: Shows the number of requests handled by Mixer for policy and telemetry.
- Galley Operations
  - Description: Monitors the operations and performance of Galley, Istio's configuration validation component.
- Citadel Certificate Issuance
  - Description: Illustrates the number of certificates issued by Citadel for mutual TLS.
Data Plane Metrics
This section provides metrics related to the data plane, focusing on the performance and reliability of sidecar proxies handling service traffic.
Panels
- Envoy Proxy Metrics
  - Description: Displays key metrics from Envoy proxies, such as active connections and request rates.
- Inbound and Outbound Traffic
  - Description: Shows the volume of inbound and outbound traffic handled by data plane proxies.
- Proxy CPU and Memory Usage
  - Description: Monitors resource consumption of Envoy proxies to identify performance issues.
- Connection Errors
  - Description: Illustrates the number of connection errors encountered by data plane proxies.
Security Metrics
This section tracks security-related metrics, ensuring that Istio's security features are effectively protecting the service mesh.
Panels
- Mutual TLS Usage
  - Description: Displays the number of connections secured with mutual TLS.
- Authorization Policy Enforcement
  - Description: Shows metrics related to the enforcement of authorization policies, including allowed and denied requests.
- Certificate Expirations
  - Description: Monitors the expiration status of certificates managed by Istio, ensuring timely renewals.
- Security Policy Violations
  - Description: Illustrates the number of security policy violations detected within the mesh.
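
For the Certificate Expirations panel, the displayed value could be as simple as the time remaining until a certificate's expiry timestamp, plus a status derived from a warning threshold. The sketch below assumes the expiry is available as a timezone-aware `datetime`; the 30-day threshold, the status labels, and the function names are hypothetical choices for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(not_after: datetime, now: Optional[datetime] = None) -> float:
    """Days remaining until the certificate expires (negative if already expired)."""
    current = now if now is not None else datetime.now(timezone.utc)
    return (not_after - current).total_seconds() / 86400.0


def expiry_status(not_after: datetime, now: Optional[datetime] = None, warn_days: float = 30.0) -> str:
    """Classify a certificate as 'expired', 'expiring_soon', or 'ok'."""
    remaining = days_until_expiry(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining <= warn_days:
        return "expiring_soon"
    return "ok"
```

The `now` parameter exists so the calculation can be checked against a fixed point in time; in normal use it can be omitted.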
Expected Dashboard Variables
- namespace - Filter metrics based on the Kubernetes namespace where Istio is deployed.
- deployment.environment - Environment of application (configured at Otel agent level)
- service.name - Select specific services within the mesh to filter metrics.
- cluster - For multi-cluster setups, filter metrics based on the Kubernetes cluster.
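
The variables above act as filters applied to every panel. A minimal sketch of that behavior follows, assuming each data point carries an attribute map keyed by the same names as the variables and that an unset variable (`None`) means "no filter". Both the data shape and the function name are assumptions for illustration.

```python
from typing import Mapping, Optional

# Variable names taken from the list above.
DASHBOARD_VARIABLES = ("namespace", "deployment.environment", "service.name", "cluster")


def matches_variables(attributes: Mapping[str, str],
                      selected: Mapping[str, Optional[str]]) -> bool:
    """Return True if a data point's attributes satisfy every selected variable."""
    for name in DASHBOARD_VARIABLES:
        wanted = selected.get(name)
        if wanted is None:
            # Variable not set: do not filter on it.
            continue
        if attributes.get(name) != wanted:
            return False
    return True
```

For example, selecting `{"namespace": "istio-system"}` would keep every data point from that namespace regardless of service or cluster.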
References or Screenshots
- Istio Prometheus Metrics Documentation
- Istio Observability Documentation
- Sample Istio Dashboard Screenshot
Notes
Please review the CONTRIBUTING.md for guidelines on dashboard structure, naming conventions, and how to submit a pull request.