Open
Description
By default, ztunnel exposes metrics at /stats/prometheus, and the following metrics carry source_workload and destination_workload labels:
- istio_tcp_received_bytes_total
- istio_tcp_connections_closed_total
- istio_tcp_connections_opened_total
- istio_tcp_sent_bytes_total
- istio_on_demand_dns_total
In my environment, these labels are populated with specific pod names.
Description:
Istio Version: 1.28.2
I am facing two critical issues:
- Stale metrics retention: Ztunnel does not evict metrics for pods that have been deleted. Over time, as pods rotate, the cardinality of metrics grows indefinitely. This eventually causes the metrics response size to exceed the Prometheus max_scrape_size, leading to scrape failures and loss of observability.
- Telemetry API ignored: I attempted to mitigate this by applying a Telemetry resource to drop or override these high-cardinality labels (using tag_overrides). However, ztunnel seems to ignore these configurations, continuing to export raw pod names in the workload labels. Additionally, there appears to be no internal ztunnel configuration to toggle these labels off.
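To quantify the growth described above, here is a minimal sketch that counts series and distinct workload label values per metric in a scrape body. It assumes the body was saved as Prometheus/OpenMetrics text; the function name `cardinality_report`, the sample pod names, and the sample values are illustrative and not taken from a real ztunnel scrape.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces, allowing escaped characters in the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def cardinality_report(exposition_text, label="source_workload"):
    """Return {metric_name: (series_count, distinct_values_of_label)}."""
    series = defaultdict(int)
    values = defaultdict(set)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, HELP/TYPE comments, and "# EOF"
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, labels = match.group(1), match.group(2) or ""
        series[name] += 1
        for key, val in LABEL_RE.findall(labels):
            if key == label:
                values[name].add(val)
    return {name: (series[name], len(values[name])) for name in series}


# Hypothetical scrape body, for illustration only.
SAMPLE = """# TYPE istio_tcp_connections_opened counter
istio_tcp_connections_opened_total{source_workload="web-7d9f-abcde",destination_workload="db-0"} 3
istio_tcp_connections_opened_total{source_workload="web-7d9f-fghij",destination_workload="db-0"} 5
istio_tcp_sent_bytes_total{source_workload="web-7d9f-abcde",destination_workload="db-0"} 1024
# EOF
"""

report = cardinality_report(SAMPLE)
for metric, (n_series, n_workloads) in sorted(report.items()):
    print(f"{metric}: {n_series} series, {n_workloads} distinct source_workload values")
```

Running this against successive scrapes from the same node should show whether the distinct-value count keeps rising after pods are deleted.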
Impact:
Metrics collection completely breaks on nodes with long uptime or high pod turnover. The only current workaround is a manual restart of the ztunnel DaemonSet.
Expected behavior:
- Eviction: Ztunnel should automatically purge metrics associated with workloads that are no longer present in its xDS/Workload state, or expire them after a configurable retention period.
- Configuration: Ztunnel should respect Telemetry API configurations for label dropping/overriding.
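The requested eviction behavior could look roughly like the sketch below. It is not ztunnel's implementation (ztunnel is written in Rust). The class `ExpiringCounters`, its methods, and the policy it applies are assumptions made for illustration: a series is dropped once one of its workloads has left the live set and the series has been idle longer than a retention window, so a final scrape can still collect the last values.

```python
import time


class ExpiringCounters:
    """Counter family keyed by label set, with retention-based eviction."""

    def __init__(self, retention_seconds, clock=time.monotonic):
        self.retention = retention_seconds
        self.clock = clock
        self._values = {}     # label tuple -> counter value
        self._last_seen = {}  # label tuple -> time of last update

    def inc(self, labels, amount=1):
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0) + amount
        self._last_seen[key] = self.clock()

    def evict(self, live_workloads):
        """Drop idle series that reference a workload no longer live.

        Returns the number of series removed.
        """
        now = self.clock()
        removed = 0
        for key in list(self._values):
            labels = dict(key)
            endpoints = (labels.get("source_workload"),
                         labels.get("destination_workload"))
            orphaned = any(w is not None and w not in live_workloads
                           for w in endpoints)
            idle = now - self._last_seen[key] > self.retention
            if orphaned and idle:
                del self._values[key]
                del self._last_seen[key]
                removed += 1
        return removed

    def __len__(self):
        return len(self._values)


# Usage with a fake clock so the example is deterministic.
now = [0.0]
counters = ExpiringCounters(retention_seconds=60, clock=lambda: now[0])
counters.inc({"source_workload": "web-a", "destination_workload": "db-0"})
counters.inc({"source_workload": "web-b", "destination_workload": "db-0"})

live = {"web-b", "db-0"}  # web-a has been deleted
now[0] = 30.0
removed_early = counters.evict(live)  # still inside the retention window
now[0] = 120.0
removed_late = counters.evict(live)   # retention elapsed for the orphaned series
```

Series whose workloads are all still live are kept regardless of idle time, so long-lived but quiet connections do not lose their counters.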