Description
Describe the solution you'd like
I would like to include the backend_state as a label in the trident_backend_info metric.
Currently, the only metric that exposes backend_state is trident_backend_count, but this metric is aggregated by backend_type only.
There is no unique label that lets me correlate the backend state with a specific backend_name.
If trident_backend_info exposed a backend_state label (for example: online, failed), the exact backend in an unhealthy state could be identified directly from Prometheus queries and alerting rules.
Describe alternatives you've considered
• Using trident_backend_count to detect unhealthy backends, but this only shows counts per backend_type and does not map back to an individual backend_name, so an alert built on it cannot say which backend needs attention.
• Relying on tridentctl get backend or logs to check backend status, but this requires manual or out-of-band checks and does not integrate well with centralized monitoring/alerting systems such as Prometheus and Grafana.
Additional context
When a backend goes into a failed state, the metrics only show that there is at least one failed backend of a given backend_type, but there is no way to see which backend_name is affected from metrics alone.

The trident_backend_info metric already includes labels such as backend_name, backend_type, and backend_uuid; adding a backend_state label to this metric would allow building precise alerts and dashboards that highlight the specific backend that is down.
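To make the request concrete, here is a rough sketch of how the proposed label could be consumed. It assumes trident_backend_info gains a backend_state label, which does not exist today; the sample scrape text, backend names, and UUIDs are made up for illustration, and the helper unhealthy_backends is hypothetical.

```python
import re

# Hypothetical scrape output. The backend_state label on trident_backend_info
# is the change proposed in this issue; names and UUIDs are invented.
SAMPLE = '''\
trident_backend_info{backend_name="nas-a",backend_type="ontap-nas",backend_uuid="uuid-1",backend_state="online"} 1
trident_backend_info{backend_name="san-b",backend_type="ontap-san",backend_uuid="uuid-2",backend_state="failed"} 1
'''

# Matches label pairs of the form key="value" inside the braces.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def unhealthy_backends(exposition, healthy=("online",)):
    """Return backend_name values whose backend_state is not in `healthy`.

    Lines without a backend_state label (today's behaviour) are skipped.
    """
    names = []
    for line in exposition.splitlines():
        if not line.startswith("trident_backend_info{"):
            continue
        labels = dict(LABEL_RE.findall(line))
        state = labels.get("backend_state")
        if state is not None and state not in healthy:
            names.append(labels.get("backend_name"))
    return names


print(unhealthy_backends(SAMPLE))
```

With the label in place, the same filter could presumably be written in an alerting rule as a selector such as trident_backend_info{backend_state!="online"}, and the alert would carry backend_name directly.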
