Dashboard improvement: Use rate() for reconciliation error metrics in top overview panel #48

## Problem

The default Grafana dashboard shows reconciliation errors in the top overview panel using the raw values of the `controller_runtime_reconcile_total{result="error"}` counter. This counter only ever increases (it resets only when the operator process restarts), so its value reflects every error since the process started rather than the current error rate. That makes it difficult to detect active reconciliation issues.

### Current behavior

The "Reconcile errors" panel (grid position: top row, after operator status) uses these queries:

```promql
# Backup reconciliation errors
controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller="backup"}

# Cluster reconciliation errors
controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller="cluster"}

# Pooler reconciliation errors
controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller="pooler"}

# Scheduled Backup reconciliation errors
controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller=~"scheduledbackup|scheduled-backup"}
```

These cumulative counters make it hard to distinguish between:

- Old errors that occurred days ago
- New errors happening right now

The panel currently maps the cumulative values to different scopes (Backup=1-9, Cluster=10-99, Pooler=100-999, Scheduled Backup=1000-9999), but this does not show whether errors are currently occurring.

### Expected behavior
The dashboard should show the rate of errors to help operators identify active reconciliation problems.

### Proposed solution
Wrap the queries in the `rate()` function to show the per-second error rate over a time window:

Replace:
```promql
controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller="cluster"}
```
With:
```promql
rate(controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller="cluster"}[5m]) > 0
```

Alternatively, use `increase()` to show the number of errors within the time window:
```promql
increase(controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller="cluster"}[5m])
```

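The `> 0` filter drops series whose rate is zero, so the panel only shows controllers with active errors, making it immediately visible when reconciliation issues occur. The same filter can be appended to the `increase()` query.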
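The four per-controller queries could also be collapsed into a single query that returns one series per controller, giving the panel one legend entry per scope. This is a sketch that assumes the dashboard keeps the `$operatorNamespace` variable and runs on Grafana 7.2 or newer, where the built-in `$__rate_interval` variable is available; the fixed `[5m]` window used above works just as well.

```promql
# Query A: per-second reconciliation error rate, one series per controller,
# summed across all matching operator namespaces.
# "> 0" hides controllers with no errors in the window.
sum by (controller) (
  rate(controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller=~"backup|cluster|pooler|scheduledbackup|scheduled-backup"}[$__rate_interval])
) > 0

# Query B: number of errors in the last 5 minutes, one series per controller.
sum by (controller) (
  increase(controller_runtime_reconcile_total{namespace=~"$operatorNamespace", result="error", controller=~"backup|cluster|pooler|scheduledbackup|scheduled-backup"}[5m])
) > 0
```

Because `increase()` extrapolates to the edges of the window, Query B can return non-integer values even though each error is a whole event.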

## Benefits
- ✅ Shows active error rate instead of cumulative count
- ✅ Makes alerts more meaningful (alert when rate > threshold)
- ✅ Easier to correlate errors with recent changes/events
- ✅ Clear visual indication when reconciliation problems start/stop
- ✅ Better observability for production troubleshooting
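For the alerting point, a rule expression could reuse the same rate. This is a hedged sketch: `$operatorNamespace` is a Grafana dashboard variable and is not available in Prometheus or VictoriaMetrics alerting rules, so the literal namespace `cnpg-system` (the operator's default install namespace) and the threshold of 0.05 errors per second (about 15 errors in 5 minutes) are assumptions to adjust for each environment.

```promql
# Returns a series (i.e. fires) for each controller whose average error rate
# over the last 5 minutes exceeds 0.05 errors/second.
# Namespace and threshold are hypothetical placeholders.
sum by (controller) (
  rate(controller_runtime_reconcile_total{namespace="cnpg-system", result="error"}[5m])
) > 0.05
```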

## Environment
- CloudNative-PG version: 1.27.0
- Kubernetes version: 1.28+
- Monitoring stack: VictoriaMetrics/Prometheus + Grafana
- Dashboard location: Top overview panel "Reconcile errors"

## Additional context
We've implemented this improvement in our custom dashboard, and it significantly improved our ability to detect and respond to reconciliation issues in real time. The panel now clearly shows when active errors are occurring by scope (Backup/Cluster/Pooler/Scheduled Backup), rather than just showing a cumulative count.
