Commit ffceadf

machine424 authored and openshift-cherrypick-robot committed
chore(jsonnet): use prometheus_remote_storage_queue_highest_timestamp_seconds in PrometheusRemoteWriteBehind
This metric was introduced in openshift/prometheus#262 and related PRs. Dashboard expressions are not changed, since updating them may be more complex. Fixing the alert is more important and we can always revisit that if it causes confusion. On main, dashboards will be adjusted later once the jsonnet dependencies are updated.
1 parent b9fd02f commit ffceadf

2 files changed, +14 -2 lines changed

assets/prometheus-k8s/prometheus-rule.yaml

Lines changed: 3 additions & 2 deletions
@@ -166,11 +166,12 @@ spec:
  description: Prometheus {{$labels.namespace}}/{{$labels.pod}} remote write is {{ printf "%.1f" $value }}s behind for {{ $labels.remote_name}}:{{ $labels.url }}.
  summary: Prometheus remote write is behind.
  expr: |
+ # Use the metric added in https://github.com/openshift/prometheus/pull/262 and related PRs.
  # Without max_over_time, failed scrapes could create false negatives, see
  # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
  (
- max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job=~"prometheus-k8s|prometheus-user-workload"}[5m])
- - ignoring(remote_name, url) group_right
+ max_over_time(prometheus_remote_storage_queue_highest_timestamp_seconds{job=~"prometheus-k8s|prometheus-user-workload"}[5m])
+ -
  max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=~"prometheus-k8s|prometheus-user-workload"}[5m])
  )
  > 120

jsonnet/utils/sanitize-rules.libsonnet

Lines changed: 11 additions & 0 deletions
@@ -409,6 +409,17 @@ local patchedRules = [
  labels: {
  severity: 'info',
  },
+ expr: |||
+ # Use the metric added in https://github.com/openshift/prometheus/pull/262 and related PRs.
+ # Without max_over_time, failed scrapes could create false negatives, see
+ # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
+ (
+ max_over_time(prometheus_remote_storage_queue_highest_timestamp_seconds{job=~"prometheus-k8s|prometheus-user-workload"}[5m])
+ -
+ max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=~"prometheus-k8s|prometheus-user-workload"}[5m])
+ )
+ > 120
+ |||,
  },
  ],
  },
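
Note: the new expression subtracts the two queue metrics directly, without the `ignoring(remote_name, url) group_right` modifier. This suggests both sides now carry the same `remote_name` and `url` labels, so PromQL's default one-to-one matching applies. The Python sketch below only illustrates that evaluation under this assumption. The sample values, URLs, and function names are hypothetical, and it does not reproduce Prometheus' actual evaluation engine.

```python
# Illustrative sketch only: mimics
#   (max_over_time(a[5m]) - max_over_time(b[5m])) > 120
# with one-to-one matching on identical label sets.
# All series names, URLs, and sample values below are made up.

THRESHOLD_SECONDS = 120


def max_over_time(series):
    """Return the max sample per label set; label sets with no samples drop out."""
    return {labels: max(samples) for labels, samples in series.items() if samples}


def remote_write_behind(highest, highest_sent, threshold=THRESHOLD_SECONDS):
    """Return {labels: lag} for every queue whose lag exceeds the threshold."""
    lhs = max_over_time(highest)
    rhs = max_over_time(highest_sent)
    firing = {}
    for labels, value in lhs.items():
        if labels not in rhs:
            # No matching series on the right-hand side: no result, as in PromQL.
            continue
        lag = value - rhs[labels]
        if lag > threshold:
            firing[labels] = lag
    return firing


# Samples observed in the 5m window, keyed by (remote_name, url).
highest = {
    ("remote-a", "https://a.example/write"): [1000.0, 1030.0, 1060.0],
    ("remote-b", "https://b.example/write"): [1000.0, 1030.0, 1060.0],
}
highest_sent = {
    ("remote-a", "https://a.example/write"): [990.0, 1020.0, 1050.0],
    ("remote-b", "https://b.example/write"): [800.0, 850.0, 900.0],
}

firing = remote_write_behind(highest, highest_sent)
print(firing)
```

In this made-up data, `remote-a` lags by 10 seconds and stays below the threshold, while `remote-b` lags by 160 seconds and would be returned by the alert expression.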