Commit 0a2d9de

openshift-state-metrics promcat

1 parent 8f0498e commit 0a2d9de

File tree

9 files changed: +666 -0 lines changed
Lines changed: 150 additions & 0 deletions
# Alerts

## KubeCPUOvercommit

Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit)
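For orientation, the upstream kubernetes-mixin derives this alert from a capacity-threshold expression roughly like the sketch below. Treat it as an assumption, not the shipped rule: the recording-rule and metric names (`namespace:kube_pod_container_resource_requests_cpu_cores:sum`, `kube_node_status_allocatable_cpu_cores`) change between mixin and kube-state-metrics versions, so check the rules actually deployed in your cluster.

```
# Sketch of a KubeCPUOvercommit-style expression: fire when total CPU requests
# exceed the capacity that would remain after losing one node.
sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum)
  /
sum(kube_node_status_allocatable_cpu_cores)
  >
(count(kube_node_status_allocatable_cpu_cores) - 1)
  /
count(kube_node_status_allocatable_cpu_cores)
```
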
## KubeMemOvercommit

Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit)

## KubeCPUOvercommit

Cluster has overcommitted CPU resource requests for Namespaces.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit)

## KubeMemOvercommit

Cluster has overcommitted memory resource requests for Namespaces.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit)

## KubeQuotaExceeded

Namespace has exceeded its quota.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded)

## CPUThrottlingHigh

High CPU throttling in the namespace for a container in a pod.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh)

## KubePersistentVolumeUsageCritical

Usage of the PersistentVolume claimed in the Namespace is critical.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical)

## KubePersistentVolumeFullInFourDays

Based on recent sampling, the PersistentVolume claimed in the Namespace is expected to fill up within four days.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefullinfourdays)

## KubePersistentVolumeErrors

The PersistentVolume has a bad status.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors)

## KubeVersionMismatch

There are different semantic versions of Kubernetes components running.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch)

## KubeClientErrors

A Kubernetes API server client is experiencing errors.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors)

## ErrorBudgetBurn

High request error budget burn for job=kube-apiserver.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-errorbudgetburn)

## ErrorBudgetBurn

High request error budget burn for job=kube-apiserver.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-errorbudgetburn)

## KubeAPILatencyHigh

The API server has abnormal latency.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh)

## KubeAPILatencyHigh

The API server 99th percentile latency is high.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh)

## KubeAPIErrorsHigh

The API server is returning a high number of errors.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh)

## KubeClientCertificateExpiration

A client certificate used to authenticate to the apiserver is expiring in less than 7.0 days.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration)

## KubeClientCertificateExpiration

A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration)

## AggregatedAPIErrors

An aggregated API has reported errors. The number of errors has increased for it in the past five minutes. High values indicate that the availability of the service is changing too often.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapierrors)

## AggregatedAPIDown

An aggregated API is down. It has not been available for at least the past five minutes.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapidown)

## KubeAPIDown

KubeAPI has disappeared from Prometheus target discovery.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown)

## KubeNodeNotReady

A node has been unready for more than 15 minutes.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready)

## KubeNodeUnreachable

A node is unreachable and some workloads may be rescheduled.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable)

## KubeletTooManyPods

Kubelet is running out of its Pod capacity.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods)

## KubeNodeReadinessFlapping

The readiness status of a node has changed several times in the last 15 minutes.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping)

## KubeletPlegDurationHigh

The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletplegdurationhigh)

## KubeletPodStartUpLatencyHigh

Kubelet Pod startup 99th percentile latency is high.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh)

## KubeletDown

Kubelet has disappeared from Prometheus target discovery.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown)

## KubeSchedulerDown

KubeScheduler has disappeared from Prometheus target discovery.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown)

## KubeControllerManagerDown

KubeControllerManager has disappeared from Prometheus target discovery.

[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown)
Lines changed: 3 additions & 0 deletions
# Gather the metrics from the Prometheus deployed by OpenShift

Metrics are automatically gathered by Prometheus Cluster Monitoring; you can query them in the built-in Prometheus console.
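For example, a query along these lines in the console lists the current replica count per DeploymentConfig. The metric and label names are assumptions based on the exporter's naming convention; use the console's metric autocomplete to confirm what your version exposes.

```
# Current replicas per DeploymentConfig (metric/label names assumed,
# following the openshift_<resource>_* naming convention).
sum by (namespace, deploymentconfig) (openshift_deploymentconfig_status_replicas)
```
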
Lines changed: 19 additions & 0 deletions
# Openshift-state-metrics

Red Hat® OpenShift® state metrics is an expansion upon kube-state-metrics, adding metrics for OpenShift®-specific resources.

OpenShift provides a Prometheus together with openshift-state-metrics, but it doesn't provide any dashboard for this information.
You can gather the metrics with our agent and show them in our dashboards, or even in Grafana dashboards by using
the Sysdig datasource as a Prometheus datasource.

# Metrics

The metrics give you information about the following resources (a sample query follows the list):
- BuildConfig
- DeploymentConfig
- ClusterResourceQuotas
- Route
- Group
16+
# Attributions
17+
The configuration files and dashboards are maintained by [Sysdig team](https://sysdig.com/).
18+
19+
All the metrics are maintained by [OpenShift-state-metrics](https://github.com/openshift/openshift-state-metrics).
Lines changed: 4 additions & 0 deletions
These are the recording rules for Sysdig. Just download the rules file and apply it:
```
# Apply the downloaded recording rules to the cluster
kubectl apply -f rules.yaml
```
