Commit 3dbb73e

openshift-state-metrics promcat
1 parent 0a2d9de commit 3dbb73e

File tree

11 files changed (+1116, -607 lines)


apps/openshift-state-metrics.yaml

Lines changed: 17 additions & 0 deletions
```diff
@@ -0,0 +1,17 @@
+---
+apiVersion: v1
+kind: App
+name: "openshift-state-metrics"
+keywords:
+- Platform
+- OpenShift
+- Kubernetes
+- Available
+availableVersions:
+- '4.7'
+shortDescription: "Specific metrics for OpenShift"
+description: |
+  openshift-state-metrics expands upon kube-state-metrics by adding metrics for OpenShift specific resources
+icon: https://raw.githubusercontent.com/sysdiglabs/promcat-resources/master/apps/images/openshift.png
+website: https://github.com/openshift/openshift-state-metrics
+available: true
```
Lines changed: 12 additions & 144 deletions
```diff
@@ -1,150 +1,18 @@
 # Alerts
-## KubeCPUOvercommit
-Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.
+## [OpenShift-state-metrics] CPU Resource Request Quota Usage
+Resource request CPU usage is over 90% resource quota.
 
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit)
+## [OpenShift-state-metrics] CPU Resource Limit Quota Usage
+Resource limit CPU usage is over 90% resource limit quota.
 
-## KubeMemOvercommit
-Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.
+## [OpenShift-state-metrics] Memory Resource Request Quota Usage
+Resource request memory usage is over 90% resource quota.
 
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit)
+## [OpenShift-state-metrics] Memory Resource Limit Quota Usage
+Resource limit memory usage is over 90% resource limit quota.
 
-## KubeCPUOvercommit
-Cluster has overcommitted CPU resource requests for Namespaces.
+## [OpenShift-state-metrics] Routes with issues
+A route status is in error and is having issues.
 
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit)
-
-## KubeMemOvercommit
-Cluster has overcommitted memory resource requests for Namespaces.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit)
-
-## KubeQuotaExceeded
-Namespace exceeded its quota.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded)
-
-## CPUThrottlingHigh
-Throttling of CPU in namespace for container in pod.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh)
-
-## KubePersistentVolumeUsageCritical
-The PersistentVolume claimed in Namespace usage is critical.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical)
-
-## KubePersistentVolumeFullInFourDays
-Based on recent sampling, the PersistentVolume claimed in the Namespace is expected to fill up within four days
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefullinfourdays)
-
-## KubePersistentVolumeErrors
-The persistent volume has bad status.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors)
-
-## KubeVersionMismatch
-There are different semantic versions of Kubernetes components running.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch)
-
-## KubeClientErrors
-Kubernetes API server client is experiencing errors.'
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors)
-
-## ErrorBudgetBurn
-High requests error budget burn for job=kube-apiserver
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-errorbudgetburn)
-
-## ErrorBudgetBurn
-High requests error budget burn for job=kube-apiserver
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-errorbudgetburn)
-
-## KubeAPILatencyHigh
-The API server has an abnormal latency.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh)
-
-## KubeAPILatencyHigh
-The API server has a 99th percentile latency.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh)
-
-## KubeAPIErrorsHigh
-API server is returning high number errors.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh)
-
-## KubeClientCertificateExpiration
-A client certificate used to authenticate to the apiserver is expiring in less than 7.0 days.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration)
-
-## KubeClientCertificateExpiration
-A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration)
-
-## AggregatedAPIErrors
-An aggregated API has reported errors. The number of errors have increased for it in the past five minutes. High values indicate that the availability of the service changes too often.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapierrors)
-
-## AggregatedAPIDown
-An aggregated API is down. It has not been available at least for the past five minutes.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapidown)
-
-## KubeAPIDown
-KubeAPI has disappeared from Prometheus target discovery.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown)
-
-## KubeNodeNotReady
-One node has been unready for more than 15 minutes.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready)
-
-## KubeNodeUnreachable
-One node is unreachable and some workloads may be rescheduled.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable)
-
-## KubeletTooManyPods
-Kubelet is running out of its Pod capacity.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods)
-
-## KubeNodeReadinessFlapping
-The readiness status of node has changed the value several times in the last 15 minutes.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping)
-
-## KubeletPlegDurationHigh
-The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletplegdurationhigh)
-
-## KubeletPodStartUpLatencyHigh
-Kubelet Pod startup 99th percentile latency is high.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh)
-
-## KubeletDown
-Kubelet has disappeared from Prometheus target discovery.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown)
-
-## KubeSchedulerDown
-KubeScheduler has disappeared from Prometheus target discovery.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown)
-
-## KubeControllerManagerDown
-KubeControllerManager has disappeared from Prometheus target discovery.
-
-[Runbook](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown)
+## [OpenShift-state-metrics] Build Processes with issues
+A build process is in error or failed status.
```
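The four quota alerts added above all compare used quota against the hard limit. As a rough sketch of the first one in PromQL, assuming openshift-state-metrics exposes cluster resource quota usage as `openshift_clusterresourcequota_usage` with `resource` and `type` labels (the names are assumptions based on the exporter's usual conventions, not something this commit confirms; verify against your `/metrics` output):

```promql
# Sketch: fraction of the CPU request quota currently used, per quota object.
# Fires the "CPU Resource Request Quota Usage" condition when above 90%.
openshift_clusterresourcequota_usage{resource="requests.cpu", type="used"}
  / ignoring(type)
openshift_clusterresourcequota_usage{resource="requests.cpu", type="hard"}
  > 0.9
```

The memory variants would swap `requests.cpu` for `requests.memory`, and the limit-quota alerts would use the corresponding `limits.*` resources.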
Lines changed: 2 additions & 0 deletions
```diff
@@ -1,3 +1,5 @@
 # Gather the metrics from the prometheus deployed by Openshift
 
 Metrics are automatically gathered by Prometheus Cluster Monitoring, you can query them in the Prometheus built-in console
+
+
```
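Since Prometheus Cluster Monitoring scrapes the exporter automatically, a quick query in the built-in console is enough to confirm the metrics are arriving. The metric name below is a hypothetical example based on the exporter's `openshift_`-prefixed naming, not pinned down by this commit:

```promql
# Hypothetical sanity check: number of routes the exporter sees, per namespace.
count by (namespace) (openshift_route_created)
```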
resources/openshift-state-metrics/README.md

Lines changed: 2 additions & 3 deletions
```diff
@@ -7,11 +7,10 @@ the Sysdig datasource as a Prometheus datasource.
 
 # Metrics
 The metrics gives you the information about the following:
+- ClusterResourceQuotas
 - BuildConfig
 - DeploymentConfig
-- ClusterResourceQuotas
-- Route
-- Group
+- Routes
 
 # Attributions
 The configuration files and dashboards are maintained by [Sysdig team](https://sysdig.com/).
```
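Each resource family in that README list maps to a group of `openshift_*` metric families. As an illustrative, unverified example for DeploymentConfig, assuming the exporter mirrors kube-state-metrics replica-status naming:

```promql
# Hypothetical: DeploymentConfigs that currently have unavailable replicas.
sum by (namespace, deploymentconfig) (
  openshift_deploymentconfig_status_replicas_unavailable
) > 0
```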

resources/openshift-state-metrics/RECORDING-RULES.md

Lines changed: 0 additions & 4 deletions
This file was deleted.
