Commit 753456d

Matthew Garrell committed
CNV12285 Adding CNV alerts to downstream documentation
CNV12285 Adding CNV alerts to topic map
CNV12285 Misc edits after initial inspection
CNV12285 Correcting HCO alerts ID
CNV12285 Removing kubectl and CDI_NAMESPACE references from files
CNV12285 Fixing incorrectly formatted leveloffset
CNV12285 Change in storage alerts -cnv to per SME review.
CNV12285 Change in network alerts openshift-cnv to Namespace per SME review.
CNV12285 Updates from SME review 9 Feb
CNV12285 Updates from SME review 9 Feb 2nd
CNV12285 Updates from SME review 9 Feb 3rd
CNV12285 Updates from SME review 9 Feb 4th
CNV12285 Updates from SME review 9 Feb 5th
CNV12285 Build failure test 1
CNV12285 Build failure test 2
CNV12285 Build failure test 3
CNV12285 source:terminal correction
CNV12285 Draft rewrite 1
CNV12285 Draft rewrite 2
CNV12285 Adding tech preview snippet
CNV12285 Adding tech preview snippet 2
CNV12285 Adding tech preview snippet 3
CNV12285 Draft rewrite 3
CNV12285 Draft rewrite 4
CNV12285 Draft rewrite 5
CNV12285 Draft rewrite 6
CNV12285 Draft rewrite 7
CNV12285 Revisions based on QE review
CNV12285 Changes based on QE review 2
CNV12285 Changes based on QE review 3
CNV12285 Changes based on QE review 4
CNV12285 Changes based on QE review 5
CNV12285 Changes based on QE review 6
CNV12285 Changes based on peer review
CNV12285 Changes based on peer review 2
1 parent 725c062 commit 753456d

5 files changed: +819 -0 lines changed


_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -3222,6 +3222,8 @@ Topics:
   File: virt-prometheus-queries
 - Name: Exposing custom metrics for virtual machines
   File: virt-exposing-custom-metrics-for-vms
+- Name: OpenShift Virtualization critical alerts
+  File: virt-virtualization-alerts
 - Name: Collecting OpenShift Virtualization data for Red Hat Support
   File: virt-collecting-virt-data
   Distros: openshift-enterprise

modules/virt-cnv-network-alerts.adoc

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+// Module included in the following assemblies:
+//
+// * virt/logging_events_monitoring/virt-events.html/virt-virtualization-alerts.adoc
+:_content-type: REFERENCE
+[id="virt-cnv-network-alerts_{context}"]
+= Network alerts
+
+Network alerts provide information about problems with the {VirtProductName} Network Operator.
+
+//KubeMacPoolDown Alert
+[id="KubeMacPoolDown_{context}"]
+== KubeMacPoolDown alert
+
+.Description
+
+The KubeMacPool component allocates MAC addresses and prevents MAC address conflicts.
+
+.Reason
+
+If the KubeMacPool-manager pod is down, then the creation of `VirtualMachine` objects fails.
+
+.Troubleshoot
+
+. Determine the namespace and name of the KubeMacPool-manager pod.
++
+[source,terminal]
+----
+$ export KMP_NAMESPACE="$(oc get pod -A --no-headers -l control-plane=mac-controller-manager | awk '{print $1}')"
+----
++
+[source,terminal]
+----
+$ export KMP_NAME="$(oc get pod -A --no-headers -l control-plane=mac-controller-manager | awk '{print $2}')"
+----
+
+. Check the KubeMacPool-manager pod description and logs to determine the source of the problem.
++
+[source,terminal]
+----
+$ oc describe pod -n $KMP_NAMESPACE $KMP_NAME
+----
++
+[source,terminal]
+----
+$ oc logs -n $KMP_NAMESPACE $KMP_NAME
+----
+
+.Resolution
+
+Open a support issue and provide the information gathered in the troubleshooting process.
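
The first KubeMacPoolDown troubleshooting step runs the same pod listing twice, once per column. A minimal sketch of reading both columns in one pass follows. It uses a canned line in place of live `oc` output so that it runs without a cluster; the namespace and pod name in `sample_output` are made up for illustration and are not taken from the module.

```shell
# Hypothetical sample line standing in for the output of:
#   oc get pod -A --no-headers -l control-plane=mac-controller-manager
# The namespace and pod name are invented for this sketch.
sample_output='example-namespace   kubemacpool-example-pod   1/1   Running   0   3d'

# Read the first two whitespace-separated columns (namespace, pod name)
# in a single pass; the remaining columns land in _rest and are ignored.
read -r KMP_NAMESPACE KMP_NAME _rest <<EOF
$sample_output
EOF

export KMP_NAMESPACE KMP_NAME
echo "namespace=$KMP_NAMESPACE pod=$KMP_NAME"
```

On a live cluster, the here-document would presumably be fed from the `oc get pod` command shown in the module. If the label selector matches more than one pod, only the first line is read.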

modules/virt-cnv-ssp-alerts.adoc

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
+// Module included in the following assemblies:
+//
+// * virt/logging_events_monitoring/virt-events.html/virt-virtualization-alerts.adoc
+:_content-type: REFERENCE
+[id="virt-cnv-ssp-alerts_{context}"]
+= SSP alerts
+
+SSP alerts provide information about problems with the {VirtProductName} SSP Operator.
+
+//SSPFailingToReconcile Alert
+[id="SSPFailingToReconcile_{context}"]
+== SSPFailingToReconcile alert
+
+.Description
+
+The SSP Operator pod is up, but its reconcile cycle consistently fails. This includes failure to update the resources for which the Operator is responsible, failure to deploy the template validator, or failure to deploy or update the common templates.
+
+.Reason
+
+If the SSP Operator fails to reconcile, then the deployment of dependent components fails, reconciliation of component changes fails, or both. Additionally, updates to the common templates and the template validator are reset and fail.
+
+.Troubleshoot
+
+. Check the ssp-operator pod description and logs for errors:
++
+[source,terminal]
+----
+$ export NAMESPACE="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE describe pods -l control-plane=ssp-operator
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE logs --tail=-1 -l control-plane=ssp-operator
+----
+
+. Verify that the template validator is up. If the template validator is not up, then check the pod's logs for errors.
++
+[source,terminal]
+----
+$ export NAMESPACE="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE get pods -l name=virt-template-validator
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE describe pods -l name=virt-template-validator
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE logs --tail=-1 -l name=virt-template-validator
+----
+
+.Resolution
+
+Open a support issue and provide the information gathered in the troubleshooting process.
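
The namespace lookup in the steps above relies on `grep ssp-operator`, which matches any deployment whose listing line contains that string and so could yield several namespaces. A minimal sketch of a stricter lookup is shown here, matching the NAME column exactly with `awk`. The deployment listing is a hypothetical sample, including the `ssp-operator-helper` entry, so that the snippet runs without a cluster.

```shell
# Hypothetical sample standing in for `oc get deployment -A` output.
# Namespaces, counts, and the extra deployment are invented for illustration.
sample_deployments='NAMESPACE    NAME                     READY  UP-TO-DATE  AVAILABLE  AGE
other-ns     ssp-operator-helper      1/1    1           1          9d
example-ns   ssp-operator             1/1    1           1          3d
example-ns   virt-template-validator  2/2    2           2          3d'

# Match the NAME column (field 2) exactly and stop at the first hit,
# so NAMESPACE holds a single value even if similar names exist.
NAMESPACE="$(printf '%s\n' "$sample_deployments" |
  awk '$2 == "ssp-operator" { print $1; exit }')"

export NAMESPACE
echo "ssp-operator namespace: $NAMESPACE"
```

With the substring `grep` used in the module, the same sample would return both `other-ns` and `example-ns`. The exact match is a design choice of this sketch, not something the module prescribes.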
+
+//SSPOperatorDown Alert
+[id="SSPOperatorDown_{context}"]
+== SSPOperatorDown alert
+
+.Description
+
+The SSP Operator deploys and reconciles the common templates and the template validator.
+
+.Reason
+
+If the SSP Operator is down, then the deployment of dependent components fails, reconciliation of component changes fails, or both. Additionally, updates to the common templates and the template validator are reset and fail.
+
+.Troubleshoot
+
+. Determine the namespace of the ssp-operator pod:
++
+[source,terminal]
+----
+$ export NAMESPACE="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
+----
+
+. Verify that the ssp-operator pod is currently down.
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE get pods -l control-plane=ssp-operator
+----
+
+. Check the ssp-operator pod description and logs.
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE describe pods -l control-plane=ssp-operator
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE logs --tail=-1 -l control-plane=ssp-operator
+----
+
+.Resolution
+
+Open a support issue and provide the information gathered in the troubleshooting process.
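
The "verify that the pod is currently down" step leaves the reading of the `oc get pods` output to the reader. One possible reading is sketched below, assuming that "up" means a STATUS of `Running` with every container ready (READY column of the form `n/n`). The helper name `pod_is_up` and the sample lines are hypothetical.

```shell
# pod_is_up LINE
# Succeeds when one line of `oc get pods --no-headers` output
# (columns: NAME READY STATUS RESTARTS AGE) shows STATUS "Running"
# and a READY column whose two numbers are equal. Hypothetical helper.
pod_is_up() {
  printf '%s\n' "$1" | awk '
    { split($2, ready, "/"); up = ($3 == "Running" && ready[1] == ready[2]) }
    END { exit !up }'
}

# Invented sample lines; pod names are placeholders.
running_line='ssp-operator-example   1/1   Running            0   3d'
crash_line='ssp-operator-example   0/1   CrashLoopBackOff   7   3d'

for line in "$running_line" "$crash_line"; do
  if pod_is_up "$line"; then echo "up"; else echo "down"; fi
done
```

An empty listing, meaning no pod matched the label selector, is treated as down because no line satisfies the check.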
+
+//SSPTemplateValidatorDown Alert
+[id="SSPTemplateValidatorDown_{context}"]
+== SSPTemplateValidatorDown alert
+
+.Description
+
+The template validator verifies that virtual machines (VMs) do not violate their assigned templates.
+
+.Reason
+
+If every template validator pod is down, then the template validator fails to validate VMs against their assigned templates.
+
+.Troubleshoot
+
+. Determine the namespaces of the ssp-operator pods and the virt-template-validator pods.
++
+[source,terminal]
+----
+$ export NAMESPACE_SSP="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
+----
++
+[source,terminal]
+----
+$ export NAMESPACE="$(oc get deployment -A | grep virt-template-validator | awk '{print $1}')"
+----
+
+. Verify that the virt-template-validator pod is currently down.
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE get pods -l name=virt-template-validator
+----
+
+. Check the pod description and logs of the ssp-operator and the virt-template-validator.
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE_SSP describe pods -l name=ssp-operator
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE_SSP logs --tail=-1 -l name=ssp-operator
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE describe pods -l name=virt-template-validator
+----
++
+[source,terminal]
+----
+$ oc -n $NAMESPACE logs --tail=-1 -l name=virt-template-validator
+----
+
+.Resolution
+
+Open a support issue and provide the information gathered in the troubleshooting process.
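
The Reason text for SSPTemplateValidatorDown says the problem arises when every template validator pod is down. A rough sketch of checking that condition by hand follows: it counts the pods whose STATUS is `Running` in a listing. The pod lines are a hypothetical sample in place of `oc -n $NAMESPACE get pods --no-headers -l name=virt-template-validator` output, and treating only `Running` as "not down" is an assumption of this sketch.

```shell
# Hypothetical sample listing (columns: NAME READY STATUS RESTARTS AGE).
# Pod names and states are invented for illustration.
sample_pods='virt-template-validator-example-a   0/1   CrashLoopBackOff   5   3d
virt-template-validator-example-b   0/1   Pending            0   3d'

# Count lines whose STATUS column (field 3) is "Running".
# "n + 0" prints 0 instead of an empty string when nothing matches.
running_count="$(printf '%s\n' "$sample_pods" |
  awk '$3 == "Running" { n++ } END { print n + 0 }')"

if [ "$running_count" -eq 0 ]; then
  echo "all template validator pods are down"
else
  echo "$running_count template validator pod(s) running"
fi
```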
