Commit 16c22e4

Merge pull request #64493 from JoeAldinger/OSDOCS-6269-trouble-shooting
/lgtm
2 parents 83c3edc + 1d1eaa1 commit 16c22e4

5 files changed: +83 -44 lines changed

modules/nw-ovn-kubernetes-alerts-cli.adoc

Lines changed: 2 additions & 5 deletions
@@ -26,14 +26,11 @@ $ ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring \
 -o jsonpath='{@.spec.host}')
 ----
 
-.. Issue a `curl` request to the alert manager route API with the correct authorization details requesting specific fields by running the following command:
+.. Issue a `curl` request to the alert manager route API by running the following command, replacing `$ALERT_MANAGER` with the URL of your `Alertmanager` instance:
 +
 [source,terminal]
 ----
-$ curl -s -k -H "Authorization: Bearer \
-$(oc create token prometheus-k8s -n openshift-monitoring)" \
-https://$ALERT_MANAGER/api/v1/alerts \
-| jq '.data[] | "\(.labels.severity) \(.labels.alertname) \(.labels.pod) \(.labels.container) \(.labels.endpoint) \(.labels.instance)"'
+$ curl -s -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$ALERT_MANAGER/api/v1/alerts | jq '.data[] | "\(.labels.severity) \(.labels.alertname) \(.labels.pod) \(.labels.container) \(.labels.endpoint) \(.labels.instance)"'
 ----
 
 . View alerting rules by running the following command:

modules/nw-ovn-kubernetes-change-log-levels.adoc

Lines changed: 66 additions & 16 deletions
@@ -6,7 +6,7 @@
 [id="nw-ovn-kubernetes-change-log-levels_{context}"]
 = Changing the OVN-Kubernetes log levels
 
-The default log level for OVN-Kubernetes is 2. To debug OVN-Kubernetes set the log level to 5.
+The default log level for OVN-Kubernetes is 4. To debug OVN-Kubernetes, set the log level to 5.
 Follow this procedure to increase the log level of the OVN-Kubernetes to help you debug an issue.
 
 .Prerequisites
@@ -26,16 +26,16 @@ $ oc get po -o wide -n openshift-ovn-kubernetes
 .Example output
 [source,terminal]
 ----
-NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
-ovnkube-master-84nc9 6/6 Running 0 50m 10.0.134.156 ip-10-0-134-156.ec2.internal <none> <none>
-ovnkube-master-gmlqv 6/6 Running 0 50m 10.0.209.180 ip-10-0-209-180.ec2.internal <none> <none>
-ovnkube-master-nhts2 6/6 Running 1 (48m ago) 50m 10.0.147.31 ip-10-0-147-31.ec2.internal <none> <none>
-ovnkube-node-2cbh8 5/5 Running 0 43m 10.0.217.114 ip-10-0-217-114.ec2.internal <none> <none>
-ovnkube-node-6fvzl 5/5 Running 0 50m 10.0.147.31 ip-10-0-147-31.ec2.internal <none> <none>
-ovnkube-node-f4lzz 5/5 Running 0 24m 10.0.146.76 ip-10-0-146-76.ec2.internal <none> <none>
-ovnkube-node-jf67d 5/5 Running 0 50m 10.0.209.180 ip-10-0-209-180.ec2.internal <none> <none>
-ovnkube-node-np9mf 5/5 Running 0 40m 10.0.165.191 ip-10-0-165-191.ec2.internal <none> <none>
-ovnkube-node-qjldg 5/5 Running 0 50m 10.0.134.156 ip-10-0-134-156.ec2.internal <none> <none>
+NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
+ovnkube-control-plane-65497d4548-9ptdr 2/2 Running 2 (128m ago) 147m 10.0.0.3 ci-ln-3njdr9b-72292-5nwkp-master-0 <none> <none>
+ovnkube-control-plane-65497d4548-j6zfk 2/2 Running 0 147m 10.0.0.5 ci-ln-3njdr9b-72292-5nwkp-master-2 <none> <none>
+ovnkube-control-plane-65497d4548-k7xqt 2/2 Running 0 147m 10.0.0.4 ci-ln-3njdr9b-72292-5nwkp-master-1 <none> <none>
+ovnkube-node-5dx44 8/8 Running 0 146m 10.0.0.3 ci-ln-3njdr9b-72292-5nwkp-master-0 <none> <none>
+ovnkube-node-dpfn4 8/8 Running 0 146m 10.0.0.4 ci-ln-3njdr9b-72292-5nwkp-master-1 <none> <none>
+ovnkube-node-kwc9l 8/8 Running 0 134m 10.0.128.2 ci-ln-3njdr9b-72292-5nwkp-worker-a-2fjcj <none> <none>
+ovnkube-node-mcrhl 8/8 Running 0 134m 10.0.128.4 ci-ln-3njdr9b-72292-5nwkp-worker-c-v9x5v <none> <none>
+ovnkube-node-nsct4 8/8 Running 0 146m 10.0.0.5 ci-ln-3njdr9b-72292-5nwkp-master-2 <none> <none>
+ovnkube-node-zrj9f 8/8 Running 0 134m 10.0.128.3 ci-ln-3njdr9b-72292-5nwkp-worker-b-v78h7 <none> <none>
 ----
 
 . Create a `ConfigMap` file similar to the following example and use a filename such as `env-overrides.yaml`:
@@ -49,12 +49,12 @@ metadata:
   name: env-overrides
   namespace: openshift-ovn-kubernetes
 data:
-  ip-10-0-217-114.ec2.internal: | <1>
+  ci-ln-3njdr9b-72292-5nwkp-master-0: | <1>
     # This sets the log level for the ovn-kubernetes node process:
     OVN_KUBE_LOG_LEVEL=5
     # You might also/instead want to enable debug logging for ovn-controller:
     OVN_LOG_LEVEL=dbg
-  ip-10-0-209-180.ec2.internal: |
+  ci-ln-3njdr9b-72292-5nwkp-master-2: |
     # This sets the log level for the ovn-kubernetes node process:
     OVN_KUBE_LOG_LEVEL=5
     # You might also/instead want to enable debug logging for ovn-controller:
@@ -86,16 +86,66 @@ configmap/env-overrides.yaml created
 [source,terminal]
 ----
 $ oc delete pod -n openshift-ovn-kubernetes \
---field-selector spec.nodeName=ip-10-0-217-114.ec2.internal -l app=ovnkube-node
+--field-selector spec.nodeName=ci-ln-3njdr9b-72292-5nwkp-master-0 -l app=ovnkube-node
 ----
 +
 [source,terminal]
 ----
 $ oc delete pod -n openshift-ovn-kubernetes \
---field-selector spec.nodeName=ip-10-0-209-180.ec2.internal -l app=ovnkube-node
+--field-selector spec.nodeName=ci-ln-3njdr9b-72292-5nwkp-master-2 -l app=ovnkube-node
 ----
 +
 [source,terminal]
 ----
-$ oc delete pod -n openshift-ovn-kubernetes -l app=ovnkube-master
+$ oc delete pod -n openshift-ovn-kubernetes -l app=ovnkube-node
+----
+
+. To verify that the `ConfigMap` file has been applied to all nodes for a specific pod, run the following command:
++
+[source,terminal]
+----
+$ oc logs -n openshift-ovn-kubernetes --all-containers --prefix ovnkube-node-<xxxx> | grep -E -m 10 '(Logging config:|vconsole|DBG)'
+----
++
+where:
+
+`<xxxx>`:: Specifies the random sequence of letters for a pod from the previous step.
++
+.Example output
+[source,terminal]
+----
+[pod/ovnkube-node-2cpjc/sbdb] + exec /usr/share/ovn/scripts/ovn-ctl --no-monitor '--ovn-sb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_sb_ovsdb
+[pod/ovnkube-node-2cpjc/ovnkube-controller] I1012 14:39:59.984506 35767 config.go:2247] Logging config: {File: CNIFile:/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:5 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20}
+[pod/ovnkube-node-2cpjc/northd] + exec ovn-northd --no-chdir -vconsole:info -vfile:off '-vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' --pidfile /var/run/ovn/ovn-northd.pid --n-threads=1
+[pod/ovnkube-node-2cpjc/nbdb] + exec /usr/share/ovn/scripts/ovn-ctl --no-monitor '--ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_nb_ovsdb
+[pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.552Z|00002|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (32 nodes total across 32 buckets)
+[pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00003|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (64 nodes total across 64 buckets)
+[pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00004|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 7 nodes (32 nodes total across 32 buckets)
+[pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00005|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering BACKOFF
+[pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00007|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering CONNECTING
+[pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00008|ovsdb_cs|DBG|unix:/var/run/openvswitch/db.sock: SERVER_SCHEMA_REQUESTED -> SERVER_SCHEMA_REQUESTED at lib/ovsdb-cs.c:423
+----
+
+. Optional: Check that the `ConfigMap` file has been applied by running the following command:
++
+[source,terminal]
+----
+for f in $(oc -n openshift-ovn-kubernetes get po -l 'app=ovnkube-node' --no-headers -o custom-columns=N:.metadata.name) ; do echo "---- $f ----" ; oc -n openshift-ovn-kubernetes exec -c ovnkube-controller $f -- pgrep -a -f init-ovnkube-controller | grep -P -o '^.*loglevel\s+\d' ; done
+----
++
+.Example output
+[source,terminal]
+----
+---- ovnkube-node-2dt57 ----
+60981 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-c-vmh5n.c.openshift-qe.internal --init-node xpst8-worker-c-vmh5n.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
+---- ovnkube-node-4zznh ----
+178034 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-2.c.openshift-qe.internal --init-node xpst8-master-2.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
+---- ovnkube-node-548sx ----
+77499 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-a-fjtnb.c.openshift-qe.internal --init-node xpst8-worker-a-fjtnb.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
+---- ovnkube-node-6btrf ----
+73781 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-b-p8rww.c.openshift-qe.internal --init-node xpst8-worker-b-p8rww.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
+---- ovnkube-node-fkc9r ----
+130707 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-0.c.openshift-qe.internal --init-node xpst8-master-0.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 5
+---- ovnkube-node-tk9l4 ----
+181328 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-1.c.openshift-qe.internal --init-node xpst8-master-1.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
 ----
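
The per-pod output of the optional check in this module can be condensed to one line per node. The following is a minimal sketch and is not part of this commit: `summarize_loglevels` is a hypothetical helper name, and the sketch assumes that each process line contains `--init-node <node>` followed later by `--loglevel <n>`, as in the example output.

```shell
# Hypothetical helper (an assumption, not part of the documented procedure).
# Reads `pgrep -a` style process lines on stdin and prints "<node> <level>"
# for every line that carries both --init-node and --loglevel.
# Lines without both options, such as "---- <pod> ----" separators, are dropped.
summarize_loglevels() {
  sed -n -E 's/.*--init-node[ =]([^ ]+).*--loglevel[ =]([0-9]+).*/\1 \2/p'
}
```

For example, appending `| summarize_loglevels` after the final `done` of the `for` loop would reduce the `ovnkube-node-fkc9r` entry in the example output to `xpst8-master-0.c.openshift-qe.internal 5`, which makes a node that is not at the default level easier to spot.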

modules/nw-ovn-kubernetes-logs-cli.adoc

Lines changed: 6 additions & 6 deletions
@@ -36,32 +36,32 @@ For example:
 +
 [source,terminal]
 ----
-$ oc logs ovnkube-master-7h4q7 -n openshift-ovn-kubernetes
+$ oc logs ovnkube-node-5dx44 -n openshift-ovn-kubernetes
 ----
 +
 [source,terminal]
 ----
-$ oc logs -f ovnkube-master-7h4q7 -n openshift-ovn-kubernetes -c ovn-dbchecker
+$ oc logs -f ovnkube-node-5dx44 -c ovnkube-controller -n openshift-ovn-kubernetes
 ----
 +
 The contents of log files are printed out.
 
-. Examine the most recent entries in all the containers in the `ovnkube-master` pods:
+. Examine the most recent entries in all the containers in the `ovnkube-node` pods:
 +
 [source,terminal]
 ----
-$ for p in $(oc get pods --selector app=ovnkube-master -n openshift-ovn-kubernetes \
+$ for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \
 -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); \
 do echo === $p ===; for container in $(oc get pods -n openshift-ovn-kubernetes $p \
 -o json | jq -r '.status.containerStatuses[] | .name');do echo ---$container---; \
 oc logs -c $container $p -n openshift-ovn-kubernetes --tail=5; done; done
 ----
 
-. View the last 5 lines of every log in every container in an `ovnkube-master` pod using the following command:
+. View the last 5 lines of every log in every container in an `ovnkube-node` pod using the following command:
 +
 [source,terminal]
 ----
-$ oc logs -l app=ovnkube-master -n openshift-ovn-kubernetes --all-containers --tail 5
+$ oc logs -l app=ovnkube-node -n openshift-ovn-kubernetes --all-containers --tail 5
 ----
 
 
modules/nw-ovn-kubernetes-readiness-probes.adoc

Lines changed: 9 additions & 16 deletions
@@ -6,7 +6,7 @@
 [id="nw-ovn-kubernetes-readiness-probes_{context}"]
 = Monitoring OVN-Kubernetes health by using readiness probes
 
-The `ovnkube-master` and `ovnkube-node` pods have containers configured with readiness probes.
+The `ovnkube-control-plane` and `ovnkube-node` pods have containers configured with readiness probes.
 
 .Prerequisites
 
@@ -16,25 +16,18 @@ The `ovnkube-master` and `ovnkube-node` pods have containers configured with rea
 
 .Procedure
 
-. Review the details of the `ovnkube-master` readiness probe by running the following command:
+. Review the details of the `ovnkube-node` readiness probe by running the following command:
 +
 [source,terminal]
 ----
-$ oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master \
+$ oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \
 -o json | jq '.items[0].spec.containers[] | .name,.readinessProbe'
 ----
 +
-The readiness probe for the northbound and southbound database containers in the `ovnkube-master` pod checks for the health of the Raft cluster hosting the databases.
+The readiness probe for the northbound and southbound database containers in the `ovnkube-node` pod checks for the health of the databases and the `ovnkube-controller` container.
 
-. Review the details of the `ovnkube-node` readiness probe by running the following command:
-+
-[source,terminal]
-----
-$ oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master \
--o json | jq '.items[0].spec.containers[] | .name,.readinessProbe'
-----
 +
-The `ovnkube-node` container in the `ovnkube-node` pod has a readiness probe to verify the presence of the ovn-kubernetes CNI configuration file, the absence of which would indicate that the pod is not running or is not ready to accept requests to configure pods.
+The `ovnkube-node` container in the `ovnkube-node` pod has a readiness probe to verify the presence of the OVN-Kubernetes CNI configuration file, the absence of which would indicate that the pod is not running or is not ready to accept requests to configure pods.
 
 . Show all events including the probe failures, for the namespace by using the following command:
 +
@@ -43,11 +36,11 @@ The `ovnkube-node` container in the `ovnkube-node` pod has a readiness probe to
 $ oc get events -n openshift-ovn-kubernetes
 ----
 
-. Show the events for just this pod:
+. Show the events for just a specific pod:
 +
 [source,terminal]
 ----
-$ oc describe pod ovnkube-master-tp2z8 -n openshift-ovn-kubernetes
+$ oc describe pod ovnkube-node-9lqfk -n openshift-ovn-kubernetes
 ----
 
 . Show the messages and statuses from the cluster network operator:
@@ -57,11 +50,11 @@ $ oc describe pod ovnkube-master-tp2z8 -n openshift-ovn-kubernetes
 $ oc get co/network -o json | jq '.status.conditions[]'
 ----
 
-. Show the `ready` status of each container in `ovnkube-master` pods by running the following script:
+. Show the `ready` status of each container in `ovnkube-node` pods by running the following script:
 +
 [source,terminal]
 ----
-$ for p in $(oc get pods --selector app=ovnkube-master -n openshift-ovn-kubernetes \
+$ for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \
 -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); do echo === $p ===; \
 oc get pods -n openshift-ovn-kubernetes $p -o json | jq '.status.containerStatuses[] | .name, .ready'; \
 done

networking/ovn_kubernetes_network_provider/ovn-kubernetes-troubleshooting-sources.adoc

Lines changed: 0 additions & 1 deletion
@@ -32,7 +32,6 @@ include::modules/nw-ovn-kubernetes-pod-connectivity-checks.adoc[leveloffset=+1]
 [id="additional-resources_ovn-kubernetes-sources-of-troubleshooting-information"]
 == Additional resources
 
-* link:https://access.redhat.com/solutions/5892971[How do I change the ovn-kubernetes loglevel in OpenShift 4?]
 * xref:../../networking/verifying-connectivity-endpoint.adoc#nw-pod-network-connectivity-implementation_verifying-connectivity-endpoint[Implementation of connection health checks]
 * xref:../../networking/verifying-connectivity-endpoint.adoc#nw-pod-network-connectivity-verify_verifying-connectivity-endpoint[Verifying network connectivity for an endpoint]
 