Commit 1fe3614

Merge pull request #36507 from kelbrown20/add-cleaning-crio-storage-1994596
BZ:1994596 - Adding Clearing CRI-O storage section
2 parents d80d449 + 8d21ea4 commit 1fe3614

2 files changed

+126
-0
lines changed

modules/cleaning-crio-storage.adoc

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
+[id="cleaning-crio-storage"]
+
+= Cleaning CRI-O storage
+
+You can manually clear the CRI-O ephemeral storage if you experience the following issues:
+
+* A node cannot run any pods and this error appears:
++
+[source, terminal]
+----
+Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container XXX: error recreating the missing symlinks: error reading name of symlink for XXX: open /var/lib/containers/storage/overlay/XXX/link: no such file or directory
+----
+* You cannot create a new container on a working node and the “can’t stat lower layer” error appears:
++
+[source, terminal]
+----
+can't stat lower layer ... because it does not exist. Going through storage to recreate the missing symlinks.
+----
+* Your node is in the `NotReady` state after a cluster upgrade or after you attempt to reboot it.
+
+* The container runtime implementation (`crio`) is not working properly.
+
+* You are unable to start a debug shell on the node using `oc debug node/<nodename>` because the container runtime instance (`crio`) is not working.
+
+Follow this process to completely wipe the CRI-O storage and resolve the errors.
+
+.Prerequisites
+
+* You have access to the cluster as a user with the `cluster-admin` role.
+* You have installed the OpenShift CLI (`oc`).
+
+.Procedure
+
+. Cordon the node. This prevents workloads from being scheduled on the node if it returns to the `Ready` status. Scheduling is disabled when `SchedulingDisabled` appears in the node status:
++
+[source, terminal]
+----
+$ oc adm cordon <nodename>
+----
+. Drain the node as a user with the `cluster-admin` role:
++
+[source, terminal]
+----
+$ oc adm drain <nodename> --ignore-daemonsets --delete-local-data
+----
+. When the node returns, reconnect to the node through SSH or the console. Then switch to the root user:
++
+[source, terminal]
+----
+$ sudo -i
+----
+. Manually stop the kubelet:
++
+[source, terminal]
+----
+# systemctl stop kubelet
+----
+. Stop and remove all containers and pods:
++
+[source, terminal]
+----
+# crictl rmp -fa
+----
+. Manually stop the `crio` service:
++
+[source, terminal]
+----
+# systemctl stop crio
+----
+. After the kubelet and `crio` services are stopped, completely wipe the ephemeral storage:
++
+[source, terminal]
+----
+# crio wipe -f
+----
+. Start the `crio` and kubelet services:
++
+[source, terminal]
+----
+# systemctl start crio
+# systemctl start kubelet
+----
+. The cleanup worked if the `crio` and kubelet services are started and the node is in the `Ready` status:
++
+[source, terminal]
+----
+$ oc get nodes
+----
++
+.Example output
+[source, terminal]
+----
+NAME                                 STATUS                     ROLES    AGE    VERSION
+ci-ln-tkbxyft-f76d1-nvwhr-master-1   Ready,SchedulingDisabled   master   133m   v1.22.0-rc.0+75ee307
+----
+. Mark the node as schedulable. Scheduling is enabled when `SchedulingDisabled` no longer appears in the node status:
++
+[source, terminal]
+----
+$ oc adm uncordon <nodename>
+----
++
+.Example output
+[source, terminal]
+----
+NAME                                 STATUS   ROLES    AGE    VERSION
+ci-ln-tkbxyft-f76d1-nvwhr-master-1   Ready    master   133m   v1.22.0-rc.0+75ee307
+----
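The on-node part of the procedure in this module (stop the kubelet, remove the pods, stop CRI-O, wipe the storage, restart both services) could be sketched as a small shell script, together with a helper for the final status check. This is only an illustrative sketch and not part of the commit: `run_step`, `wipe_crio_storage`, `node_status`, and the `DRY_RUN` variable are hypothetical names introduced here. By default the script only prints each command instead of running it. `node_status` assumes that `oc get nodes` prints the STATUS value as a single field without spaces.

```shell
# Sketch only: run_step, wipe_crio_storage, node_status, and DRY_RUN are
# hypothetical names, not part of CRI-O or the OpenShift CLI.

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run_step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'would run: %s\n' "$*"
  else
    "$@"
  fi
}

# On-node sequence from the procedure, meant to be run as root on the node.
# The && chain stops at the first command that fails.
wipe_crio_storage() {
  run_step systemctl stop kubelet &&
  run_step crictl rmp -fa &&
  run_step systemctl stop crio &&
  run_step crio wipe -f &&
  run_step systemctl start crio &&
  run_step systemctl start kubelet
}

# Read `oc get nodes` output on stdin and print the STATUS column of one node.
node_status() {
  awk -v node="$1" '$1 == node { print $2 }'
}
```

On the node, `DRY_RUN=0 wipe_crio_storage` would run the commands for real. From a workstation, `oc get nodes | node_status <nodename>` prints the status, which can be compared against `Ready` after the node is uncordoned.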

support/troubleshooting/troubleshooting-crio-issues.adoc

Lines changed: 3 additions & 0 deletions
@@ -13,3 +13,6 @@ include::modules/verifying-crio-status.adoc[leveloffset=+1]
 
 // Gathering CRI-O journald unit logs
 include::modules/gathering-crio-logs.adoc[leveloffset=+1]
+
+// Cleaning CRI-O storage
+include::modules/cleaning-crio-storage.adoc[leveloffset=+1]
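When triaging with this assembly, the two storage error messages quoted in the cleaning-crio-storage module can be matched against log text to decide whether the wipe procedure applies. The following is a minimal sketch under assumptions: `classify_crio_storage_error` and its output labels (`missing-symlinks`, `missing-lower-layer`) are hypothetical names chosen for illustration, and the match strings are taken from the two messages in the module.

```shell
# Sketch only: the function name and the labels it prints are hypothetical.
# Reads log lines on stdin, prints a label for the first line that contains
# one of the two storage error signatures, and returns 1 if none is found.
classify_crio_storage_error() {
  while IFS= read -r line; do
    case "$line" in
      *"error recreating the missing symlinks"*)
        echo "missing-symlinks"
        return 0
        ;;
      *"can't stat lower layer"*)
        echo "missing-lower-layer"
        return 0
        ;;
    esac
  done
  return 1
}
```

One possible use on the node is `journalctl -u crio --no-pager | classify_crio_storage_error`, which prints a label only when one of the signatures is present in the CRI-O unit logs.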
