|
| 1 | +[id="cleaning-crio-storage"] |
| 2 | + |
| 3 | += Cleaning CRI-O storage |
| 4 | + |
| 5 | +You can manually clear the CRI-O ephemeral storage if you experience the following issues: |
| 6 | + |
| 7 | +* A node cannot run any pods and this error appears: |
| 8 | ++ |
| 9 | +[source, terminal] |
| 10 | +---- |
| 11 | +Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container XXX: error recreating the missing symlinks: error reading name of symlink for XXX: open /var/lib/containers/storage/overlay/XXX/link: no such file or directory |
| 12 | +---- |
| 13 | + |
| 14 | +* You cannot create a new container on a working node and the “can’t stat lower layer” error appears: |
| 15 | ++ |
| 16 | +[source, terminal] |
| 17 | +---- |
| 18 | +can't stat lower layer ... because it does not exist. Going through storage to recreate the missing symlinks. |
| 19 | +---- |
| 20 | + |
| 21 | +* Your node is in the `NotReady` state after a cluster upgrade or after you attempt to reboot it. |
| 22 | + |
| 23 | +* The container runtime implementation (`crio`) is not working properly. |
| 24 | + |
| 25 | +* You are unable to start a debug shell on the node using `oc debug node/<nodename>` because the container runtime instance (`crio`) is not working. |
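As a rough aid for spotting the first two symptoms, the sketch below scans log text for the two error signatures quoted above. The function name `has_crio_storage_error` is hypothetical, and where these messages are actually logged on a given node (for example, the `crio` or `kubelet` journal) is an assumption rather than something this procedure states.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads log text on stdin and succeeds (exit 0) when it
# contains either of the two CRI-O storage error signatures quoted above.
has_crio_storage_error() {
  grep -qE "error recreating the missing symlinks|can't stat lower layer"
}

# Possible usage on a node (assumes the messages reach the systemd journal):
#   journalctl --no-pager -u crio -u kubelet | has_crio_storage_error \
#     && echo "CRI-O storage errors found"
```

A match only suggests that the storage is affected; the remaining symptoms in the list still need to be checked by hand.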
| 26 | + |
| 27 | +Follow this process to completely wipe the CRI-O storage and resolve the errors. |
| 28 | + |
| 29 | +.Prerequisites |
| 30 | + |
| 31 | + * You have access to the cluster as a user with the `cluster-admin` role. |
| 32 | + * You have installed the OpenShift CLI (`oc`). |
| 33 | + |
| 34 | +.Procedure |
| 35 | + |
| 36 | +. Cordon the node to prevent workloads from being scheduled on it if it returns to the `Ready` status. Scheduling is disabled when `SchedulingDisabled` appears in the node status: |
| 37 | ++ |
| 38 | +[source, terminal] |
| 39 | +---- |
| 40 | +$ oc adm cordon <nodename> |
| 41 | +---- |
| 42 | + |
| 43 | +. Drain the node as a user with the `cluster-admin` role: |
| 44 | ++ |
| 45 | +[source, terminal] |
| 46 | +---- |
| 47 | +$ oc adm drain <nodename> --ignore-daemonsets --delete-local-data |
| 48 | +---- |
| 49 | + |
| 50 | +. When the node returns, connect to it again through SSH or the console, then switch to the root user: |
| 51 | ++ |
| 52 | +[source, terminal] |
| 53 | +---- |
| 54 | + |
| 55 | +$ sudo -i |
| 56 | +---- |
| 57 | + |
| 58 | +. Manually stop the kubelet: |
| 59 | ++ |
| 60 | +[source, terminal] |
| 61 | +---- |
| 62 | +# systemctl stop kubelet |
| 63 | +---- |
| 64 | + |
| 65 | +. Stop and remove all pods and their containers: |
| 66 | ++ |
| 67 | +[source, terminal] |
| 68 | +---- |
| 69 | +# crictl rmp -fa |
| 70 | +---- |
| 71 | + |
| 72 | +. Manually stop the `crio` service: |
| 73 | ++ |
| 74 | +[source, terminal] |
| 75 | +---- |
| 76 | +# systemctl stop crio |
| 77 | +---- |
| 78 | + |
| 79 | +. After you run those commands, completely wipe the ephemeral storage: |
| 80 | ++ |
| 81 | +[source, terminal] |
| 82 | +---- |
| 83 | +# crio wipe -f |
| 84 | +---- |
| 85 | + |
| 86 | +. Start the `crio` and `kubelet` services: |
| 87 | ++ |
| 88 | +[source, terminal] |
| 89 | +---- |
| 90 | +# systemctl start crio |
| 91 | +# systemctl start kubelet |
| 92 | +---- |
| 93 | + |
| 94 | +. Verify the cleanup. It worked if the `crio` and `kubelet` services are running and the node is in the `Ready` status: |
| 95 | ++ |
| 96 | +[source, terminal] |
| 97 | +---- |
| 98 | +$ oc get nodes |
| 99 | +---- |
| 100 | ++ |
| 101 | +.Example output |
| 102 | +[source, terminal] |
| 104 | +---- |
| 105 | +NAME                                 STATUS                     ROLES    AGE    VERSION |
| 106 | +ci-ln-tkbxyft-f76d1-nvwhr-master-1   Ready,SchedulingDisabled   master   133m   v1.22.0-rc.0+75ee307 |
| 107 | +---- |
| 108 | + |
| 109 | +. Mark the node schedulable. Scheduling is enabled when `SchedulingDisabled` no longer appears in the node status: |
| 110 | ++ |
| 111 | +[source, terminal] |
| 112 | +---- |
| 113 | +$ oc adm uncordon <nodename> |
| 114 | +---- |
| 115 | ++ |
| 116 | +.Example output |
| 117 | +[source, terminal] |
| 119 | +---- |
| 120 | +NAME                                 STATUS   ROLES    AGE    VERSION |
| 121 | +ci-ln-tkbxyft-f76d1-nvwhr-master-1   Ready    master   133m   v1.22.0-rc.0+75ee307 |
| 122 | +---- |
| 123 | ++ |
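The two status checks in the procedure (after restarting the services and after uncordoning) can be sketched as a small filter over `oc get nodes` output. The function names `node_status` and `node_is_ready_and_schedulable` are hypothetical, and the sketch assumes that the STATUS value contains no spaces (for example `Ready,SchedulingDisabled`), so that it is the second whitespace-separated field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a node name as $1 and `oc get nodes` output on
# stdin, print the STATUS column for that node (second field, by assumption).
node_status() {
  awk -v node="$1" '$1 == node { print $2 }'
}

# Succeeds only when the status is exactly "Ready", that is, the node is
# ready and no SchedulingDisabled suffix is present.
node_is_ready_and_schedulable() {
  [ "$(node_status "$1")" = "Ready" ]
}

# Possible usage from a workstation with `oc` logged in to the cluster:
#   oc get nodes | node_is_ready_and_schedulable <nodename> && echo "node is back in service"
```

Comparing against the exact string `Ready` keeps the check strict: a node that is `Ready,SchedulingDisabled` or `NotReady` fails it, which matches the point in the procedure where the node should be both ready and uncordoned.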