
Commit 9ec6d4f

TELCODOCS#1876: Moved troubleshooting assembly modules to the main LVMS assembly
1 parent 66d2074 commit 9ec6d4f

9 files changed: 157 additions, 124 deletions

_topic_maps/_topic_map.yml

Lines changed: 0 additions & 2 deletions
@@ -1662,8 +1662,6 @@ Topics:
   File: persistent-storage-hostpath
 - Name: Persistent storage using LVM Storage
   File: persistent-storage-using-lvms
-- Name: Troubleshooting local persistent storage using LVMS
-  File: troubleshooting-local-persistent-storage-using-lvms
 - Name: Using Container Storage Interface (CSI)
   Dir: container_storage_interface
   Distros: openshift-enterprise,openshift-origin

modules/lvms-troubleshooting-investigating-a-pvc-stuck-in-the-pending-state.adoc

Lines changed: 12 additions & 9 deletions
@@ -1,20 +1,23 @@
-// This module is included in the following assemblies:
+// Module included in the following assemblies:
 //
-// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc
+// storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="investigating-a-pvc-stuck-in-the-pending-state_{context}"]
 = Investigating a PVC stuck in the Pending state
 
-A persistent volume claim (PVC) can get stuck in a `Pending` state for a number of reasons. For example:
+A persistent volume claim (PVC) can get stuck in the `Pending` state for the following reasons:
 
-- Insufficient computing resources
-- Network problems
-- Mismatched storage class or node selector
-- No available volumes
-- The node with the persistent volume (PV) is in a `Not Ready` state
+- Insufficient computing resources.
+- Network problems.
+- Mismatched storage class or node selector.
+- No available persistent volumes (PVs).
+- The node with the PV is in the `Not Ready` state.
 
-Identify the cause by using the `oc describe` command to review details about the stuck PVC.
+.Prerequisites
+
+* You have installed the {oc-first}.
+* You have logged in to the {oc-first} as a user with `cluster-admin` permissions.
 
 .Procedure
 
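The triage in the new module boils down to finding which PVCs are stuck in the `Pending` state. A minimal Python sketch of that filtering step, working on `oc get pvc` style output (the sample lines and the `pending_pvcs` helper are illustrative, not part of any product tooling):

```python
# List PVCs stuck in the Pending state from `oc get pvc` style output.
# The sample output below is invented for illustration; in practice you
# would feed in the real command output.
def pending_pvcs(oc_get_pvc_output: str) -> list[str]:
    """Return the names of PVCs whose STATUS column is Pending."""
    stuck = []
    for line in oc_get_pvc_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        name, status = fields[0], fields[1]
        if status == "Pending":
            stuck.append(name)
    return stuck

sample = """\
NAME       STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-pvc   Pending                                      lvms-vg1       5m
logs-pvc   Bound     pvc-1a   10Gi       RWO            lvms-vg1       5m
"""
print(pending_pvcs(sample))  # → ['data-pvc']
```

Each name this returns is a candidate for the `oc describe pvc <pvc_name>` step in the procedure.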

Lines changed: 31 additions & 31 deletions
@@ -1,18 +1,22 @@
-// This module is included in the following assemblies:
+// Module included in the following assemblies:
 //
-// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc
+// storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="performing-a-forced-cleanup_{context}"]
-= Performing a forced cleanup
+= Performing a forced clean-up
 
-If disk- or node-related problems persist after you complete the troubleshooting procedures, it might be necessary to perform a forced cleanup procedure. A forced cleanup is used to comprehensively address persistent issues and ensure the proper functioning of the LVMS.
+If the disk or node-related problems persist even after you have completed the troubleshooting procedures, you must perform a forced clean-up. A forced clean-up is used to address persistent issues and ensure the proper functioning of {lvms-first}.
 
 .Prerequisites
 
-. All of the persistent volume claims (PVCs) created using the logical volume manager storage (LVMS) driver have been removed.
+* You have installed the {oc-first}.
 
-. The pods using those PVCs have been stopped.
+* You have logged in to the {oc-first} as a user with `cluster-admin` permissions.
+
+* You have deleted all the persistent volume claims (PVCs) that were created by using {lvms}.
+
+* You have stopped the pods that are using the PVCs that were created by using {lvms}.
 
 
 .Procedure
@@ -24,74 +28,70 @@ If disk- or node-related problems persist after you complete the troubleshooting
 $ oc project openshift-storage
 ----
 
-. Ensure there is no `Logical Volume` custom resource (CR) remaining by running the following command:
+. Check if the `LogicalVolume` custom resources (CRs) are present by running the following command:
 +
 [source,terminal]
 ----
 $ oc get logicalvolume
 ----
-+
-.Example output
-[source,terminal]
-----
-No resources found
-----
 
-.. If there are any `LogicalVolume` CRs remaining, remove their finalizers by running the following command:
+.. If the `LogicalVolume` CRs are present, delete them by running the following command:
 +
 [source,terminal]
 ----
-$ oc patch logicalvolume <name> -p '{"metadata":{"finalizers":[]}}' --type=merge <1>
+$ oc delete logicalvolume <name> <1>
 ----
-<1> Replace `<name>` with the name of the CR.
+<1> Replace `<name>` with the name of the `LogicalVolume` CR.
 
-.. After removing their finalizers, delete the CRs by running the following command:
+.. After deleting the `LogicalVolume` CRs, remove their finalizers by running the following command:
 +
 [source,terminal]
 ----
-$ oc delete logicalvolume <name> <1>
+$ oc patch logicalvolume <name> -p '{"metadata":{"finalizers":[]}}' --type=merge <1>
 ----
-<1> Replace `<name>` with the name of the CR.
+<1> Replace `<name>` with the name of the `LogicalVolume` CR.
 
-. Make sure there are no `LVMVolumeGroup` CRs left by running the following command:
+. Check if the `LVMVolumeGroup` CRs are present by running the following command:
 +
 [source,terminal]
 ----
 $ oc get lvmvolumegroup
 ----
++
+.. If the `LVMVolumeGroup` CRs are present, delete them by running the following command:
 +
-.Example output
 [source,terminal]
 ----
-No resources found
+$ oc delete lvmvolumegroup <name> <1>
 ----
+<1> Replace `<name>` with the name of the `LVMVolumeGroup` CR.
 
-.. If there are any `LVMVolumeGroup` CRs left, remove their finalizers by running the following command:
+.. After deleting the `LVMVolumeGroup` CRs, remove their finalizers by running the following command:
 +
 [source,terminal]
 ----
 $ oc patch lvmvolumegroup <name> -p '{"metadata":{"finalizers":[]}}' --type=merge <1>
 ----
-<1> Replace `<name>` with the name of the CR.
+<1> Replace `<name>` with the name of the `LVMVolumeGroup` CR.
 
-.. After removing their finalizers, delete the CRs by running the following command:
+. Delete any `LVMVolumeGroupNodeStatus` CRs by running the following command:
 +
 [source,terminal]
 ----
-$ oc delete lvmvolumegroup <name> <1>
+$ oc delete lvmvolumegroupnodestatus --all
 ----
-<1> Replace `<name>` with the name of the CR.
 
-. Remove any `LVMVolumeGroupNodeStatus` CRs by running the following command:
+. Delete the `LVMCluster` CR by running the following command:
 +
 [source,terminal]
 ----
-$ oc delete lvmvolumegroupnodestatus --all
+$ oc delete lvmcluster --all
 ----
 
-. Remove the `LVMCluster` CR by running the following command:
+.. After deleting the `LVMCluster` CR, remove its finalizer by running the following command:
 +
 [source,terminal]
 ----
-$ oc delete lvmcluster --all
+$ oc patch lvmcluster <name> -p '{"metadata":{"finalizers":[]}}' --type=merge <1>
 ----
+<1> Replace `<name>` with the name of the `LVMCluster` CR.
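The finalizer-removal steps in this clean-up rely on JSON merge-patch semantics: passing `{"metadata":{"finalizers":[]}}` with `--type=merge` replaces the finalizers list wholesale, which unblocks a deletion that is hanging on finalizers. A minimal Python sketch of what that patch does to a CR object (the `merge_patch` helper and the sample CR are illustrative, not product code):

```python
import json

def merge_patch(target: dict, patch: dict) -> dict:
    """Apply an RFC 7386 JSON merge patch to a dict (simplified sketch)."""
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null in a merge patch deletes the key
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_patch(result[key], value)  # merge nested objects
        else:
            result[key] = value  # lists, such as finalizers, are replaced wholesale
    return result

# A stuck LogicalVolume CR: deletion hangs while finalizers remain.
cr = {"metadata": {"name": "lv-1", "finalizers": ["topolvm.io/logicalvolume"]}}
patch = json.loads('{"metadata":{"finalizers":[]}}')  # the payload passed to `oc patch`

patched = merge_patch(cr, patch)
print(patched["metadata"]["finalizers"])  # → []
```

With the finalizers list emptied, the API server can garbage-collect the already-deleted object; the rest of the metadata, such as the name, is untouched.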
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+// Module included in the following assemblies:
+//
+// storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="lvms-troubleshooting-persistent-storage_{context}"]
+= Troubleshooting persistent storage
+
+While configuring persistent storage using {lvms-first}, you can encounter several issues that require troubleshooting.
Lines changed: 39 additions & 13 deletions
@@ -1,12 +1,44 @@
-// This module is included in the following assemblies:
+// Module included in the following assemblies:
 //
-// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc
+// storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="recovering-from-disk-failure_{context}"]
 = Recovering from disk failure
 
-If you see a failure message while inspecting the events associated with the persistent volume claim (PVC), there might be a problem with the underlying volume or disk. Disk and volume provisioning issues often result with a generic error first, such as `Failed to provision volume with StorageClass <storage_class_name>`. A second, more specific error message usually follows.
+If you see a failure message while inspecting the events associated with the persistent volume claim (PVC), there can be a problem with the underlying volume or disk.
+
+Disk and volume provisioning issues result with a generic error message such as `Failed to provision volume with storage class <storage_class_name>`. The generic error message is followed by a specific volume failure error message.
+
+The following table describes the volume failure error messages:
+
+.Volume failure error messages
+[%autowidth, options="header"]
+|===
+
+|Error message |Description
+
+|`Failed to check volume existence`
+|Indicates a problem in verifying whether the volume already exists. Volume verification failure can be caused by network connectivity problems or other failures.
+
+|`Failed to bind volume`
+|Failure to bind a volume can happen if the persistent volume (PV) that is available does not match the requirements of the PVC.
+
+|`FailedMount` or `FailedAttachVolume`
+|This error indicates problems when trying to mount the volume to a node. If the disk has failed, this error can appear when a pod tries to use the PVC.
+
+|`FailedUnMount`
+|This error indicates problems when trying to unmount a volume from a node. If the disk has failed, this error can appear when a pod tries to use the PVC.
+
+|`Volume is already exclusively attached to one node and cannot be attached to another`
+|This error can appear with storage solutions that do not support `ReadWriteMany` access modes.
+
+|===
+
+.Prerequisites
+
+* You have installed the {oc-first}.
+* You have logged in to the {oc-first} as a user with `cluster-admin` permissions.
 
 .Procedure
 
@@ -16,18 +48,12 @@ If you see a failure message while inspecting the events associated with the per
 ----
 $ oc describe pvc <pvc_name> <1>
 ----
-<1> Replace `<pvc_name>` with the name of the PVC. Here are some examples of disk or volume failure error messages and their causes:
-+
-- *Failed to check volume existence:* Indicates a problem in verifying whether the volume already exists. Volume verification failure can be caused by network connectivity problems or other failures.
-+
-- *Failed to bind volume:* Failure to bind a volume can happen if the persistent volume (PV) that is available does not match the requirements of the PVC.
-+
-- *FailedMount or FailedUnMount:* This error indicates problems when trying to mount the volume to a node or unmount a volume from a node. If the disk has failed, this error might appear when a pod tries to use the PVC.
-+
-- *Volume is already exclusively attached to one node and cannot be attached to another:* This error can appear with storage solutions that do not support `ReadWriteMany` access modes.
+<1> Replace `<pvc_name>` with the name of the PVC.
 
 . Establish a direct connection to the host where the problem is occurring.
 
 . Resolve the disk issue.
 
-After you have resolved the issue with the disk, you might need to perform the forced cleanup procedure if failure messages persist or reoccur.
+.Next steps
+
+* If the volume failure messages persist or recur even after you have resolved the issue with the disk, you must perform a forced clean-up. For more information, see "Performing a forced clean-up".
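The error-message table the commit introduces maps naturally onto a substring lookup over PVC event messages. A hedged Python sketch of that triage (the `classify_event` helper and the cause wordings are illustrative, derived from the table, not an official or exhaustive list):

```python
# Map the volume failure error messages from the table above to short causes.
# The keys are substrings expected in event messages; this mapping is an
# illustration based on the documentation table, not product code.
FAILURE_CAUSES = {
    "Failed to check volume existence": "volume verification failed (for example, network connectivity problems)",
    "Failed to bind volume": "no available PV matches the requirements of the PVC",
    "FailedMount": "problem mounting the volume to a node",
    "FailedAttachVolume": "problem attaching the volume to a node",
    "FailedUnMount": "problem unmounting the volume from a node",
    "already exclusively attached to one node": "storage does not support ReadWriteMany access modes",
}

def classify_event(message: str) -> str:
    """Return a likely cause for a PVC event message, or 'unknown'."""
    for pattern, cause in FAILURE_CAUSES.items():
        if pattern in message:
            return cause
    return "unknown"

print(classify_event("Warning  FailedMount  kubelet  MountVolume.SetUp failed"))
```

In practice you would run this over the `Events` section of `oc describe pvc <pvc_name>` output; anything classified as `unknown` still needs manual inspection.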

modules/lvms-troubleshooting-recovering-from-missing-lvms-or-operator-components.adoc

Lines changed: 19 additions & 33 deletions
@@ -1,16 +1,21 @@
-// This module is included in the following assemblies:
+// Module included in the following assemblies:
 //
-// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc
+// storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="recovering-from-missing-lvms-or-operator-components_{context}"]
-= Recovering from missing LVMS or Operator components
+= Recovering from a missing storage class
 
-If you encounter a storage class "not found" error, check the `LVMCluster` resource and ensure that all the logical volume manager storage (LVMS) pods are running. You can create an `LVMCluster` resource if it does not exist.
+If you encounter the `storage class not found` error, check the `LVMCluster` custom resource (CR) and ensure that all the {lvms-first} pods are in the `Running` state.
+
+.Prerequisites
+
+* You have installed the {oc-first}.
+* You have logged in to the {oc-first} as a user with `cluster-admin` permissions.
 
 .Procedure
 
-. Verify the presence of the LVMCluster resource by running the following command:
+. Verify that the `LVMCluster` CR is present by running the following command:
 +
 [source,terminal]
 ----
@@ -24,33 +29,9 @@ NAME AGE
 my-lvmcluster 65m
 ----
 
-. If the cluster does not have an `LVMCluster` resource, create one by running the following command:
-+
-[source,terminal]
-----
-$ oc create -n openshift-storage -f <custom_resource> <1>
-----
-<1> Replace `<custom_resource>` with a custom resource URL or file tailored to your requirements.
-+
-.Example custom resource
-[source,yaml,options="nowrap",role="white-space-pre"]
-----
-apiVersion: lvm.topolvm.io/v1alpha1
-kind: LVMCluster
-metadata:
-  name: my-lvmcluster
-spec:
-  storage:
-    deviceClasses:
-    - name: vg1
-      default: true
-      thinPoolConfig:
-        name: thin-pool-1
-        sizePercent: 90
-        overprovisionRatio: 10
-----
+. If the `LVMCluster` CR is not present, create an `LVMCluster` CR. For more information, see "Ways to create an LVMCluster custom resource".
 
-. Check that all the pods from LVMS are in the `Running` state in the `openshift-storage` namespace by running the following command:
+. In the `openshift-storage` namespace, check that all the {lvms} pods are in the `Running` state by running the following command:
 +
 [source,terminal]
 ----
@@ -67,9 +48,14 @@ topolvm-node-dr26h 4/4 Running 0 66m
 vg-manager-r6zdv 1/1 Running 0 66m
 ----
 +
-The expected output is one running instance of `lvms-operator` and `vg-manager`. One instance of `topolvm-controller` and `topolvm-node` is expected for each node.
+The output of this command must contain a running instance of the following pods:
+
+* `lvms-operator`
+* `vg-manager`
+* `topolvm-controller`
+* `topolvm-node`
 +
-If `topolvm-node` is stuck in the `Init` state, there is a failure to locate an available disk for LVMS to use. To retrieve the information necessary to troubleshoot, review the logs of the `vg-manager` pod by running the following command:
+If the `topolvm-node` pod is stuck in the `Init` state, it is due to a failure to locate an available disk for {lvms} to use. To retrieve the necessary information to troubleshoot this issue, review the logs of the `vg-manager` pod by running the following command:
 +
 [source,terminal]
 ----
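The expected-pods check that this module describes can be scripted. A minimal Python sketch that parses `oc get pods -n openshift-storage` style output and reports which expected pod is missing a `Running` instance (the sample lines and the `missing_pods` helper are invented for illustration):

```python
# Verify that each expected LVMS pod prefix has at least one Running instance,
# given `oc get pods` style output. Sample output lines are illustrative only.
EXPECTED_PREFIXES = ["lvms-operator", "vg-manager", "topolvm-controller", "topolvm-node"]

def missing_pods(oc_get_pods_output: str) -> list[str]:
    """Return expected pod prefixes with no Running instance in the output."""
    running = set()
    for line in oc_get_pods_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        name, status = fields[0], fields[2]
        if status == "Running":
            running.add(name)
    return [prefix for prefix in EXPECTED_PREFIXES
            if not any(name.startswith(prefix) for name in running)]

sample = """\
NAME                  READY   STATUS    RESTARTS   AGE
lvms-operator-1       3/3     Running   0          66m
topolvm-controller-1  5/5     Running   0          66m
topolvm-node-dr26h    4/4     Init      0          66m
vg-manager-r6zdv      1/1     Running   0          66m
"""
print(missing_pods(sample))  # → ['topolvm-node']
```

A non-empty result points at the pod to investigate next; for `topolvm-node` stuck in `Init`, the procedure directs you to the `vg-manager` pod logs.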

modules/lvms-troubleshooting-recovering-from-node-failure.adoc

Lines changed: 14 additions & 5 deletions
@@ -1,12 +1,19 @@
-// This module is included in the following assemblies:
+// Module included in the following assemblies:
 //
-// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc
+// storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="recovering-from-node-failure_{context}"]
 = Recovering from node failure
 
-Sometimes a persistent volume claim (PVC) is stuck in a `Pending` state because a particular node in the cluster has failed. To identify the failed node, you can examine the restart count of the `topolvm-node` pod. An increased restart count indicates potential problems with the underlying node, which may require further investigation and troubleshooting.
+A persistent volume claim (PVC) can be stuck in the `Pending` state due to a node failure in the cluster.
+
+To identify the failed node, you can examine the restart count of the `topolvm-node` pod. An increased restart count indicates potential problems with the underlying node, which might require further investigation and troubleshooting.
+
+.Prerequisites
+
+* You have installed the {oc-first}.
+* You have logged in to the {oc-first} as a user with `cluster-admin` permissions.
 
 .Procedure
 
@@ -30,5 +37,7 @@ vg-manager-r6zdv 1/1 Running 0 66m
 vg-manager-990ut 1/1 Running 0 66m
 vg-manager-an118 1/1 Running 0 66m
 ----
-+
-After you resolve any issues with the node, you might need to perform the forced cleanup procedure if the PVC is still stuck in a `Pending` state.
+
+.Next steps
+
+* If the PVC is stuck in the `Pending` state even after you have resolved any issues with the node, you must perform a forced clean-up. For more information, see "Performing a forced clean-up".
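The restart-count inspection described in this module is also easy to script. A hedged Python sketch that flags pods whose `RESTARTS` column exceeds a threshold, from `oc get pods` style output (the sample lines, the `high_restart_pods` helper, and the threshold of 3 are all illustrative):

```python
# Flag pods whose restart count suggests an unhealthy underlying node, given
# `oc get pods` style output. Sample lines and threshold are illustrative only.
def high_restart_pods(oc_get_pods_output: str, threshold: int = 3) -> list[str]:
    """Return pod names whose RESTARTS column is at or above the threshold."""
    flagged = []
    for line in oc_get_pods_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        name, restarts = fields[0], int(fields[3])
        if restarts >= threshold:
            flagged.append(name)
    return flagged

sample = """\
NAME                 READY   STATUS    RESTARTS   AGE
topolvm-node-dr26h   4/4     Running   7          66m
vg-manager-r6zdv     1/1     Running   0          66m
"""
print(high_restart_pods(sample))  # → ['topolvm-node-dr26h']
```

A flagged `topolvm-node` pod points at the node it runs on as the likely failure, which you can then investigate directly.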
