Commit 6e6cafb

Merge pull request #40026 from apinnick/oadp75-troubleshooting
OADP-75: Troubleshooting
2 parents ab01081 + dcd34f1 commit 6e6cafb

12 files changed

+463
-71
lines changed

_topic_maps/_topic_map.yml

Lines changed: 31 additions & 29 deletions
@@ -2095,35 +2095,37 @@ Topics:
20952095
File: graceful-cluster-shutdown
20962096
- Name: Restarting a cluster gracefully
20972097
File: graceful-cluster-restart
2098-
# - Name: Application backup and restore
2099-
# Dir: application_backup_and_restore
2100-
# Topics:
2101-
# - Name: OADP features and plugins
2102-
# File: oadp-features-plugins
2103-
# - Name: Installing and configuring OADP
2104-
# Dir: installing
2105-
# Topics:
2106-
# - Name: About installing OADP
2107-
# File: about-installing-oadp
2108-
# - Name: Installing and configuring OADP with AWS
2109-
# File: installing-oadp-aws
2110-
# - Name: Installing and configuring OADP with Azure
2111-
# File: installing-oadp-azure
2112-
# - Name: Installing and configuring OADP with GCP
2113-
# File: installing-oadp-gcp
2114-
# - Name: Installing and configuring OADP with MCG
2115-
# File: installing-oadp-mcg
2116-
# - Name: Installing and configuring OADP with OCS
2117-
# File: installing-oadp-ocs
2118-
# - Name: Uninstalling OADP
2119-
# File: uninstalling-oadp
2120-
# - Name: Backing up and restoring
2121-
# Dir: backing_up_and_restoring
2122-
# Topics:
2123-
# - Name: Backing up applications
2124-
# File: backing-up-applications
2125-
# - Name: Restoring applications
2126-
# File: restoring-applications
2098+
- Name: Application backup and restore
2099+
Dir: application_backup_and_restore
2100+
Topics:
2101+
- Name: OADP features and plugins
2102+
File: oadp-features-plugins
2103+
- Name: Installing and configuring OADP
2104+
Dir: installing
2105+
Topics:
2106+
- Name: About installing OADP
2107+
File: about-installing-oadp
2108+
- Name: Installing and configuring OADP with AWS
2109+
File: installing-oadp-aws
2110+
- Name: Installing and configuring OADP with Azure
2111+
File: installing-oadp-azure
2112+
- Name: Installing and configuring OADP with GCP
2113+
File: installing-oadp-gcp
2114+
- Name: Installing and configuring OADP with MCG
2115+
File: installing-oadp-mcg
2116+
- Name: Installing and configuring OADP with OCS
2117+
File: installing-oadp-ocs
2118+
- Name: Uninstalling OADP
2119+
File: uninstalling-oadp
2120+
- Name: Backing up and restoring
2121+
Dir: backing_up_and_restoring
2122+
Topics:
2123+
- Name: Backing up applications
2124+
File: backing-up-applications
2125+
- Name: Restoring applications
2126+
File: restoring-applications
2127+
- Name: Troubleshooting
2128+
File: troubleshooting
21272129
- Name: Control plane backup and restore
21282130
Dir: control_plane_backup_and_restore
21292131
Topics:

backup_and_restore/application_backup_and_restore/backing_up_and_restoring/backing-up-applications.adoc

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ toc::[]
88

99
You back up applications by creating a xref:../../../backup_and_restore/application_backup_and_restore/backing_up_and_restoring/backing-up-applications.adoc#oadp-creating-backup-cr_backing-up-applications[`Backup`] custom resource (CR).
1010

11-
The `Backup` CR backs up Kubernetes resources and internal images by saving them as an archive file on S3 object storage.
11+
The `Backup` CR creates backup files for Kubernetes resources and internal images on S3 object storage. It also creates snapshots for persistent volumes (PVs) if the cloud provider uses a native snapshot API or the xref:../../../backup_and_restore/application_backup_and_restore/backing_up_and_restoring/backing-up-applications.adoc#oadp-backing-up-pvs-csi_backing-up-applications[Container Storage Interface (CSI)] to create snapshots, for example, OpenShift Container Storage 4. For more information, see xref:../../../storage/container_storage_interface/persistent-storage-csi-snapshots.adoc#persistent-storage-csi-snapshots[CSI volume snapshots].
1212

1313
:FeatureName: The `CloudStorage` API for S3 storage
1414
include::modules/technology-preview.adoc[]

backup_and_restore/application_backup_and_restore/troubleshooting.adoc

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
:_content-type: ASSEMBLY
2+
[id="troubleshooting"]
3+
= Troubleshooting
4+
include::modules/common-attributes.adoc[]
5+
:context: oadp-troubleshooting
6+
:oadp-troubleshooting:
7+
:namespace: openshift-adp
8+
:local-product: OADP
9+
:must-gather: registry.access.redhat.com/oadp-operator/oadp-must-gather-rhel8:v1.0
10+
11+
toc::[]
12+
13+
You can debug Velero custom resources (CRs) by using the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-debugging-oc-cli_oadp-troubleshooting[OpenShift CLI tool] or the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#migration-debugging-velero-resources_oadp-troubleshooting[Velero CLI tool]. The Velero CLI tool provides more detailed logs and information.
14+
15+
You can check xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-installation-issues_oadp-troubleshooting[installation issues], xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-backup-restore-cr-issues_oadp-troubleshooting[backup and restore CR issues], and xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-restic-issues_oadp-troubleshooting[Restic issues].
16+
17+
You can collect logs, CR information, and Prometheus metric data by using the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#migration-using-must-gather_oadp-troubleshooting[`must-gather` tool].
18+
19+
include::modules/oadp-debugging-oc-cli.adoc[leveloffset=+1]
20+
include::modules/migration-debugging-velero-resources.adoc[leveloffset=+1]
21+
22+
include::modules/oadp-installation-issues.adoc[leveloffset=+1]
23+
include::modules/oadp-backup-restore-cr-issues.adoc[leveloffset=+1]
24+
include::modules/oadp-restic-issues.adoc[leveloffset=+1]
25+
26+
include::modules/migration-using-must-gather.adoc[leveloffset=+1]
27+
28+
:oadp-troubleshooting!:

migrating_from_ocp_3_to_4/troubleshooting-3-4.adoc

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@
44
include::modules/common-attributes.adoc[]
55
:context: troubleshooting-3-4
66
:troubleshooting-3-4:
7+
:namespace: openshift-migration
8+
:local-product: {mtc-short}
9+
:must-gather: registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v{mtc-version}
710

811
toc::[]
912

migration_toolkit_for_containers/troubleshooting-mtc.adoc

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@
44
include::modules/common-attributes.adoc[]
55
:context: troubleshooting-mtc
66
:troubleshooting-mtc:
7+
:namespace: openshift-migration
8+
:local-product: {mtc-short}
9+
:must-gather: registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v{mtc-version}
710

811
toc::[]
912

modules/migration-debugging-velero-resources.adoc

Lines changed: 53 additions & 24 deletions
@@ -1,59 +1,88 @@
11
// Module included in the following assemblies:
22
//
3+
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
34
// * migrating_from_ocp_3_to_4/troubleshooting-3-4.adoc
45
// * migration_toolkit_for_containers/troubleshooting-mtc
56

67
[id="migration-debugging-velero-resources_{context}"]
7-
= Using the Velero CLI to debug Backup and Restore CRs
8+
= Debugging Velero resources with the Velero CLI tool
89

9-
You can debug the `Backup` and `Restore` custom resources (CRs) and partial migration failures with the Velero command line interface (CLI). The Velero CLI runs in the `velero` pod.
10+
You can debug `Backup` and `Restore` custom resources (CRs) and retrieve logs with the Velero CLI tool.
1011

12+
The Velero CLI tool provides more detailed information than the OpenShift CLI tool.
13+
14+
[discrete]
1115
[id="velero-command-syntax_{context}"]
12-
== Velero command syntax
16+
== Syntax
17+
18+
Use the `oc exec` command to run a Velero CLI command:
1319

14-
Velero CLI commands use the following syntax:
15-
[source,terminal]
20+
[source,terminal,subs="attributes+"]
1621
----
17-
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> <command> <resource_id>
22+
$ oc exec $(oc get pods -n {namespace} -o name | grep velero) \
23+
-- ./velero <backup_restore_cr> <command> <cr_name>
1824
----
1925

20-
You can specify `velero-<pod> -n openshift-migration` in place of `$(oc get pods -n openshift-migration -o name | grep velero)`.
26+
.Example
27+
[source,terminal,subs="attributes+"]
28+
----
29+
$ oc exec $(oc get pods -n {namespace} -o name | grep velero) \
30+
-- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
31+
----
2132

22-
[id="help-command_{context}"]
23-
== Help command
33+
You can specify `velero-<pod> -n {namespace}` in place of `$(oc get pods -n {namespace} -o name | grep velero)`.
2434

25-
The Velero `help` command lists all the Velero CLI commands:
26-
[source,terminal]
35+
.Example
36+
[source,terminal,subs="attributes+"]
2737
----
28-
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero --help
38+
$ oc exec velero-<pod> -n {namespace} -- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
2939
----
3040

31-
[id="describe-command_{context}"]
41+
[discrete]
42+
[id="velero-help-option_{context}"]
43+
== Help option
44+
45+
Use the `velero --help` option to list all Velero CLI commands:
46+
47+
[source,terminal,subs="attributes+"]
48+
----
49+
$ oc exec $(oc get pods -n {namespace} -o name | grep velero) -- ./velero --help
50+
----
51+
52+
[discrete]
53+
[id="velero-describe-command_{context}"]
3254
== Describe command
3355

34-
The Velero `describe` command provides a summary of warnings and errors associated with a Velero resource:
35-
[source,terminal]
56+
Use the `velero describe` command to retrieve a summary of warnings and errors associated with a `Backup` or `Restore` CR:
57+
58+
[source,terminal,subs="attributes+"]
3659
----
37-
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> describe <resource_id>
60+
$ oc exec $(oc get pods -n {namespace} -o name | grep velero) \
61+
-- ./velero <backup_restore_cr> describe <cr_name>
3862
----
3963

4064
.Example
41-
[source,terminal]
65+
[source,terminal,subs="attributes+"]
4266
----
43-
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
67+
$ oc exec $(oc get pods -n {namespace} -o name | grep velero) \
68+
-- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
4469
----
4570

46-
[id="logs-command_{context}"]
71+
[discrete]
72+
[id="velero-logs-command_{context}"]
4773
== Logs command
4874

49-
The Velero `logs` command provides the logs associated with a Velero resource:
50-
[source,terminal]
75+
Use the `velero logs` command to retrieve the logs of a `Backup` or `Restore` CR:
76+
77+
[source,terminal,subs="attributes+"]
5178
----
52-
velero <resource> logs <resource_id>
79+
$ oc exec $(oc get pods -n {namespace} -o name | grep velero) \
80+
-- ./velero <backup_restore_cr> logs <cr_name>
5381
----
5482

5583
.Example
56-
[source,terminal]
84+
[source,terminal,subs="attributes+"]
5785
----
58-
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
86+
$ oc exec $(oc get pods -n {namespace} -o name | grep velero) \
87+
-- ./velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
5988
----
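
The command syntax that this module documents can be sketched as a small POSIX shell helper. This is only an illustration, not part of the commit: the function name `velero_cmd` and its argument order are hypothetical, and the first argument stands in for the `{namespace}` attribute (`openshift-adp` for OADP, `openshift-migration` for MTC). The helper prints the `oc exec` command line instead of running it, so it can be reviewed before use.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): prints the documented
# `oc exec ... -- ./velero <backup_restore_cr> <command> <cr_name>` line.
velero_cmd() {
    namespace=$1    # stands in for the {namespace} attribute
    cr_type=$2      # backup or restore
    subcommand=$3   # describe or logs
    cr_name=$4      # name of the Backup or Restore CR

    # Only the CR types covered by the module are accepted.
    case "$cr_type" in
        backup|restore) ;;
        *) echo "unsupported CR type: $cr_type" >&2; return 1 ;;
    esac

    # Only the subcommands covered by the module are accepted.
    case "$subcommand" in
        describe|logs) ;;
        *) echo "unsupported command: $subcommand" >&2; return 1 ;;
    esac

    # Single quotes keep the $(...) literal so that it is printed, not executed.
    printf 'oc exec $(oc get pods -n %s -o name | grep velero) -- ./velero %s %s %s\n' \
        "$namespace" "$cr_type" "$subcommand" "$cr_name"
}
```

For example, `velero_cmd openshift-adp restore logs <cr_name>` prints the command from the final example above with the OADP namespace filled in. Printing rather than executing is a deliberate choice here, because the sketch makes no assumption about cluster access.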

modules/migration-using-must-gather.adoc

Lines changed: 67 additions & 17 deletions
@@ -1,19 +1,28 @@
11
// Module included in the following assemblies:
22
//
33
// * migrating_from_ocp_3_to_4/troubleshooting-3-4.adoc
4-
// * migration_toolkit_for_containers/troubleshooting-mtc
4+
// * migration_toolkit_for_containers/troubleshooting-mtc.adoc
5+
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
56

7+
:_content-type: PROCEDURE
68
[id="migration-using-must-gather_{context}"]
7-
8-
9-
109
= Using the must-gather tool
1110

12-
You can collect logs, metrics, and information about {MTC-short} custom resources by using the `must-gather` tool.
11+
You can collect logs, metrics, and information about {local-product} custom resources by using the `must-gather` tool.
1312

1413
The `must-gather` data must be attached to all customer cases.
1514

15+
ifdef::troubleshooting-3-4,troubleshooting-mtc[]
1616
You can collect data for a one-hour or a 24-hour period and view the data with the Prometheus console.
17+
endif::[]
18+
ifdef::oadp-troubleshooting[]
19+
You can run the `must-gather` tool with the following data collection options:
20+
21+
* Full `must-gather` data collection collects Prometheus metrics, pod logs, and Velero CR information for all namespaces where the OADP Operator is installed.
22+
* Essential `must-gather` data collection collects pod logs and Velero CR information for a specific duration of time, for example, one hour or 24 hours. Prometheus metrics and duplicate logs are not included.
23+
* `must-gather` data collection with timeout. Data collection can take a long time if there are many failed `Backup` CRs. You can improve performance by setting a timeout value.
24+
* Prometheus metrics data dump downloads an archive file containing the metrics data collected by Prometheus.
25+
endif::[]
1726
1827
.Prerequisites
1928

@@ -23,29 +32,70 @@ You can collect data for a one-hour or a 24-hour period and view the data with t
2332
.Procedure
2433

2534
. Navigate to the directory where you want to store the `must-gather` data.
26-
. Run the `oc adm must-gather` command:
27-
28-
* To gather data for the past hour:
35+
. Run the `oc adm must-gather` command for one of the following data collection options:
36+
37+
ifdef::troubleshooting-3-4,troubleshooting-mtc[]
38+
* To collect data for the past hour:
39+
endif::[]
40+
ifdef::oadp-troubleshooting[]
41+
* Full `must-gather` data collection, including Prometheus metrics:
42+
endif::[]
2943
+
3044
[source,terminal,subs="attributes+"]
3145
----
32-
$ oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v{mtc-version}
46+
$ oc adm must-gather --image={must-gather}
3347
----
3448
+
35-
The data is saved as `/must-gather/must-gather.tar.gz`. You can upload this file to a support case on the link:https://access.redhat.com/[Red Hat Customer Portal].
49+
The data is saved as `must-gather/must-gather.tar.gz`. You can upload this file to a support case on the link:https://access.redhat.com/[Red Hat Customer Portal].
3650

37-
* To gather data for the past 24 hours:
51+
ifdef::oadp-troubleshooting[]
52+
* Essential `must-gather` data collection, without Prometheus metrics, for a specific time duration:
53+
+
54+
[source,terminal,subs="attributes+"]
55+
----
56+
$ oc adm must-gather --image={must-gather} \
57+
-- /usr/bin/gather_<time>_essential <1>
58+
----
59+
<1> Specify the time in hours. Allowed values are `1h`, `6h`, `24h`, `72h`, or `all`, for example, `gather_1h_essential` or `gather_all_essential`.
60+
61+
* `must-gather` data collection with timeout:
62+
+
63+
[source,terminal,subs="attributes+"]
64+
----
65+
$ oc adm must-gather --image={must-gather} \
66+
-- /usr/bin/gather_with_timeout <timeout> <1>
67+
----
68+
<1> Specify a timeout value in seconds.
69+
endif::[]
70+
ifdef::troubleshooting-3-4,troubleshooting-mtc[]
71+
* To collect data for the past 24 hours:
72+
endif::[]
73+
ifdef::oadp-troubleshooting[]
74+
* Prometheus metrics data dump:
75+
endif::[]
3876
+
3977
[source,terminal,subs="attributes+"]
4078
----
41-
$ oc adm must-gather --image= \
42-
registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8: \
43-
v{mtc-version} -- /usr/bin/gather_metrics_dump
79+
$ oc adm must-gather --image={must-gather} \
80+
-- /usr/bin/gather_metrics_dump
4481
----
4582
+
46-
This operation can take a long time. The data is saved as `/must-gather/metrics/prom_data.tar.gz`. You can view this file with the Prometheus console.
83+
This operation can take a long time. The data is saved as `must-gather/metrics/prom_data.tar.gz`.
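
The essential data collection option for OADP takes its time window from the script name, so a typo in that name is easy to make. A minimal sketch of a guard, assuming a hypothetical helper named `must_gather_essential_cmd`: it checks the window against the allowed values listed in the callout (`1h`, `6h`, `24h`, `72h`, `all`) and prints the resulting command without running it. The image argument stands in for the `{must-gather}` attribute.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): validates the time window
# and prints the documented essential must-gather command line.
must_gather_essential_cmd() {
    image=$1    # stands in for the {must-gather} attribute
    window=$2   # one of the allowed values from the callout

    case "$window" in
        1h|6h|24h|72h|all) ;;
        *)
            echo "unsupported time window: $window (expected 1h, 6h, 24h, 72h, or all)" >&2
            return 1
            ;;
    esac

    printf 'oc adm must-gather --image=%s -- /usr/bin/gather_%s_essential\n' \
        "$image" "$window"
}
```

Calling `must_gather_essential_cmd <image> 24h` prints the command that ends in `/usr/bin/gather_24h_essential`, while an unlisted value such as `2h` is rejected with an error on standard error.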
4784
48-
.To view data with the Prometheus console
85+
[discrete]
86+
[id="viewing-data-with-prometheus-console_{context}"]
87+
== Viewing metrics data with the Prometheus console
88+
89+
You can view the metrics data with the Prometheus console.
90+
91+
.Procedure
92+
93+
. Decompress the `prom_data.tar.gz` file:
94+
+
95+
[source,terminal]
96+
----
97+
$ tar -xvzf must-gather/metrics/prom_data.tar.gz
98+
----
4999

50100
. Create a local Prometheus instance:
51101
+
@@ -54,7 +104,7 @@ This operation can take a long time. The data is saved as `/must-gather/metrics/
54104
$ make prometheus-run
55105
----
56106
+
57-
The command outputs the Prometheus URL:
107+
The command outputs the Prometheus URL.
58108
+
59109
.Output
60110
[source,terminal]

modules/oadp-backing-up-applications-restic.adoc

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
22
//
33
// * backup_and_restore/application_backup_and_restore/backing_up_and_restoring/backing-up-applications.adoc
44

5+
:_content-type: PROCEDURE
56
[id="oadp-backing-up-applications-restic_{context}"]
67
= Backing up applications with Restic
78
