Commit b73b9cc

[OSDOCS-6831]: Adding etcd recovery docs for hosted control planes
1 parent 1faca0b

File tree: 3 files changed, +116 −2 lines

hosted_control_planes/hcp-backup-restore-dr.adoc

Lines changed: 12 additions & 2 deletions
@@ -11,10 +11,20 @@ If you need to back up and restore etcd on a hosted cluster or provide disaster
 :FeatureName: Hosted control planes
 include::snippets/technology-preview.adoc[]
 
+[id="hosted-etcd-non-disruptive-recovery"]
+== Recovering etcd pods for hosted clusters
+
+In hosted clusters, etcd pods run as part of a stateful set. The stateful set relies on persistent storage to store etcd data for each member. In a highly available control plane, the stateful set contains three pods, and each member has its own persistent volume claim.
+
+include::modules/hosted-cluster-etcd-status.adoc[leveloffset=+2]
+include::modules/hosted-cluster-single-node-recovery.adoc[leveloffset=+2]
+
 [id="hcp-backup-restore"]
-== Backing up and restoring etcd on a hosted cluster
+== Backing up and restoring etcd on a hosted cluster in AWS
+
+If you use hosted control planes for {product-title}, the process to back up and restore etcd is different from xref:../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[the usual etcd backup process].
 
-If you use hosted control planes on {product-title}, the process to back up and restore etcd is different from xref:../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[the usual etcd backup process].
+The following procedures are specific to hosted control planes on AWS.
 
 // Backing up etcd on a hosted cluster
 include::modules/backup-etcd-hosted-cluster.adoc[leveloffset=+2]
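
The stateful set layout described in the new overview section can be inspected directly on the management cluster. The following commands are a minimal sketch, assuming the stateful set is named `etcd` (implied by the `etcd-0` through `etcd-2` pod names) and that you substitute your own control plane namespace for the placeholder:

[source,terminal]
----
$ oc get statefulset/etcd -n <control_plane_namespace>
$ oc get pvc -n <control_plane_namespace>
----

In a highly available control plane, expect three pods, `etcd-0` through `etcd-2`, and one `data-etcd-<index>` claim per member, matching the `pvc/data-etcd-2` name used in the recovery module below.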
modules/hosted-cluster-etcd-status.adoc

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
// Module included in the following assembly:
//
// * hcp-backup-restore-dr.adoc

:_content-type: PROCEDURE
[id="hosted-cluster-etcd-status_{context}"]
= Checking the status of a hosted cluster

To check the status of etcd in your hosted cluster, complete the following steps.

.Procedure

. Open a shell in the running etcd pod that you want to check by entering the following command:
+
[source,terminal]
----
$ oc rsh etcd-0
----

. Set up the etcdctl environment by entering the following commands:
+
[source,terminal]
----
sh-4.4$ export ETCDCTL_API=3
----
+
[source,terminal]
----
sh-4.4$ export ETCDCTL_CACERT=/etc/etcd/tls/etcd-ca/ca.crt
----
+
[source,terminal]
----
sh-4.4$ export ETCDCTL_CERT=/etc/etcd/tls/client/etcd-client.crt
----
+
[source,terminal]
----
sh-4.4$ export ETCDCTL_KEY=/etc/etcd/tls/client/etcd-client.key
----
+
[source,terminal]
----
sh-4.4$ export ETCDCTL_ENDPOINTS=https://etcd-client:2379
----

. Print the health of each cluster member's endpoint by entering the following command:
+
[source,terminal]
----
sh-4.4$ etcdctl endpoint health --cluster -w table
----
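
The four TLS and endpoint exports map one-to-one onto `etcdctl` flags, and `ETCDCTL_API=3` is already the default in recent `etcdctl` releases, so the same health check can be run without an interactive shell. This one-liner is a sketch, assuming the etcd container inside the pod is named `etcd`:

[source,terminal]
----
$ oc exec -n <control_plane_namespace> etcd-0 -c etcd -- \
    etcdctl --cacert /etc/etcd/tls/etcd-ca/ca.crt \
    --cert /etc/etcd/tls/client/etcd-client.crt \
    --key /etc/etcd/tls/client/etcd-client.key \
    --endpoints https://etcd-client:2379 \
    endpoint health --cluster -w table
----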
modules/hosted-cluster-single-node-recovery.adoc

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
// Module included in the following assembly:
//
// * hcp-backup-restore-dr.adoc

:_content-type: PROCEDURE
[id="hosted-cluster-single-node-recovery_{context}"]
= Recovering an etcd member for a hosted cluster

An etcd member of a three-node cluster might fail because of corrupted or missing data. To recover the etcd member, complete the following steps.

.Procedure

. Confirm that the etcd member is failing by entering the following command:
+
[source,terminal]
----
$ oc get pods -l app=etcd -n <control_plane_namespace>
----
+
If the etcd member is failing, the output resembles the following example:
+
.Example output
[source,terminal]
----
NAME     READY   STATUS             RESTARTS     AGE
etcd-0   2/2     Running            0            64m
etcd-1   2/2     Running            0            45m
etcd-2   1/2     CrashLoopBackOff   1 (5s ago)   64m
----

. Delete the persistent volume claim of the failing etcd member and the pod by entering the following command:
+
[source,terminal]
----
$ oc delete pvc/data-etcd-2 pod/etcd-2 --wait=false -n <control_plane_namespace>
----
+
The `--wait=false` flag returns immediately, because the claim cannot finish deleting until the pod that mounts it is removed.

. When the pod restarts, verify that the etcd member is added back to the etcd cluster and is functioning correctly by entering the following command:
+
[source,terminal]
----
$ oc get pods -l app=etcd -n <control_plane_namespace>
----
+
.Example output
[source,terminal]
----
NAME     READY   STATUS    RESTARTS   AGE
etcd-0   2/2     Running   0          67m
etcd-1   2/2     Running   0          48m
etcd-2   2/2     Running   0          2m2s
----
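
A `Running` status confirms that the container came back, but not that the member rejoined the cluster. As a follow-up check, the etcdctl environment from the status module can list the members; this is a sketch, assuming the `ETCDCTL_*` variables are already exported inside a healthy pod such as `etcd-0`:

[source,terminal]
----
sh-4.4$ etcdctl member list -w table
----

The recovered member should appear in the table with a `started` status alongside the other two members.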
