Commit 393f852

add new step 11
1 parent 5de1fa6 commit 393f852

1 file changed

+186
-0
lines changed

modules/dr-restoring-cluster-state.adoc

Lines changed: 186 additions & 0 deletions
@@ -232,6 +232,192 @@ If the status is `Pending`, or the output lists more than one running etcd pod,

.. Repeat this step for each lost control plane host that is not the recovery host.

. Delete and recreate the other non-recovery control plane machines, one at a time. After these machines are recreated, a new revision is forced and etcd scales up automatically.
+
If you are running installer-provisioned infrastructure, or you used the Machine API to create your machines, follow these steps. Otherwise, you must create the new control plane node by using the same method that was originally used to create it.
+
[WARNING]
====
Do not delete and recreate the machine for the recovery host.
====
.. Obtain the machine for one of the lost control plane hosts.
+
In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api -o wide
----
+
Example output:
+
[source,terminal]
----
NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-0                  Running   m4.xlarge   us-east-1   us-east-1a   3h37m   ip-10-0-131-183.ec2.internal   aws:///us-east-1a/i-0ec2782f8287dfb7e   stopped <1>
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal   aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
----
<1> This is the control plane machine for the lost control plane host, `ip-10-0-131-183.ec2.internal`.

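+
The check that callout 1 describes can also be scripted. The following Python sketch is an illustration only and is not part of the documented procedure. It assumes that you request the machine list with `-o json` instead of `-o wide`, that control plane machine names contain `-master-` as in the example output, and that the provider reports `status.providerStatus.instanceState`. The function names `fetch_machines` and `lost_control_plane_machines` are hypothetical.
+
```python
import json
import subprocess


def fetch_machines(namespace="openshift-machine-api"):
    """Return the parsed output of `oc get machines -n <namespace> -o json`."""
    result = subprocess.run(
        ["oc", "get", "machines", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def lost_control_plane_machines(machine_list):
    """List control plane machines whose instance is not reported as running.

    Assumption: control plane machines have "-master-" in their name.
    A machine with no reported instance state is also listed, so that it
    gets reviewed by a person.
    """
    lost = []
    for machine in machine_list.get("items", []):
        name = machine["metadata"]["name"]
        if "-master-" not in name:
            continue  # worker machines are out of scope for this step
        provider_status = machine.get("status", {}).get("providerStatus", {})
        if provider_status.get("instanceState") != "running":
            lost.append(name)
    return lost
```
+
Treat the result as a list of candidates to review. Confirm that none of the listed machines belongs to the recovery host before you continue.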
.. Save the machine configuration to a file on your file system:
+
[source,terminal]
----
$ oc get machine clustername-8qw5l-master-0 \ <1>
    -n openshift-machine-api \
    -o yaml \
    > new-master-machine.yaml
----
<1> Specify the name of the control plane machine for the lost control plane host.

.. Edit the `new-master-machine.yaml` file that was created in the previous step to assign a new name and remove unnecessary fields.

... Remove the entire `status` section:
+
[source,yaml]
----
status:
  addresses:
  - address: 10.0.131.183
    type: InternalIP
  - address: ip-10-0-131-183.ec2.internal
    type: InternalDNS
  - address: ip-10-0-131-183.ec2.internal
    type: Hostname
  lastUpdated: "2020-04-20T17:44:29Z"
  nodeRef:
    kind: Node
    name: ip-10-0-131-183.ec2.internal
    uid: acca4411-af0d-4387-b73e-52b2484295ad
  phase: Running
  providerStatus:
    apiVersion: awsproviderconfig.openshift.io/v1beta1
    conditions:
    - lastProbeTime: "2020-04-20T16:53:50Z"
      lastTransitionTime: "2020-04-20T16:53:50Z"
      message: machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-0fdb85790d76d0c3f
    instanceState: stopped
    kind: AWSMachineProviderStatus
----

... Change the `metadata.name` field to a new name.
+
The recommended approach is to keep the same base name as the old machine and change the ending number to the next available number. In this example, `clustername-8qw5l-master-0` is changed to `clustername-8qw5l-master-3`:
+
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  ...
  name: clustername-8qw5l-master-3
  ...
----

... Update the `metadata.selfLink` field to use the new machine name from the previous step:
+
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  ...
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/clustername-8qw5l-master-3
  ...
----

... Remove the `spec.providerID` field:
+
[source,yaml]
----
  providerID: aws:///us-east-1a/i-0fdb85790d76d0c3f
----

... Remove the `metadata.annotations` and `metadata.generation` fields:
+
[source,yaml]
----
  annotations:
    machine.openshift.io/instance-state: running
  ...
  generation: 2
----

... Remove the `metadata.resourceVersion` and `metadata.uid` fields:
+
[source,yaml]
----
  resourceVersion: "13291"
  uid: a282eb70-40a2-4e89-8009-d05dd420d31a
----

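+
The edits in this step can also be applied programmatically. The following Python sketch is an illustration only and is not part of the documented procedure. It assumes that the machine was saved with `-o json` instead of `-o yaml`, so that the standard library can parse it. The helper names `next_machine_name` and `prepare_replacement`, and the file names in the usage comment, are hypothetical.
+
```python
import copy
import re


def next_machine_name(old_name, existing_names):
    """Keep the base name and pick the lowest unused number above the old one."""
    match = re.fullmatch(r"(.*-)(\d+)", old_name)
    if match is None:
        raise ValueError(f"no trailing number in {old_name!r}")
    base, index = match.group(1), int(match.group(2))
    taken = set(existing_names)
    candidate = index + 1
    while f"{base}{candidate}" in taken:
        candidate += 1
    return f"{base}{candidate}"


def prepare_replacement(machine, new_name):
    """Return a copy of a Machine object with the edits from this step applied."""
    new = copy.deepcopy(machine)
    new.pop("status", None)                      # remove the entire status section
    new.get("spec", {}).pop("providerID", None)  # remove spec.providerID
    metadata = new["metadata"]
    for field in ("annotations", "generation", "resourceVersion", "uid"):
        metadata.pop(field, None)
    metadata["name"] = new_name
    if "selfLink" in metadata:
        # Swap only the final path segment, which is the machine name.
        prefix = metadata["selfLink"].rsplit("/", 1)[0]
        metadata["selfLink"] = f"{prefix}/{new_name}"
    return new


# Example usage (file names are hypothetical):
#   import json
#   with open("old-master-machine.json") as f:
#       old = json.load(f)
#   new = prepare_replacement(old, "clustername-8qw5l-master-3")
#   with open("new-master-machine.json", "w") as f:
#       json.dump(new, f, indent=2)
```
+
Because `oc apply -f` accepts JSON as well as YAML, a file written this way could be used in the later step that creates the new machine. Review the generated file before you apply it.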
.. Delete the machine of the lost control plane host:
+
[source,terminal]
----
$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 <1>
----
<1> Specify the name of the control plane machine for the lost control plane host.

.. Verify that the machine was deleted:
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api -o wide
----
+
Example output:
+
[source,terminal]
----
NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal   aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
----

.. Create the new machine using the `new-master-machine.yaml` file:
+
[source,terminal]
----
$ oc apply -f new-master-machine.yaml
----

.. Verify that the new machine has been created:
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api -o wide
----
+
Example output:
+
[source,terminal]
----
NAME                                        PHASE          TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running        m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running        m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal   aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-master-3                  Provisioning   m4.xlarge   us-east-1   us-east-1a   85s     ip-10-0-173-171.ec2.internal   aws:///us-east-1a/i-015b0888fe17bc2c8   running <1>
clustername-8qw5l-worker-us-east-1a-wbtgd   Running        m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running        m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
----
<1> The new machine, `clustername-8qw5l-master-3`, is being created. It is ready after the phase changes from `Provisioning` to `Running`.
+
It might take a few minutes for the new machine to be created. The etcd cluster Operator automatically syncs when the machine or node returns to a healthy state.

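+
Instead of rerunning the command by hand, you can poll for the phase change. The following Python sketch is an illustration only and is not part of the documented procedure. It reads `status.phase` through `oc get machine` with a `jsonpath` output template. The function names, the 15-minute timeout, and the 15-second polling interval are assumptions chosen for this example.
+
```python
import subprocess
import time


def get_machine_phase(name, namespace="openshift-machine-api"):
    """Return status.phase of one machine, for example "Provisioning"."""
    result = subprocess.run(
        ["oc", "get", "machine", name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_running(name, read_phase=get_machine_phase,
                       timeout=900.0, interval=15.0, sleep=time.sleep):
    """Poll until the machine phase is "Running".

    Returns True when the machine reaches "Running" and False when the
    timeout expires first. Every other phase is treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_phase(name) == "Running":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```
+
For example, `wait_until_running("clustername-8qw5l-master-3")` blocks until the new machine from this example is ready or the timeout expires. The `read_phase` and `sleep` parameters exist only so that the loop can be exercised without a cluster.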
.. Repeat these steps for each lost control plane host that is not the recovery host.

. In a separate terminal window, log in to the cluster as a user with the `cluster-admin` role by using the following command:
+
[source,terminal]
