add new step 11

tmalove · tmalove · commit 393f852a5c38 · 2021-11-12T15:40:49.000-05:00
diff --git a/modules/dr-restoring-cluster-state.adoc b/modules/dr-restoring-cluster-state.adoc
@@ -232,6 +232,192 @@ If the status is `Pending`, or the output lists more than one running etcd pod,
 
 .. Repeat this step for each lost control plane host that is not the recovery host.
 
+. Delete and recreate the other non-recovery control plane machines, one by one. After these machines are recreated, a new revision is forced and etcd scales up automatically.
++
+If you are running installer-provisioned infrastructure, or you used the Machine API to create your machines, follow these steps. Otherwise, you must create the new control plane node by using the same method that was originally used to create it.
++
+[WARNING]
+====
+Do not delete and recreate the machine for the recovery host.
+====
+.. Obtain the machine for one of the lost control plane hosts.
++
+In a terminal that has access to the cluster as a cluster-admin user, run the following command:
++
+[source,terminal]
+----
+$ oc get machines -n openshift-machine-api -o wide
+----
++
+Example output:
++
+[source,terminal]
+----
+NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
+clustername-8qw5l-master-0                  Running   m4.xlarge   us-east-1   us-east-1a   3h37m   ip-10-0-131-183.ec2.internal   aws:///us-east-1a/i-0ec2782f8287dfb7e   stopped <1>
+clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
+clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal   aws:///us-east-1c/i-02626f1dba9ed5bba   running
+clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
+clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
+clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
+----
+<1> This is the control plane machine for the lost control plane host, `ip-10-0-131-183.ec2.internal`.
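++
+If the machine list is long, you can narrow the output to control plane machines only. The following is a minimal sketch, not part of the documented procedure. It assumes that your control plane machines carry the `machine.openshift.io/cluster-api-machine-role=master` label, and `list_control_plane_machines` is a hypothetical helper name:
++
+[source,terminal]
+----
+# Hypothetical helper: list only the control plane machines.
+# Assumes the machine.openshift.io/cluster-api-machine-role=master label is set.
+list_control_plane_machines() {
+  oc get machines -n openshift-machine-api \
+    -l machine.openshift.io/cluster-api-machine-role=master \
+    -o wide
+}
+----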
+
+.. Save the machine configuration to a file on your file system:
++
+[source,terminal]
+----
+$ oc get machine clustername-8qw5l-master-0 \ <1>
+    -n openshift-machine-api \
+    -o yaml \
+    > new-master-machine.yaml
+----
+<1> Specify the name of the control plane machine for the lost control plane host.
+
+.. Edit the `new-master-machine.yaml` file that was created in the previous step to assign a new name and remove unnecessary fields.
+
+... Remove the entire `status` section:
++
+[source,yaml]
+----
+status:
+  addresses:
+  - address: 10.0.131.183
+    type: InternalIP
+  - address: ip-10-0-131-183.ec2.internal
+    type: InternalDNS
+  - address: ip-10-0-131-183.ec2.internal
+    type: Hostname
+  lastUpdated: "2020-04-20T17:44:29Z"
+  nodeRef:
+    kind: Node
+    name: ip-10-0-131-183.ec2.internal
+    uid: acca4411-af0d-4387-b73e-52b2484295ad
+  phase: Running
+  providerStatus:
+    apiVersion: awsproviderconfig.openshift.io/v1beta1
+    conditions:
+    - lastProbeTime: "2020-04-20T16:53:50Z"
+      lastTransitionTime: "2020-04-20T16:53:50Z"
+      message: machine successfully created
+      reason: MachineCreationSucceeded
+      status: "True"
+      type: MachineCreation
+    instanceId: i-0fdb85790d76d0c3f
+    instanceState: stopped
+    kind: AWSMachineProviderStatus
+----
+
+... Change the `metadata.name` field to a new name.
++
+As a best practice, keep the same base name as the old machine and change the ending number to the next available number. In this example, `clustername-8qw5l-master-0` is changed to `clustername-8qw5l-master-3`:
++
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: Machine
+metadata:
+  ...
+  name: clustername-8qw5l-master-3
+  ...
+----
+
+... Update the `metadata.selfLink` field to use the new machine name from the previous step:
++
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: Machine
+metadata:
+  ...
+  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/clustername-8qw5l-master-3
+  ...
+----
+
+... Remove the `spec.providerID` field:
++
+[source,yaml]
+----
+providerID: aws:///us-east-1a/i-0fdb85790d76d0c3f
+----
+
+... Remove the `metadata.annotations` and `metadata.generation` fields:
++
+[source,terminal]
+----
+annotations:
+  machine.openshift.io/instance-state: running
+...
+generation: 2
+----
+
+... Remove the `metadata.resourceVersion` and `metadata.uid` fields:
++
+[source,terminal]
+----
+resourceVersion: "13291"
+uid: a282eb70-40a2-4e89-8009-d05dd420d31a
+----
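++
+As an alternative to editing the file by hand, the edits in the preceding substeps can be scripted. The following is a minimal sketch, not part of the documented procedure. It assumes that the `jq` JSON processor is installed and that you export the machine as JSON instead of YAML; `prepare_replacement_machine` is a hypothetical helper name:
++
+[source,terminal]
+----
+# Hypothetical helper: reads a Machine object as JSON on stdin, removes the
+# fields listed in the preceding substeps, sets the new name, and rewrites
+# metadata.selfLink only if the exported object contains one.
+# Assumes jq is installed.
+prepare_replacement_machine() {
+  jq --arg name "$1" '
+    del(.status,
+        .spec.providerID,
+        .metadata.annotations,
+        .metadata.generation,
+        .metadata.resourceVersion,
+        .metadata.uid)
+    | .metadata.name = $name
+    | if .metadata.selfLink != null
+      then .metadata.selfLink = "/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/\($name)"
+      else .
+      end'
+}
+
+# Example usage with the machine names from this procedure:
+# oc get machine clustername-8qw5l-master-0 -n openshift-machine-api -o json \
+#   | prepare_replacement_machine clustername-8qw5l-master-3 > new-master-machine.json
+----
++
+If you use this approach, pass `new-master-machine.json` instead of `new-master-machine.yaml` to `oc apply -f`. The `oc apply` command accepts both JSON and YAML manifests.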
+
+.. Delete the machine of the lost control plane host:
++
+[source,terminal]
+----
+$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 <1>
+----
+<1> Specify the name of the control plane machine for the lost control plane host.
+
+.. Verify that the machine was deleted:
++
+[source,terminal]
+----
+$ oc get machines -n openshift-machine-api -o wide
+----
++
+Example output:
++
+[source,terminal]
+----
+NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
+clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
+clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal   aws:///us-east-1c/i-02626f1dba9ed5bba   running
+clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
+clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
+clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
+----
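++
+Deletion might not complete immediately. Instead of rerunning the command, you can block until the machine object is gone. The following is a minimal sketch; `wait_for_machine_deletion` is a hypothetical helper name, and the 20-minute timeout is an arbitrary assumption. If the machine is already gone, some `oc` versions might report a `NotFound` error instead of returning success:
++
+[source,terminal]
+----
+# Hypothetical helper: wait until the named Machine object no longer exists.
+# The 20-minute timeout is an assumption; adjust it for your environment.
+wait_for_machine_deletion() {
+  oc wait --for=delete "machine/$1" \
+    -n openshift-machine-api \
+    --timeout=20m
+}
+
+# Example: wait_for_machine_deletion clustername-8qw5l-master-0
+----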
+
+.. Create the new machine by using the `new-master-machine.yaml` file:
++
+[source,terminal]
+----
+$ oc apply -f new-master-machine.yaml
+----
+
+.. Verify that the new machine has been created:
++
+[source,terminal]
+----
+$ oc get machines -n openshift-machine-api -o wide
+----
++
+Example output:
++
+[source,terminal]
+----
+NAME                                        PHASE          TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
+clustername-8qw5l-master-1                  Running        m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
+clustername-8qw5l-master-2                  Running        m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal   aws:///us-east-1c/i-02626f1dba9ed5bba   running
+clustername-8qw5l-master-3                  Provisioning   m4.xlarge   us-east-1   us-east-1a   85s     ip-10-0-173-171.ec2.internal   aws:///us-east-1a/i-015b0888fe17bc2c8   running <1>
+clustername-8qw5l-worker-us-east-1a-wbtgd   Running        m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
+clustername-8qw5l-worker-us-east-1b-lrdxb   Running        m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
+clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
+----
+<1> The new machine, `clustername-8qw5l-master-3`, is being created and is ready after the phase changes from `Provisioning` to `Running`.
++
+It might take a few minutes for the new machine to be created. The etcd cluster Operator automatically syncs when the machine or node returns to a healthy state.
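++
+To wait for the phase change without rerunning the command by hand, you can poll the `status.phase` field of the new machine. The following is a minimal sketch; `wait_for_machine_running` is a hypothetical helper name, and the 30-second interval and 60-attempt limit are arbitrary assumptions. The sketch also stops early if the machine reports the `Failed` phase:
++
+[source,terminal]
+----
+# Hypothetical helper: poll the Machine phase until it reports Running.
+# Gives up after 60 attempts, 30 seconds apart (about 30 minutes),
+# or immediately if the phase is Failed.
+wait_for_machine_running() {
+  attempt=0
+  while [ "$attempt" -lt 60 ]; do
+    phase=$(oc get machine "$1" -n openshift-machine-api \
+      -o jsonpath='{.status.phase}')
+    echo "machine/$1 phase: ${phase:-<none>}"
+    case "$phase" in
+      Running) return 0 ;;
+      Failed) echo "machine/$1 is in the Failed phase" >&2; return 1 ;;
+    esac
+    attempt=$((attempt + 1))
+    sleep 30
+  done
+  echo "machine/$1 did not reach Running" >&2
+  return 1
+}
+
+# Example: wait_for_machine_running clustername-8qw5l-master-3
+----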
+
+.. Repeat these steps for each lost control plane host that is not the recovery host.
+
 . In a separate terminal window, log in to the cluster as a user with the `cluster-admin` role by using the following command:
 +
 [source,terminal]