TELCODOCS-329: Understanding the metal3-based remediation process added
TELCODOCS-329: metal3-remediation-template added
TELCODOCS-329: Dev feedback applied
TELCODOCS-329: More Dev feedback applied
TELCODOCS-329: QE feedback applied
TELCODOCS: QE & Dev feedback applied
TELCODOCS: QE & Dev feedback applied
TELCODOCS: QE & Dev feedback applied
TELCODOCS-329: Peer review feedback applied
TELCODOCS-329: Final check and squashing commits
== Understanding the annotation-based remediation process

The remediation process operates as follows:
@@ -47,13 +49,30 @@ The remediation process operates as follows:
If the power operations did not complete, the bare metal machine controller triggers the reprovisioning of the unhealthy node unless this is a control plane node or a node that was provisioned externally.
+== Understanding the metal3-based remediation process
+
+The remediation process operates as follows:
+
+. The MachineHealthCheck (MHC) controller detects that a node is unhealthy.
+. The MHC creates a metal3 remediation custom resource for the metal3 remediation controller, which requests to power off the unhealthy node.
+. After the power is off, the node is deleted, which allows the cluster to reschedule the affected workload on other nodes.
+. The metal3 remediation controller requests to power on the node.
+. After the node is up, the node re-registers itself with the cluster, resulting in the creation of a new node.
+. After the node is recreated, the metal3 remediation controller restores the annotations and labels that existed on the unhealthy node before its deletion.
+
+[NOTE]
+====
+If the power operations did not complete, the metal3 remediation controller triggers the reprovisioning of the unhealthy node unless this is a control plane node or a node that was provisioned externally.
+====
+
[id="mgmt-creating-mhc-baremetal_{context}"]
== Creating a MachineHealthCheck resource for bare metal

.Prerequisites

* The {product-title} is installed using installer-provisioned infrastructure (IPI).
-* Access to Baseboard Management Controller (BMC) credentials (or BMC access to each node)
+* Access to BMC credentials (or BMC access to each node).
* Network access to the BMC interface of the unhealthy node.

.Procedure
@@ -65,7 +84,7 @@ If the power operations did not complete, the bare metal machine controller trig
$ oc apply -f healthcheck.yaml
----

-.Sample `MachineHealthCheck` resource for bare metal
+.Sample `MachineHealthCheck` resource for bare metal, annotation-based remediation
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
@@ -105,6 +124,52 @@ spec:
The `matchLabels` are examples only; you must map your machine groups based on your specific needs.
====

+.Sample `MachineHealthCheck` resource for bare metal, metal3-based remediation
+The `matchLabels` are examples only; you must map your machine groups based on your specific needs. The `annotations` section does not apply to metal3-based remediation. Annotation-based remediation and metal3-based remediation are mutually exclusive.
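
The body of the metal3-based sample is not visible in this view. As a rough illustration only, a `MachineHealthCheck` that delegates remediation to a template might look like the sketch below. It is not taken from this PR. The resource name `example`, the `openshift-machine-api` namespace, the `infrastructure.cluster.x-k8s.io/v1alpha5` API version, and all timeout, `maxUnhealthy`, and strategy values are assumptions to verify against the CRDs installed in the cluster. The template name `metal3-remediation-template` follows the commit list above. In line with the note above, the sketch carries no `annotations` section.

[source,yaml]
----
# Hypothetical sketch: names, API versions, and values are illustrative assumptions.
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      # Example labels only; map these to your own machine groups.
      machine.openshift.io/cluster-api-machine-role: <role>
      machine.openshift.io/cluster-api-machine-type: <role>
      machine.openshift.io/cluster-api-machineset: <cluster_name>-<label>-<zone>
  # Points the MHC at the remediation template instead of using an annotation.
  remediationTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha5
    kind: Metal3RemediationTemplate
    name: metal3-remediation-template
    namespace: openshift-machine-api
  maxUnhealthy: 60%
  nodeStartupTimeout: 60m
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s
---
# Hypothetical template referenced by the MHC above.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha5
kind: Metal3RemediationTemplate
metadata:
  name: metal3-remediation-template
  namespace: openshift-machine-api
spec:
  template:
    spec:
      strategy:
        type: Reboot     # power off, then power on, as in the process described above
        retryLimit: 1    # assumed number of reboot attempts
        timeout: 5m0s    # assumed wait between attempts
----

Both resources could be saved to one file and applied with the same `oc apply -f` command shown in the procedure.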