Commit 5d8d947

Feature/health check v ms (#19)
* Added templates/workerMachineHealthCheck.yaml
* Added templates/controlPlaneMachineHealthCheck.yaml
1 parent 9968f21 commit 5d8d947

2 files changed

+62
-0
lines changed

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# This custom resource can optionally be defined for a cluster, enabling CAPI Machine Health Checking
+# for its control plane nodes. See https://cluster-api.sigs.k8s.io/tasks/healthcheck.html
+#
+# This has intentionally been kept separate from the cluster-templates, as it introduces a few
+# complexities surrounding CNIs:
+# - The CNI must be deployed, and the cluster nodes must become ready, within the nodeStartupTimeout
+#   (otherwise the MachineHealthCheck remediation process will begin terminating the unready nodes).
+# - Certain CNIs have been observed to hang the MachineHealthCheck remediation process's attempts
+#   to delete the failed node (inability to drain).
+# As such, the deployment of this component is left to the discretion of the cluster deployer.
+# If deployed independently of cluster-template, be sure to replace the placeholders below
+# before applying it to your management cluster.
+---
+apiVersion: cluster.x-k8s.io/v1alpha3
+kind: MachineHealthCheck
+metadata:
+  name: ${CLUSTER_NAME}-kcp-unhealthy-2m
+spec:
+  clusterName: ${CLUSTER_NAME}
+  maxUnhealthy: 100%
+  # nodeStartupTimeout: 10m
+  selector:
+    matchLabels:
+      cluster.x-k8s.io/control-plane: ""
+  unhealthyConditions:
+    - type: Ready
+      status: Unknown
+      timeout: 120s
+    - type: Ready
+      status: "False"
+      timeout: 120s
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# This custom resource can optionally be defined for a cluster, enabling CAPI Machine Health Checking
+# for its worker nodes. See https://cluster-api.sigs.k8s.io/tasks/healthcheck.html
+#
+# This has intentionally been kept separate from the cluster-templates, as it introduces a few
+# complexities surrounding CNIs:
+# - The CNI must be deployed, and the cluster nodes must become ready, within the nodeStartupTimeout
+#   (otherwise the MachineHealthCheck remediation process will begin terminating the unready worker nodes).
+# - Certain CNIs have been observed to hang the MachineHealthCheck remediation process's attempts
+#   to delete the failed node (inability to drain).
+# As such, the deployment of this component is left to the discretion of the cluster deployer.
+# If deployed independently of cluster-template, be sure to replace the placeholders below
+# before applying it to your management cluster.
+---
+apiVersion: cluster.x-k8s.io/v1alpha3
+kind: MachineHealthCheck
+metadata:
+  name: ${CLUSTER_NAME}-workers-unhealthy-2m
+spec:
+  clusterName: ${CLUSTER_NAME}
+  maxUnhealthy: 100%
+  nodeStartupTimeout: 10m
+  selector:
+    matchLabels:
+      cluster.x-k8s.io/deployment-name: ${CLUSTER_NAME}-md-0
+  unhealthyConditions:
+    - type: Ready
+      status: Unknown
+      timeout: 120s
+    - type: Ready
+      status: "False"
+      timeout: 120s
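
The template comments ask for the ${CLUSTER_NAME} placeholders to be replaced before the manifest is applied to the management cluster. One possible way to do that by hand is sketched below in Python. This is an illustration rather than part of the commit: the render helper, the abbreviated TEMPLATE string, and the cluster name "my-cluster" are all hypothetical, and in practice the text would be read from the template file itself.

```python
from string import Template

# Hypothetical, abbreviated copy of the worker template shown above.
# In practice, read the full file from disk instead of embedding it.
TEMPLATE = """\
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: ${CLUSTER_NAME}-workers-unhealthy-2m
spec:
  clusterName: ${CLUSTER_NAME}
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: ${CLUSTER_NAME}-md-0
"""


def render(template_text: str, cluster_name: str) -> str:
    """Replace every ${CLUSTER_NAME} placeholder with cluster_name.

    Template.substitute raises KeyError if the text contains a placeholder
    that was not supplied, so an unexpected variable is not silently kept.
    """
    return Template(template_text).substitute(CLUSTER_NAME=cluster_name)


if __name__ == "__main__":
    # "my-cluster" is an example value, not one taken from the commit.
    print(render(TEMPLATE, "my-cluster"))
```

The rendered output could then be saved to a file and applied to the management cluster, for example with kubectl apply -f. Using substitute rather than safe_substitute is a deliberate choice here: a leftover placeholder fails loudly instead of reaching the cluster.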
