articles/aks/node-auto-repair.md (6 additions, 6 deletions)
@@ -8,7 +8,7 @@ ms.date: 03/10/2020
# Azure Kubernetes Service (AKS) node auto-repair
- AKS continuously checks the health state of worker nodes and performs automatic repair of the nodes if they become unhealthy. This documentation describes how Azure Kubernetes Service (AKS) monitors worker nodes, and repairs unhealthy worker nodes. The documentation is to inform AKS operators on the behavior of node repair functionality.
+ AKS continuously checks the health state of worker nodes and performs automatic repair of the nodes if they become unhealthy. This document describes how Azure Kubernetes Service (AKS) monitors worker nodes and repairs unhealthy worker nodes, to inform AKS operators about the node repair behavior. Note that the Azure platform also [performs maintenance on Virtual Machines][vm-updates] that experience issues. AKS and Azure work together to minimize service disruptions for your clusters.
## How AKS checks for unhealthy nodes
@@ -28,11 +28,11 @@ kubectl get nodes
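As the hunk header above shows, the article checks node health with `kubectl get nodes`. A minimal sketch of spotting unhealthy nodes from that command's output (not from the article; the sample output and node names are hypothetical):

```shell
# Hypothetical sample output of `kubectl get nodes`; the awk filter
# prints only node names whose STATUS column is NotReady.
sample='NAME                       STATUS     ROLES   AGE   VERSION
aks-nodepool1-12345678-0   Ready      agent   10d   v1.17.3
aks-nodepool1-12345678-1   NotReady   agent   10d   v1.17.3'

echo "$sample" | awk 'NR > 1 && $2 == "NotReady" { print $1 }'
```

Against a live cluster you would pipe real output instead, e.g. `kubectl get nodes --no-headers | awk '$2 == "NotReady" { print $1 }'`.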
## How automatic repair works
- Auto-repair takes several steps to repair a broken node. If a node is determined to be unhealthy, AKS attempts several remediation steps. The steps are performed in this order:
+ This behavior applies to Virtual Machine Scale Sets. Auto-repair takes several steps to repair a broken node. If a node is determined to be unhealthy, AKS attempts several remediation steps, in this order:
- 1. After the container runtime becomes unresponsive for 10 minutes, the failing runtime daemons and related services are restarted on the node.
- 2. If the node does not become available within 10 minutes, the node is rebooted.
- 3. If the node is not available within 30 minutes, the node is re-imaged.
+ 1. After the container runtime becomes unresponsive for 10 minutes, the failing runtime services are restarted on the node.
+ 2. If the node is not ready within 10 minutes, the node is rebooted.
+ 3. If the node is not ready within 30 minutes, the node is re-imaged.
> [!Note]
> If multiple nodes are unhealthy, they are repaired one by one
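The escalation in the steps above (restart runtime services, then reboot, then reimage, stopping once the node is healthy) can be sketched as a simple loop. This is illustrative only, not AKS's implementation; `node_is_ready` and the `READY_AFTER` variable are hypothetical stand-ins for a real health check.

```shell
# Hypothetical stub: in reality this would inspect `kubectl get nodes`.
# Here the node "recovers" once the action named in READY_AFTER has run.
node_is_ready() {
  [ "$READY_AFTER" = "$1" ]
}

# Walk the repair actions in the article's order, stopping as soon as
# the node reports healthy again.
repair_node() {
  for action in restart-runtime-services reboot reimage; do
    echo "attempting: $action"
    if node_is_ready "$action"; then
      echo "node recovered after: $action"
      return 0
    fi
  done
  echo "node still unhealthy; repair escalation exhausted"
  return 1
}

READY_AFTER=reboot
repair_node
```

With `READY_AFTER=reboot`, the loop tries the runtime-service restart, then the reboot, and never reaches the reimage step.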
@@ -45,4 +45,4 @@ Use [Availability Zones][availability-zones] to increase high availability with