If AKS finds multiple unhealthy nodes during a health check, each node is repaired individually before another repair begins.

## Node Autodrain

[Scheduled Events][scheduled-events] can occur on the underlying virtual machines (VMs) in any of your node pools. For [spot node pools][spot-node-pools], scheduled events may cause a *preempt* event for the node. Certain node events, such as *preempt*, cause AKS node autodrain to attempt to cordon and drain the affected node, which allows any affected workloads on that node to be rescheduled gracefully.

The following table shows the node events and the action each one causes for AKS node autodrain.

| Event | Description | Action |
| --- | --- | --- |
| Freeze | The VM is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files. | No action |
| Reboot | The VM is scheduled for reboot. The VM's non-persistent memory is lost. | No action |
| Redeploy | The VM is scheduled to move to another host. The VM's ephemeral disks are lost. | Cordon and drain |
| Preempt | The spot VM is being deleted. The VM's ephemeral disks are lost. | Cordon and drain |
| Terminate | The VM is scheduled to be deleted. | Cordon and drain |
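
The following is a minimal Python sketch of the event-to-action mapping in the node autodrain table. It's a hypothetical illustration, not how AKS implements autodrain. It assumes the event payload follows the shape of an Azure Instance Metadata Service (IMDS) Scheduled Events response: an `Events` list whose entries carry an `EventType` and a `Resources` list of VM names. The function name `nodes_to_drain` and the VM names in the sample are made up for illustration. The function reports which VMs the table says would be cordoned and drained.

```python
import json
from typing import Dict, List

# Event-to-action mapping copied from the node autodrain table.
# Illustrative only: this is not AKS's implementation of autodrain.
AUTODRAIN_ACTIONS: Dict[str, str] = {
    "Freeze": "No action",
    "Reboot": "No action",
    "Redeploy": "Cordon and drain",
    "Preempt": "Cordon and drain",
    "Terminate": "Cordon and drain",
}


def nodes_to_drain(scheduled_events_json: str) -> List[str]:
    """Return the VM names that the table says would be cordoned and drained.

    Assumes an IMDS Scheduled Events-style payload: an "Events" list whose
    entries carry "EventType" and a "Resources" list of VM names.
    """
    document = json.loads(scheduled_events_json)
    to_drain: List[str] = []
    for event in document.get("Events", []):
        # Unknown event types fall back to "No action" (an assumption).
        action = AUTODRAIN_ACTIONS.get(event.get("EventType", ""), "No action")
        if action != "Cordon and drain":
            continue
        for vm_name in event.get("Resources", []):
            if vm_name not in to_drain:  # keep first-seen order, no duplicates
                to_drain.append(vm_name)
    return to_drain


if __name__ == "__main__":
    # Hypothetical payload: one Freeze event and one Preempt event on a spot VM.
    sample = json.dumps({
        "DocumentIncarnation": 1,
        "Events": [
            {"EventId": "evt-1", "EventType": "Freeze",
             "ResourceType": "VirtualMachine",
             "Resources": ["aks-nodepool1-vm0"], "EventStatus": "Scheduled"},
            {"EventId": "evt-2", "EventType": "Preempt",
             "ResourceType": "VirtualMachine",
             "Resources": ["aks-spotpool-vm3"], "EventStatus": "Scheduled"},
        ],
    })
    print(nodes_to_drain(sample))  # prints ['aks-spotpool-vm3']
```

Treating unrecognized event types as *No action* is a conservative choice made for this sketch. It isn't documented AKS behavior.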

## Limitations

In many cases, AKS can determine whether a node is unhealthy and attempt to repair the issue, but in some cases AKS either can't repair the issue or can't detect that there's an issue. For example, AKS can't detect issues if a node's status isn't being reported because of an error in the network configuration, or if the node failed to initially register as a healthy node.

Use [Availability Zones][availability-zones] to improve the high availability of your AKS cluster workloads.