Commit 9d33b25

Merge pull request #250001 from aritraghosh/main
Update node-auto-repair.md
2 parents 5df075f + 612e820 commit 9d33b25

1 file changed: +6 -2 lines changed

articles/aks/node-auto-repair.md

Lines changed: 6 additions & 2 deletions
@@ -11,11 +11,11 @@ Azure Kubernetes Service (AKS) continuously monitors the health state of worker

 In this article, you learn how the automatic node repair functionality behaves for Windows and Linux nodes.

-## How AKS checks for unhealthy nodes
+## How AKS checks for NotReady nodes

 AKS uses the following rules to determine if a node is unhealthy and needs repair:

-* The node reports the **NotReady** status on consecutive checks within a 10-minute time frame.
+* The node reports the [**NotReady**](https://kubernetes.io/docs/reference/node/node-status/#condition) status on consecutive checks within a 10-minute time frame.
 * The node doesn't report any status within 10 minutes.

 You can manually check the health state of your nodes with the `kubectl get nodes` command.
@@ -33,6 +33,10 @@ If AKS identifies an unhealthy node that remains unhealthy for *five* minutes, A

 AKS engineers investigate alternative remediations if auto-repair is unsuccessful.

+> [!NOTE]
+> Auto-repair is not triggered if the following taints are present on the node: `node.cloudprovider.kubernetes.io/shutdown`, `ToBeDeletedByClusterAutoscaler`.
+> The overall auto repair process can take up to an hour to complete. AKS retries for a max of 3 times for each step.
+
 ## Node auto-drain

 [Scheduled events][scheduled-events] can occur on the underlying VMs in any of your node pools. For [spot node pools][spot-node-pools], scheduled events may cause a *preempt* node event for the node. Certain node events, such as *preempt*, cause AKS node auto-drain to attempt a cordon and drain of the affected node. This process enables rescheduling for any affected workloads on that node. You might notice the node receives a taint with `"remediator.aks.microsoft.com/unschedulable"`, because of `"kubernetes.azure.com/scalesetpriority: spot"`.
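
The two health rules and the taint exception touched by this change can be sketched in Python. This is an illustration only, not how AKS implements the check: `StatusReport`, `is_unhealthy`, and `eligible_for_auto_repair` are hypothetical names, and `min_consecutive=2` is an assumption, since the article does not say how many consecutive checks are required. The sketch also assumes the NotReady streak must include the most recent report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Sequence

# 10-minute window taken from the article's two rules.
WINDOW = timedelta(minutes=10)

# Taint keys listed in the NOTE added by this commit.
REPAIR_BLOCKING_TAINTS = frozenset({
    "node.cloudprovider.kubernetes.io/shutdown",
    "ToBeDeletedByClusterAutoscaler",
})


@dataclass(frozen=True)
class StatusReport:
    """Hypothetical record of one node status check."""
    at: datetime
    ready: bool


def is_unhealthy(reports: Sequence[StatusReport], now: datetime,
                 min_consecutive: int = 2) -> bool:
    """Apply the article's two rules to the reports seen in the last 10 minutes."""
    recent = sorted(
        (r for r in reports if now - WINDOW <= r.at <= now),
        key=lambda r: r.at,
    )
    # Rule 2: the node reported no status within 10 minutes.
    if not recent:
        return True
    # Rule 1: NotReady on consecutive checks (assumed: the latest ones).
    streak = 0
    for report in reversed(recent):
        if report.ready:
            break
        streak += 1
    return streak >= min_consecutive


def eligible_for_auto_repair(reports: Sequence[StatusReport],
                             taint_keys: Iterable[str],
                             now: datetime) -> bool:
    """Unhealthy nodes carrying a blocking taint are skipped, per the NOTE."""
    if REPAIR_BLOCKING_TAINTS & set(taint_keys):
        return False
    return is_unhealthy(reports, now)
```

In this sketch a node that is shutting down or is marked for removal by the cluster autoscaler is never a repair candidate, even when both health rules would otherwise flag it.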
