Commit 59b6c01

Merge pull request #8303 from VictoriaNoje/86541-node-not-ready-then-recovers
AB#4164: 86541 node not ready then recovers
2 parents 2e602ba + e6d7a9d commit 59b6c01

1 file changed: +10 -5 lines changed

support/azure/azure-kubernetes/availability-performance/node-not-ready-then-recovers.md

Lines changed: 10 additions & 5 deletions
@@ -1,19 +1,19 @@
 ---
 title: Node not ready but then recovers
 description: Troubleshoot scenarios in which the status of an AKS cluster node is Node Not Ready, but then the node recovers.
-ms.date: 12/09/2024
-ms.reviewer: rissing, chiragpa, momajed, v-leedennis
+ms.date: 2/25/2024
+ms.reviewer: rissing, chiragpa, momajed, v-leedennis, novictor
 ms.service: azure-kubernetes-service
 #Customer intent: As an Azure Kubernetes user, I want to prevent the Node Not Ready status for nodes that later recover so that I can avoid future errors within an AKS cluster.
 ms.custom: sap:Node/node pool availability and performance
 ---
 # Troubleshoot Node Not Ready failures that are followed by recoveries
 
-This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "Not Ready" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.
+This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "NotReady" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes so that you can implement effective resolutions.
 
 ## Cause
 
-There are several scenarios that could cause a "Not Ready" state to occur:
+There are several scenarios that could cause a "NotReady" state to occur:
 
 - The unavailability of the API server. This causes the readiness probe to fail. This prevents the pod from being attached to the service so that traffic is no longer forwarded to the pod instance.
 
@@ -24,7 +24,12 @@ There are several scenarios that could cause a "Not Ready" state to occur:
 
 ## Resolution
 
-Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+To resolve this issue, follow these steps:
+
+1. Run `kubectl describe node <node-name>` to review detailed information about the node's status. Look for any error messages or warnings that might indicate the root cause of the issue.
+2. Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+3. Verify the node's network configuration to make sure that there are no connectivity issues.
+4. Check the node's resource usage, such as CPU, memory, and disk, to identify potential constraints. For more information, see [Monitor your Kubernetes cluster performance with Container insights](/azure/azure-monitor/containers/container-insights-analyze#view-performance-directly-from-a-cluster).
 
 For further steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
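The `kubectl` commands added in the new resolution steps lend themselves to quick filtering of the problem rows. The sketch below is hedged: running `kubectl` needs a live AKS cluster, so its typical output is mocked in shell variables (the node names, versions, and API service entries are illustrative, not from the article), and `awk` does the filtering.

```shell
# Mocked `kubectl get nodes` output (a live cluster is needed for the real
# command; node names and versions are illustrative).
nodes='NAME                                STATUS     ROLES   AGE   VERSION
aks-nodepool1-12345678-vmss000000   Ready      agent   10d   v1.28.5
aks-nodepool1-12345678-vmss000001   NotReady   agent   10d   v1.28.5'

# Step 1 triage: list only nodes whose STATUS column reads NotReady, then
# inspect each one with `kubectl describe node <node-name>`.
printf '%s\n' "$nodes" | awk 'NR > 1 && $2 == "NotReady" { print $1 }'

# Mocked `kubectl get apiservices` output for step 2; AVAILABLE is False
# when an aggregated API server is unreachable (illustrative rows).
apiservices='NAME                     SERVICE                      AVAILABLE                  AGE
v1.                      Local                        True                       10d
v1beta1.metrics.k8s.io   kube-system/metrics-server   False (MissingEndpoints)   10d'

# Step 2 triage: list API services that are not available.
printf '%s\n' "$apiservices" | awk 'NR > 1 && $3 != "True" { print $1 }'
```

On a real cluster, you would pipe the actual command output instead of the mocked variables, for example `kubectl get nodes | awk 'NR > 1 && $2 == "NotReady" { print $1 }'`.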
