Commit 59b6c01

Merge pull request #8303 from VictoriaNoje/86541-node-not-ready-then-recovers
AB#4164: 86541 node not ready then recovers
2 parents 2e602ba + e6d7a9d commit 59b6c01

1 file changed: +10 -5 lines changed

support/azure/azure-kubernetes/availability-performance/node-not-ready-then-recovers.md

Lines changed: 10 additions & 5 deletions
@@ -1,19 +1,19 @@
 ---
 title: Node not ready but then recovers
 description: Troubleshoot scenarios in which the status of an AKS cluster node is Node Not Ready, but then the node recovers.
-ms.date: 12/09/2024
-ms.reviewer: rissing, chiragpa, momajed, v-leedennis
+ms.date: 2/25/2024
+ms.reviewer: rissing, chiragpa, momajed, v-leedennis, novictor
 ms.service: azure-kubernetes-service
 #Customer intent: As an Azure Kubernetes user, I want to prevent the Node Not Ready status for nodes that later recover so that I can avoid future errors within an AKS cluster.
 ms.custom: sap:Node/node pool availability and performance
 ---
 # Troubleshoot Node Not Ready failures that are followed by recoveries
 
-This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "Not Ready" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.
+This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "NotReady" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes so that you can implement effective resolutions.
 
 ## Cause
 
-There are several scenarios that could cause a "Not Ready" state to occur:
+There are several scenarios that could cause a "NotReady" state to occur:
 
 - The unavailability of the API server. This causes the readiness probe to fail. This prevents the pod from being attached to the service so that traffic is no longer forwarded to the pod instance.
 
@@ -24,7 +24,12 @@ There are several scenarios that could cause a "Not Ready" state to occur:
 
 ## Resolution
 
-Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+To resolve this issue, follow these steps:
+
+1. Run `kubectl describe node <node-name>` to review detailed information about the node's status. Look for any error messages or warnings that might indicate the root cause of the issue.
+2. Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+3. Verify the node's network configuration to make sure that there are no connectivity issues.
+4. Check the node's resource usage, such as CPU, memory, and disk, to identify potential constraints. For more information, see [Monitor your Kubernetes cluster performance with Container insights](/azure/azure-monitor/containers/container-insights-analyze#view-performance-directly-from-a-cluster).
 
 For further steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
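The `kubectl` commands added in the new resolution steps lend themselves to quick filtering of the problem rows. The sketch below is hedged: running `kubectl` needs a live AKS cluster, so its typical output is mocked in shell variables (the node names, versions, and API service entries are illustrative, not from the article), and `awk` does the filtering.

```shell
# Mocked `kubectl get nodes` output (a live cluster is needed for the real
# command; node names and versions are illustrative).
nodes='NAME                                STATUS     ROLES   AGE   VERSION
aks-nodepool1-12345678-vmss000000   Ready      agent   10d   v1.28.5
aks-nodepool1-12345678-vmss000001   NotReady   agent   10d   v1.28.5'

# Step 1 triage: list only nodes whose STATUS column reads NotReady, then
# inspect each one with `kubectl describe node <node-name>`.
printf '%s\n' "$nodes" | awk 'NR > 1 && $2 == "NotReady" { print $1 }'

# Mocked `kubectl get apiservices` output for step 2; AVAILABLE is False
# when an aggregated API server is unreachable (illustrative rows).
apiservices='NAME                     SERVICE                      AVAILABLE                  AGE
v1.                      Local                        True                       10d
v1beta1.metrics.k8s.io   kube-system/metrics-server   False (MissingEndpoints)   10d'

# Step 2 triage: list API services that are not available.
printf '%s\n' "$apiservices" | awk 'NR > 1 && $3 != "True" { print $1 }'
```

On a real cluster, you would pipe the actual command output instead of the mocked variables, for example `kubectl get nodes | awk 'NR > 1 && $2 == "NotReady" { print $1 }'`.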
