support/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors.md
Lines changed: 52 additions & 19 deletions
@@ -1,12 +1,14 @@
 ---
 title: Node Not Ready because of custom script extension (CSE) errors
 description: Troubleshoot scenarios in which custom script extension (CSE) errors cause Node Not Ready states in an Azure Kubernetes Service (AKS) cluster node pool.
-ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli
-#Customer intent: As an Azure Kubernetes user, I want to prevent custom script extension (CSE) errors so that I can avoid a Node Not Ready state within a node pool, and avoid a Cluster Not in Succeeded state within Azure Kubernetes Service (AKS).
+ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli, innovation-engine
+author: MicrosoftDocs
+ms.author: MicrosoftDocs
 ---
+
 # Troubleshoot node not ready failures caused by CSE errors

 This article helps you troubleshoot scenarios in which a Microsoft Azure Kubernetes Service (AKS) cluster isn't in the `Succeeded` state and an AKS node isn't ready within a node pool because of custom script extension (CSE) errors.
@@ -25,12 +27,33 @@ The node extension deployment fails and returns more than one error code when yo
 1. To better understand the current failure on the cluster, run the [az aks show](/cli/azure/aks#az-aks-show) and [az resource update](/cli/azure/resource#az-resource-update) commands to set up debugging:

+Set your environment variables and run the commands to view the cluster's status and debug information.
 1. Check the debugging output and the error messages that you received from the `az resource update` command against the error list in the [CSE helper](https://github.com/Azure/AgentBaker/blob/1bf9892afd715a34e0c6b7312e712047f10319ce/parts/linux/cloud-init/artifacts/cse_helpers.sh) executable file on GitHub.

 If any of the errors involve the CSE deployment of the kubelet, then you've verified that the scenario that's described here's the cause of the Node Not Ready failure.
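The check against the CSE helper error list can be sketched as a small lookup. This is only an illustration: the function name `cse_exit_code_name` is hypothetical, and the three codes shown (50, 51, 52) are assumed to match the constants in the linked `cse_helpers.sh` revision, so confirm them against that file before relying on them.

```shell
# Hypothetical helper: translate a CSE exit code into the constant name that
# cse_helpers.sh is assumed to define for it. Only three codes are listed;
# the linked cse_helpers.sh file remains the authoritative list.
cse_exit_code_name() {
    case "$1" in
        50) echo "ERR_OUTBOUND_CONN_FAIL" ;;
        51) echo "ERR_K8S_API_SERVER_CONN_FAIL" ;;
        52) echo "ERR_K8S_API_SERVER_DNS_LOOKUP_FAIL" ;;
        *)  echo "UNKNOWN" ;;
    esac
}
```

For example, `cse_exit_code_name 52` prints `ERR_K8S_API_SERVER_DNS_LOOKUP_FAIL`, which would point toward the custom DNS checks rather than a general outbound connectivity problem.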
@@ -53,42 +76,52 @@ Set up your custom Domain Name System (DNS) server so that it can do name resolu
 - For Virtual Machine Scale Set nodes, use the [az vmss run-command invoke](/cli/azure/vmss/run-command#az-vmss-run-command-invoke) command:

+> **Important:** You must specify the `--instance-id` of the VM scale set. Here, we demonstrate querying for a valid instance ID (e.g., 0) and a likely VMSS in an AKS node resource group. Update values appropriately to match your environment.
+
 ```azurecli
+export NODE_RESOURCE_GROUP=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query nodeResourceGroup -o tsv)
+export VMSS_NAME=$(az vmss list --resource-group $NODE_RESOURCE_GROUP --query "[0].name" -o tsv)
+export API_FQDN=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query fqdn -o tsv)
+
 az vmss run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-scale-set-name> \
+    --resource-group $NODE_RESOURCE_GROUP \
+    --name $VMSS_NAME \
+    --instance-id $INSTANCE_ID \
     --command-id RunShellScript \
-    --instance-id 0 \
     --output tsv \
     --query "value[0].message" \
-    --scripts "telnet <dns-ip-address> 53"
+    --scripts "telnet $DNS_IP_ADDRESS 53"
 az vmss run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-scale-set-name> \
-    --instance-id 0 \
+    --resource-group $NODE_RESOURCE_GROUP \
+    --name $VMSS_NAME \
+    --instance-id $INSTANCE_ID \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "nslookup <api-fqdn> <dns-ip-address>"
+    --scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
 ```

 - For VM availability set nodes, use the [az vm run-command invoke](/cli/azure/vm/run-command#az-vm-run-command-invoke) command:

+> **Important:** You must specify the `--name` of a valid VM in an availability set in your resource group. Here is a template for running network checks.
+
 ```azurecli
 az vm run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-availability-set-name> \
+    --resource-group $RG_NAME \
+    --name $AVAILABILITY_SET_VM \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "telnet <dns-ip-address> 53"
+    --scripts "telnet $DNS_IP_ADDRESS 53"
 az vm run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-availability-set-name> \
+    --resource-group $RG_NAME \
+    --name $AVAILABILITY_SET_VM \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "nslookup <api-fqdn> <dns-ip-address>"
+    --scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
 ```

 For more information, see [Name resolution for resources in Azure virtual networks](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances) and [Hub and spoke with custom DNS](/azure/aks/private-clusters#hub-and-spoke-with-custom-dns).
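The commands in this hunk depend on several environment variables (`RG_NAME`, `CLUSTER_NAME`, `INSTANCE_ID`, `DNS_IP_ADDRESS`, `AVAILABILITY_SET_VM`) whose definitions are not visible here. A minimal pre-flight sketch, assuming a POSIX shell, might catch an unset variable before an `az` call runs with an empty argument. The function name `missing_vars` is hypothetical and is not part of the Azure CLI or of the article.

```shell
# Hypothetical pre-flight check: print the names of the listed variables that
# are unset or empty, and return non-zero if any are missing.
# The arguments must be trusted variable names, because eval is used for the
# indirect lookup.
missing_vars() {
    _missing=""
    for _name in "$@"; do
        # Indirect lookup; ':-' keeps this safe under 'set -u'.
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            _missing="$_missing $_name"
        fi
    done
    if [ -n "$_missing" ]; then
        # Strip the leading space before printing the list.
        echo "${_missing# }"
        return 1
    fi
    return 0
}
```

One possible use is `missing_vars RG_NAME CLUSTER_NAME INSTANCE_ID DNS_IP_ADDRESS || exit 1` at the top of a script that wraps the `az vmss run-command invoke` calls.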
@@ -114,4 +147,4 @@ Make sure that the API server can be reached and isn't subject to delays. To do
 ## More information

-- For general troubleshooting steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
+- For general troubleshooting steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).