-ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli
+ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli, innovation-engine
#Customer intent: As an Azure Kubernetes user, I want to prevent custom script extension (CSE) errors so that I can avoid a Node Not Ready state within a node pool, and avoid a Cluster Not in Succeeded state within Azure Kubernetes Service (AKS).
---
+
# Troubleshoot node not ready failures caused by CSE errors
This article helps you troubleshoot scenarios in which a Microsoft Azure Kubernetes Service (AKS) cluster isn't in the `Succeeded` state and an AKS node isn't ready within a node pool because of custom script extension (CSE) errors.
@@ -25,12 +26,33 @@ The node extension deployment fails and returns more than one error code when yo
1. To better understand the current failure on the cluster, run the [az aks show](/cli/azure/aks#az-aks-show) and [az resource update](/cli/azure/resource#az-resource-update) commands to set up debugging:
+Set your environment variables and run the commands to view the cluster's status and debug information.
1. Check the debugging output and the error messages that you received from the `az resource update` command against the error list in the [CSE helper](https://github.com/Azure/AgentBaker/blob/1bf9892afd715a34e0c6b7312e712047f10319ce/parts/linux/cloud-init/artifacts/cse_helpers.sh) executable file on GitHub.

If any of the errors involve the CSE deployment of the kubelet, you've verified that the scenario described here is the cause of the Node Not Ready failure.
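The comparison against the CSE helper error list can be partly scripted. The following is a minimal sketch, not part of the article or of AgentBaker: `lookup_cse_error` is a hypothetical helper name, and it assumes that you saved a local copy of `cse_helpers.sh` and that the file defines its error codes on lines of the form `ERR_NAME=<code>`, optionally followed by a comment. Check that assumption against the version of the file that you download.

```shell
# Hypothetical helper: print the ERR_* definition(s) for a numeric CSE exit code.
# Usage: lookup_cse_error <path-to-local-cse_helpers.sh> <exit-code>
lookup_cse_error() {
    helpers_file="$1"
    exit_code="$2"
    # Assumed line format: ERR_SOME_NAME=50 # description
    # The trailing group keeps code 5 from also matching 50, 51, and so on.
    grep -E "^ERR_[A-Z0-9_]+=${exit_code}([^0-9]|\$)" "$helpers_file"
}
```

For example, `lookup_cse_error ./cse_helpers.sh 50` prints any matching definition line. The function returns a nonzero status when no definition matches, so it can also be used in a conditional.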
@@ -53,42 +75,52 @@ Set up your custom Domain Name System (DNS) server so that it can do name resolu
- For Virtual Machine Scale Set nodes, use the [az vmss run-command invoke](/cli/azure/vmss/run-command#az-vmss-run-command-invoke) command:
+> **Important:** You must specify the `--instance-id` of a VM in the virtual machine scale set. The following example queries the AKS node resource group for a scale set name and expects a valid instance ID (for example, 0). Update the values to match your environment.
+
```azurecli
+export NODE_RESOURCE_GROUP=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query nodeResourceGroup -o tsv)
+export VMSS_NAME=$(az vmss list --resource-group $NODE_RESOURCE_GROUP --query "[0].name" -o tsv)
+export API_FQDN=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query fqdn -o tsv)
+
 az vmss run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-scale-set-name> \
+    --resource-group $NODE_RESOURCE_GROUP \
+    --name $VMSS_NAME \
+    --instance-id $INSTANCE_ID \
     --command-id RunShellScript \
-    --instance-id 0 \
     --output tsv \
     --query "value[0].message" \
-    --scripts "telnet <dns-ip-address> 53"
+    --scripts "telnet $DNS_IP_ADDRESS 53"
 az vmss run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-scale-set-name> \
-    --instance-id 0 \
+    --resource-group $NODE_RESOURCE_GROUP \
+    --name $VMSS_NAME \
+    --instance-id $INSTANCE_ID \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "nslookup <api-fqdn> <dns-ip-address>"
+    --scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
```
- For VM availability set nodes, use the [az vm run-command invoke](/cli/azure/vm/run-command#az-vm-run-command-invoke) command:
+> **Important:** For `--name`, specify a valid VM that belongs to an availability set in your resource group. The following commands are a template for running the network checks.
+
```azurecli
 az vm run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-availability-set-name> \
+    --resource-group $RG_NAME \
+    --name $AVAILABILITY_SET_VM \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "telnet <dns-ip-address> 53"
+    --scripts "telnet $DNS_IP_ADDRESS 53"
 az vm run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-availability-set-name> \
+    --resource-group $RG_NAME \
+    --name $AVAILABILITY_SET_VM \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "nslookup <api-fqdn> <dns-ip-address>"
+    --scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
```
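The updated commands depend on several environment variables, and some of them (such as `$RG_NAME`, `$CLUSTER_NAME`, `$INSTANCE_ID`, `$DNS_IP_ADDRESS`, and `$AVAILABILITY_SET_VM`) aren't set in the hunks shown here. As a minimal sketch, a small precheck can report unset variables before any `run-command` call is made. The `require_vars` function is a hypothetical helper and isn't part of the Azure CLI or of this change.

```shell
# Hypothetical helper: report any named environment variables that are unset or empty.
# Returns 0 when every variable has a value, and 1 otherwise.
require_vars() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the scale set checks, a call such as `require_vars NODE_RESOURCE_GROUP VMSS_NAME INSTANCE_ID DNS_IP_ADDRESS API_FQDN || exit 1` stops a script early instead of sending an incomplete command to the node.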
For more information, see [Name resolution for resources in Azure virtual networks](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances) and [Hub and spoke with custom DNS](/azure/aks/private-clusters#hub-and-spoke-with-custom-dns).