Commit a0f90df

Merge pull request #9233 from naman-msft/docs-editor/node-not-ready-custom-script-e-1751316113
AB#6461: Update node-not-ready-custom-script-extension-errors.md
2 parents 3363891 + 9db71e1 commit a0f90df

1 file changed: +48, -16 lines

support/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors.md

Lines changed: 48 additions & 16 deletions
````diff
@@ -4,9 +4,10 @@ description: Troubleshoot scenarios in which custom script extension (CSE) error
 ms.date: 10/08/2022
 ms.reviewer: rissing, chiragpa, momajed, v-leedennis
 ms.service: azure-kubernetes-service
-ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli
+ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli, innovation-engine
 #Customer intent: As an Azure Kubernetes user, I want to prevent custom script extension (CSE) errors so that I can avoid a Node Not Ready state within a node pool, and avoid a Cluster Not in Succeeded state within Azure Kubernetes Service (AKS).
 ---
+
 # Troubleshoot node not ready failures caused by CSE errors
 
 This article helps you troubleshoot scenarios in which a Microsoft Azure Kubernetes Service (AKS) cluster isn't in the `Succeeded` state and an AKS node isn't ready within a node pool because of custom script extension (CSE) errors.
````
````diff
@@ -25,12 +26,33 @@ The node extension deployment fails and returns more than one error code when yo
 
 1. To better understand the current failure on the cluster, run the [az aks show](/cli/azure/aks#az-aks-show) and [az resource update](/cli/azure/resource#az-resource-update) commands to set up debugging:
 
+Set your environment variables and run the commands to view the cluster's status and debug information.
+
 ```azurecli
+export RG_NAME="my-aks-rg"
+export CLUSTER_NAME="myakscluster"
 clusterResourceId=$(az aks show \
-    --resource-group <resource-group-name> --name <cluster-name> --output tsv --query id)
+    --resource-group $RG_NAME --name $CLUSTER_NAME --output tsv --query id)
 az resource update --debug --verbose --ids $clusterResourceId
 ```
 
+Results:
+
+<!-- expected_similarity=0.3 -->
+
+```output
+{
+  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-aks-rg-xxx/providers/Microsoft.ContainerService/managedClusters/myaksclusterxxx",
+  "name": "myaksclusterxxx",
+  "type": "Microsoft.ContainerService/managedClusters",
+  "location": "eastus2",
+  "tags": null,
+  "properties": {
+    ...
+  }
+}
+```
+
 1. Check the debugging output and the error messages that you received from the `az resource update` command against the error list in the [CSE helper](https://github.com/Azure/AgentBaker/blob/1bf9892afd715a34e0c6b7312e712047f10319ce/parts/linux/cloud-init/artifacts/cse_helpers.sh) executable file on GitHub.
 
 If any of the errors involve the CSE deployment of the kubelet, then you've verified that the scenario that's described here's the cause of the Node Not Ready failure.
````
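Matching the debug output against `cse_helpers.sh` by hand can be scripted. The following is a minimal sketch, not part of the commit: the helper names `cse_exit_code` and `cse_error_name` are hypothetical, and it assumes (1) the CSE failure text contains a fragment like `exit status=<N>` and (2) `cse_helpers.sh` declares each code on a line of the form `ERR_NAME=<N>`, optionally followed by a comment. Verify both assumptions against your own output and the linked file.

```shell
#!/bin/sh
# Minimal sketch with hypothetical helper names (cse_exit_code, cse_error_name).
# Assumption: the CSE failure text contains "exit status=<N>".
# Assumption: cse_helpers.sh declares each code on a line like "ERR_NAME=<N> # comment".

# Read error text on stdin and print the first exit status number found.
cse_exit_code() {
  grep -Eo 'exit status=[0-9]+' | head -n 1 | cut -d= -f2
}

# Print the ERR_* name in the helpers file ($2) whose value equals the code ($1).
# Runs in a subshell so its variables don't leak into the caller.
cse_error_name() (
  code="$1"
  helpers="$2"
  case "$code" in
    '' | *[!0-9]*)
      echo "error: exit code must be a number, got '$code'" >&2
      exit 2
      ;;
  esac
  # The ([^0-9]|$) group keeps code 5 from matching ERR_...=50.
  grep -E "^[[:space:]]*ERR_[A-Z0-9_]+=${code}([^0-9]|\$)" "$helpers" \
    | head -n 1 \
    | sed -E 's/^[[:space:]]*([A-Z0-9_]+)=.*/\1/'
)
```

For example, download the helpers file from the same commit with `curl -fsSL -o cse_helpers.sh https://raw.githubusercontent.com/Azure/AgentBaker/1bf9892afd715a34e0c6b7312e712047f10319ce/parts/linux/cloud-init/artifacts/cse_helpers.sh`, capture the code with `code=$(az resource update --debug --verbose --ids "$clusterResourceId" 2>&1 | cse_exit_code)` (the `--debug` log goes to stderr, hence `2>&1`), and then run `cse_error_name "$code" cse_helpers.sh`. An empty result means the assumed line format didn't match, so fall back to reading the file directly.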
````diff
@@ -53,42 +75,52 @@ Set up your custom Domain Name System (DNS) server so that it can do name resolu
 
 - For Virtual Machine Scale Set nodes, use the [az vmss run-command invoke](/cli/azure/vmss/run-command#az-vmss-run-command-invoke) command:
 
+> **Important:** You must specify the `--instance-id` of the VM scale set. Here, we demonstrate querying for a valid instance ID (e.g., 0) and a likely VMSS in an AKS node resource group. Update values appropriately to match your environment.
+
 ```azurecli
+export NODE_RESOURCE_GROUP=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query nodeResourceGroup -o tsv)
+export VMSS_NAME=$(az vmss list --resource-group $NODE_RESOURCE_GROUP --query "[0].name" -o tsv)
+export DNS_IP_ADDRESS="10.0.0.10"
+export INSTANCE_ID=$(az vmss list-instances --resource-group $NODE_RESOURCE_GROUP --name $VMSS_NAME --query "[0].instanceId" -o tsv)
+export API_FQDN=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query fqdn -o tsv)
+
 az vmss run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-scale-set-name> \
+    --resource-group $NODE_RESOURCE_GROUP \
+    --name $VMSS_NAME \
+    --instance-id $INSTANCE_ID \
     --command-id RunShellScript \
-    --instance-id 0 \
     --output tsv \
     --query "value[0].message" \
-    --scripts "telnet <dns-ip-address> 53"
+    --scripts "telnet $DNS_IP_ADDRESS 53"
 az vmss run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-scale-set-name> \
-    --instance-id 0 \
+    --resource-group $NODE_RESOURCE_GROUP \
+    --name $VMSS_NAME \
+    --instance-id $INSTANCE_ID \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "nslookup <api-fqdn> <dns-ip-address>"
+    --scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
 ```
 
 - For VM availability set nodes, use the [az vm run-command invoke](/cli/azure/vm/run-command#az-vm-run-command-invoke) command:
 
+> **Important:** You must specify the `--name` of a valid VM in an availability set in your resource group. Here is a template for running network checks.
+
 ```azurecli
 az vm run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-availability-set-name> \
+    --resource-group $RG_NAME \
+    --name $AVAILABILITY_SET_VM \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "telnet <dns-ip-address> 53"
+    --scripts "telnet $DNS_IP_ADDRESS 53"
 az vm run-command invoke \
-    --resource-group <resource-group-name> \
-    --name <vm-availability-set-name> \
+    --resource-group $RG_NAME \
+    --name $AVAILABILITY_SET_VM \
     --command-id RunShellScript \
     --output tsv \
     --query "value[0].message" \
-    --scripts "nslookup <api-fqdn> <dns-ip-address>"
+    --scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
 ```
 
 For more information, see [Name resolution for resources in Azure virtual networks](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances) and [Hub and spoke with custom DNS](/azure/aks/private-clusters#hub-and-spoke-with-custom-dns).
````
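A few details in the added availability set commands may trip up readers. They reference `$AVAILABILITY_SET_VM`, which none of the added lines define, and they pass `$RG_NAME` even though AKS node VMs normally live in the node resource group (the scale set commands in the same hunk use `$NODE_RESOURCE_GROUP`). The example value `DNS_IP_ADDRESS="10.0.0.10"` also matches the default AKS in-cluster DNS service IP rather than a custom DNS server's address, so it likely needs replacing. The following is a minimal sketch of two hypothetical helpers, not part of the commit: `require_vars` stops before any `az` call when a variable is unset or empty, and `tcp_probe` tests a TCP port through bash's `/dev/tcp` redirection, which may help on node images where `telnet` isn't installed. It assumes bash and coreutils `timeout` exist wherever `tcp_probe` runs.

```shell
#!/bin/sh
# Minimal sketch; require_vars and tcp_probe are hypothetical helper names.

# Exit non-zero (and name each problem) if any listed variable is unset or empty.
require_vars() (
  missing=0
  for name in "$@"; do
    case "$name" in
      '' | [0-9]* | *[!A-Za-z0-9_]*)
        echo "error: '$name' is not a valid variable name" >&2
        missing=1
        continue
        ;;
    esac
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Try to open a TCP connection to host:port within a timeout (default 5 seconds).
# Assumption: bash (for /dev/tcp) and coreutils timeout exist on this machine.
tcp_probe() (
  host="$1"
  port="$2"
  secs="${3:-5}"
  if timeout "$secs" bash -c ": < /dev/tcp/$host/$port" 2>/dev/null; then
    echo "TCP $host:$port reachable"
  else
    echo "TCP $host:$port not reachable within ${secs}s" >&2
    exit 1
  fi
)
```

For example, you might set `AVAILABILITY_SET_VM=$(az vm list --resource-group "$NODE_RESOURCE_GROUP" --query "[0].name" --output tsv)`, run `require_vars NODE_RESOURCE_GROUP AVAILABILITY_SET_VM DNS_IP_ADDRESS API_FQDN` and continue only if it succeeds, and then pass `--scripts "timeout 5 bash -c ': < /dev/tcp/$DNS_IP_ADDRESS/53' && echo open || echo closed"` to `az vm run-command invoke` in place of the `telnet` script. The `( ... )` function bodies run in subshells, so the helpers' working variables don't leak into the calling shell.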
