Commit 64df079

Learn Editor: Update node-not-ready-custom-script-extension-errors.md
1 parent 325ced5 commit 64df079

1 file changed: +52 -19 lines changed

support/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors.md

Lines changed: 52 additions & 19 deletions
@@ -1,12 +1,14 @@
 ---
 title: Node Not Ready because of custom script extension (CSE) errors
 description: Troubleshoot scenarios in which custom script extension (CSE) errors cause Node Not Ready states in an Azure Kubernetes Service (AKS) cluster node pool.
-ms.date: 10/08/2022
+ms.date: 06/08/2024
 ms.reviewer: rissing, chiragpa, momajed, v-leedennis
 ms.service: azure-kubernetes-service
-ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli
-#Customer intent: As an Azure Kubernetes user, I want to prevent custom script extension (CSE) errors so that I can avoid a Node Not Ready state within a node pool, and avoid a Cluster Not in Succeeded state within Azure Kubernetes Service (AKS).
+ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli, innovation-engine
+author: MicrosoftDocs
+ms.author: MicrosoftDocs
 ---
+
 # Troubleshoot node not ready failures caused by CSE errors
 
 This article helps you troubleshoot scenarios in which a Microsoft Azure Kubernetes Service (AKS) cluster isn't in the `Succeeded` state and an AKS node isn't ready within a node pool because of custom script extension (CSE) errors.
@@ -25,12 +27,33 @@ The node extension deployment fails and returns more than one error code when yo
 
 1. To better understand the current failure on the cluster, run the [az aks show](/cli/azure/aks#az-aks-show) and [az resource update](/cli/azure/resource#az-resource-update) commands to set up debugging:
 
+Set your environment variables and run the commands to view the cluster's status and debug information.
+
 ```azurecli
+export RG_NAME="my-aks-rg"
+export CLUSTER_NAME="myakscluster"
 clusterResourceId=$(az aks show \
---resource-group <resource-group-name> --name <cluster-name> --output tsv --query id)
+--resource-group $RG_NAME --name $CLUSTER_NAME --output tsv --query id)
 az resource update --debug --verbose --ids $clusterResourceId
 ```
 
+Results:
+
+<!-- expected_similarity=0.3 -->
+
+```output
+{
+"id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-aks-rg-xxx/providers/Microsoft.ContainerService/managedClusters/myaksclusterxxx",
+"name": "myaksclusterxxx",
+"type": "Microsoft.ContainerService/managedClusters",
+"location": "eastus2",
+"tags": null,
+"properties": {
+...
+}
+}
+```
+
 1. Check the debugging output and the error messages that you received from the `az resource update` command against the error list in the [CSE helper](https://github.com/Azure/AgentBaker/blob/1bf9892afd715a34e0c6b7312e712047f10319ce/parts/linux/cloud-init/artifacts/cse_helpers.sh) executable file on GitHub.
 
 If any of the errors involve the CSE deployment of the kubelet, then you've verified that the scenario that's described here's the cause of the Node Not Ready failure.
@@ -53,42 +76,52 @@ Set up your custom Domain Name System (DNS) server so that it can do name resolu
 
 - For Virtual Machine Scale Set nodes, use the [az vmss run-command invoke](/cli/azure/vmss/run-command#az-vmss-run-command-invoke) command:
 
+> **Important:** You must specify the `--instance-id` of the VM scale set. Here, we demonstrate querying for a valid instance ID (e.g., 0) and a likely VMSS in an AKS node resource group. Update values appropriately to match your environment.
+
 ```azurecli
+export NODE_RESOURCE_GROUP=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query nodeResourceGroup -o tsv)
+export VMSS_NAME=$(az vmss list --resource-group $NODE_RESOURCE_GROUP --query "[0].name" -o tsv)
+export DNS_IP_ADDRESS="10.0.0.10"
+export INSTANCE_ID=$(az vmss list-instances --resource-group $NODE_RESOURCE_GROUP --name $VMSS_NAME --query "[0].instanceId" -o tsv)
+export API_FQDN=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query fqdn -o tsv)
+
 az vmss run-command invoke \
---resource-group <resource-group-name> \
---name <vm-scale-set-name> \
+--resource-group $NODE_RESOURCE_GROUP \
+--name $VMSS_NAME \
+--instance-id $INSTANCE_ID \
 --command-id RunShellScript \
---instance-id 0 \
 --output tsv \
 --query "value[0].message" \
---scripts "telnet <dns-ip-address> 53"
+--scripts "telnet $DNS_IP_ADDRESS 53"
 az vmss run-command invoke \
---resource-group <resource-group-name> \
---name <vm-scale-set-name> \
---instance-id 0 \
+--resource-group $NODE_RESOURCE_GROUP \
+--name $VMSS_NAME \
+--instance-id $INSTANCE_ID \
 --command-id RunShellScript \
 --output tsv \
 --query "value[0].message" \
---scripts "nslookup <api-fqdn> <dns-ip-address>"
+--scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
 ```
 
 - For VM availability set nodes, use the [az vm run-command invoke](/cli/azure/vm/run-command#az-vm-run-command-invoke) command:
 
+> **Important:** You must specify the `--name` of a valid VM in an availability set in your resource group. Here is a template for running network checks.
+
 ```azurecli
 az vm run-command invoke \
---resource-group <resource-group-name> \
---name <vm-availability-set-name> \
+--resource-group $RG_NAME \
+--name $AVAILABILITY_SET_VM \
 --command-id RunShellScript \
 --output tsv \
 --query "value[0].message" \
---scripts "telnet <dns-ip-address> 53"
+--scripts "telnet $DNS_IP_ADDRESS 53"
 az vm run-command invoke \
---resource-group <resource-group-name> \
---name <vm-availability-set-name> \
+--resource-group $RG_NAME \
+--name $AVAILABILITY_SET_VM \
 --command-id RunShellScript \
 --output tsv \
 --query "value[0].message" \
---scripts "nslookup <api-fqdn> <dns-ip-address>"
+--scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
 ```
 
 For more information, see [Name resolution for resources in Azure virtual networks](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances) and [Hub and spoke with custom DNS](/azure/aks/private-clusters#hub-and-spoke-with-custom-dns).
@@ -114,4 +147,4 @@ Make sure that the API server can be reached and isn't subject to delays. To do
 
 ## More information
 
-- For general troubleshooting steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
+- For general troubleshooting steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
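
The DNS checks in this change run `telnet $DNS_IP_ADDRESS 53` on the node through `--scripts`. As a rough sketch only, assuming a node image where `telnet` isn't installed (the commit doesn't say whether it is), the same TCP reachability test could be expressed with bash's built-in `/dev/tcp` redirection. The function name `check_tcp_port` and the 3-second default timeout are hypothetical and don't come from the article.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the article): probe a TCP port by using
# bash's /dev/tcp redirection instead of telnet.
# Usage: check_tcp_port <host> <port> [timeout-seconds]
check_tcp_port() {
  local host="$1" port="$2" wait_seconds="${3:-3}"
  # Opening /dev/tcp/<host>/<port> makes bash attempt a TCP connection.
  # `timeout` (GNU coreutils) bounds how long the attempt may take.
  if timeout "$wait_seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}
```

On a node, `check_tcp_port "$DNS_IP_ADDRESS" 53` would mirror the `telnet` check. It tests TCP only, and DNS queries commonly use UDP, so the `nslookup` check in the article is still needed to confirm name resolution.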
