Commit 77cc6f9

Add TSG for cluster upgrade
1 parent c1a2d27 commit 77cc6f9

2 files changed

+114
-0
lines changed

AKS-Arc/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -193,6 +193,8 @@
193193
href: entra-prompts.md
194194
- name: BGP with FRR not working
195195
href: connectivity-troubleshoot.md
196+
- name: Cluster status stuck during upgrade
197+
href: tsg-aksarc-upgrade-issues.md
196198
- name: Reference
197199
items:
198200
- name: Azure CLI

AKS-Arc/tsg-aksarc-upgrade-issues.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
---
2+
title: Troubleshoot the issue where the cluster is stuck in Upgrading state
3+
description: Learn how to troubleshoot and mitigate the issue when an AKS enabled by Arc cluster is stuck in 'Upgrading' state.
4+
ms.topic: troubleshooting
5+
author: rcheeran
6+
ms.author: rcheeran
7+
ms.date: 06/25/2025
8+
ms.reviewer: abha
9+
10+
---
11+
12+
# Troubleshoot the issue when the AKS Arc cluster is stuck in 'Upgrading' state
13+
14+
This article describes how to resolve an issue in which an AKS Arc cluster remains stuck in the 'Upgrading' state after you try to upgrade its Kubernetes version. This issue typically occurs after you update Azure Local to version 2503 or 2504.
15+
16+
## Symptoms
17+
18+
When you try to upgrade an AKS Arc cluster, the **Current state** property of the cluster continues to show 'Upgrading', as in the following output:
19+
20+
```output
21+
az aksarc upgrade --name "cluster-name" --resource-group "rg-name"
22+
23+
===> Kubernetes may be unavailable during cluster upgrades.
24+
Are you sure you want to perform this operation? (y/N): y
25+
The cluster is on version 1.28.9 and is not in a failed state.
26+
27+
===> This will upgrade the control plane AND all nodepools to version 1.30.4. Continue? (y/N): y
28+
Upgrading the AKSArc cluster. This operation might take a while...
29+
{
30+
"extendedLocation": {
31+
"name": "/subscriptions/resourceGroups/Bellevue/providers/Microsoft.ExtendedLocation/customLocations/bel-CL",
32+
"type": "CustomLocation"
33+
},
34+
"id": "/subscriptions/fbaf508b-cb61-4383-9cda-a42bfa0c7bc9/resourceGroups/Bellevue/providers/Microsoft.Kubernetes/ConnectedClusters/Bel-cluster/providers/Microsoft.HybridContainerService/ProvisionedClusterInstances/default",
35+
"name": "default",
36+
"properties": {
37+
"kubernetesVersion": "1.30.4",
38+
"provisioningState": "Succeeded",
39+
"currentState": "Upgrading",
40+
"errorMessage": null,
41+
"operationStatus": null,
42+
"agentPoolProfiles": [
43+
{
44+
...
45+
```
46+
47+
48+
## Possible causes and follow-ups
49+
50+
- The root cause is a recent change introduced in Azure Local version 2503. Under certain conditions, transient or intermittent failures during the Kubernetes upgrade process are not correctly detected or recovered from, leading the cluster state to remain indefinitely in the 'Upgrading' state.
51+
- You encounter this issue if the `hybridaksextension` extension (the AKS Arc extension on your custom location) is at version 2.1.211 or 2.1.223. To check the extension version on your cluster, run the following commands:
52+
53+
```azurecli
54+
az login --use-device-code --tenant <Azure tenant ID>
55+
az account set -s <subscription ID>
56+
$res=get-archcimgmt
57+
az k8s-extension show -g $res.HybridaksExtension.resourceGroup -c $res.ResourceBridge.name --cluster-type appliances --name hybridaksextension
58+
```
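Once you have the extension version string, the comparison against the two affected versions can be sketched as a small shell helper. This is only an illustration: the function name `is_affected_version` is hypothetical, and because the preceding command relies on PowerShell variables, the helper takes the version as an argument however you obtained it. The usage comment assumes that the extension JSON exposes a top-level `version` field and that `RESOURCE_GROUP` and `BRIDGE_NAME` are variables you set yourself.

```shell
# Returns 0 (true) when the given hybridaksextension version is one of the
# two versions this article names as affected; returns 1 otherwise.
is_affected_version() {
  case "$1" in
    2.1.211|2.1.223) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical usage (assumes a top-level "version" field in the output):
#   version=$(az k8s-extension show -g "$RESOURCE_GROUP" -c "$BRIDGE_NAME" \
#     --cluster-type appliances --name hybridaksextension --query version -o tsv)
#   if is_affected_version "$version"; then echo "Affected version: $version"; fi
```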
59+
60+
61+
## Mitigation
62+
You can resolve this issue by invoking the AKS Arc update call, which also retriggers the upgrade flow. You can invoke the `aksarc update` command with a placeholder change; for example, enable the NFS or SMB driver if that feature isn't already enabled. First, check which of these features are currently enabled:
63+
64+
```azurecli
65+
az login --use-device-code --tenant <Azure tenant ID>
66+
az account set -s <subscription ID>
67+
az aksarc show -g <resource_group_name> -n <cluster_name>
68+
```
69+
Check the storage profile section:
70+
```json
71+
"storageProfile": {
72+
"nfsCsiDriver": {
73+
"enabled": false
74+
},
75+
"smbCsiDriver": {
76+
"enabled": true
77+
}
78+
}
79+
```
80+
81+
If one of the drivers is disabled, you can enable it by using the corresponding command:
82+
83+
```azurecli
84+
az aksarc update --enable-smb-driver -g <resource_group_name> -n <cluster_name>
85+
az aksarc update --enable-nfs-driver -g <resource_group_name> -n <cluster_name>
86+
```
87+
88+
Running the `aksarc update` command should resolve the issue, and the **Current state** property of the cluster should now show 'Succeeded'. After the status is updated, if you don't want to keep the driver enabled, you can revert the change by running the corresponding command:
89+
90+
```azurecli
91+
az aksarc update --disable-smb-driver -g <resource_group_name> -n <cluster_name>
92+
az aksarc update --disable-nfs-driver -g <resource_group_name> -n <cluster_name>
93+
```
94+
If both drivers are already enabled on your cluster, you can disable the one that you aren't using. If you use both drivers, contact the support team for further instructions.
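The choice of which driver to enable can be sketched as a small shell helper. This is a minimal illustration, not part of the Azure CLI: the function name `pick_update_flag` is hypothetical, and it expects the two `enabled` values (`true` or `false`) that you read from the storage profile section. It returns only the `--enable-nfs-driver` and `--enable-smb-driver` flags used in this article, and it deliberately returns nothing when both drivers are already enabled, because that case needs your judgment about which driver is unused.

```shell
# Prints the `az aksarc update` flag suggested by the mitigation steps,
# given the "enabled" values (true/false) of nfsCsiDriver and smbCsiDriver.
# Prints nothing and returns 1 when both drivers are already enabled.
pick_update_flag() {
  nfs_enabled="$1"
  smb_enabled="$2"
  if [ "$nfs_enabled" = "false" ]; then
    echo "--enable-nfs-driver"
  elif [ "$smb_enabled" = "false" ]; then
    echo "--enable-smb-driver"
  else
    return 1
  fi
}

# Hypothetical usage with the values from the storage profile shown earlier:
#   flag=$(pick_update_flag false true) && \
#     az aksarc update "$flag" -g "$RESOURCE_GROUP" -n "$CLUSTER_NAME"
```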
95+
96+
## Verification
97+
To confirm that the Kubernetes version upgrade completed and the state moved to 'Succeeded', run the following command and check the `currentState` property in the JSON output:
98+
99+
```azurecli
100+
az aksarc show -g <resource_group> -n <cluster_name>
101+
102+
```
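If you prefer not to rerun the command by hand, the check can be wrapped in a polling loop. The following is a sketch under stated assumptions: `wait_until_succeeded` and `get_cluster_state` are hypothetical names, the `properties.currentState` query path is inferred from the JSON shown in the Symptoms section, and the attempt count and delay are arbitrary.

```shell
# Polls a state-fetching command until it prints "Succeeded".
# $1: name of a command or function that prints the current state
# $2: maximum number of attempts
# $3: seconds to sleep between attempts
# Returns 0 once the state is "Succeeded", or 1 if attempts run out.
wait_until_succeeded() {
  fetch_state="$1"
  max_attempts="$2"
  delay="$3"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$("$fetch_state")
    if [ "$state" = "Succeeded" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical fetcher (query path assumed from the output under Symptoms):
#   get_cluster_state() {
#     az aksarc show -g "$RESOURCE_GROUP" -n "$CLUSTER_NAME" \
#       --query "properties.currentState" -o tsv
#   }
#   wait_until_succeeded get_cluster_state 30 60
```

Passing the fetcher as an argument keeps the loop independent of the Azure CLI, so you can try it first with a stub function that prints a fixed state.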
103+
104+
105+
## Contact Microsoft Support
106+
107+
If the problem persists, collect [AKS cluster logs](get-on-demand-logs.md) before [creating a support request](aks-troubleshoot.md#open-a-support-request).
108+
109+
## Next steps
110+
111+
- [Use the diagnostic checker tool to identify common environment issues](aks-arc-diagnostic-checker.md)
112+
- [Review AKS on Azure Local architecture](cluster-architecture.md)
