Commit 7893ac5

Merge branch '2025-06-25' of https://github.com/rcheeran/azure-stack-docs-pr into rctsg626
2 parents 952f5ac + e7982bf commit 7893ac5

2 files changed

+117
-0
lines changed

AKS-Arc/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -193,6 +193,8 @@
193193
href: entra-prompts.md
194194
- name: BGP with FRR not working
195195
href: connectivity-troubleshoot.md
196+
- name: Cluster status stuck during upgrade
197+
href: tsg-aksarc-upgrade-issues.md
196198
- name: Reference
197199
items:
198200
- name: Azure CLI

AKS-Arc/tsg-aksarc-upgrade-issues.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
---
2+
title: Troubleshoot the issue where the cluster is stuck in Upgrading state
3+
description: Learn how to troubleshoot and mitigate the issue when an AKS enabled by Arc cluster is stuck in 'Upgrading' state.
4+
ms.topic: troubleshooting
5+
author: rcheeran
6+
ms.author: rcheeran
7+
ms.date: 06/25/2025
8+
ms.reviewer: abha
9+
10+
---
11+
12+
# Troubleshoot the issue when the AKS Arc cluster is stuck in 'Upgrading' state
13+
14+
This article describes how to resolve an issue in which your Azure Kubernetes Service enabled by Arc (AKS Arc) cluster remains stuck in the 'Upgrading' state. The issue typically occurs when you try to upgrade the Kubernetes version on your cluster after updating Azure Local to version 2503 or 2504.
15+
16+
## Symptoms
17+
18+
When you try to upgrade an AKS Arc cluster, the **Current state** property of the cluster remains 'Upgrading'. In the following example output, `provisioningState` is `Succeeded`, but `currentState` is still `Upgrading`:
19+
20+
```output
21+
az aksarc upgrade --name "cluster-name" --resource-group "rg-name"
22+
23+
===> Kubernetes may be unavailable during cluster upgrades.
24+
Are you sure you want to perform this operation? (y/N): y
25+
The cluster is on version 1.28.9 and is not in a failed state.
26+
27+
===> This will upgrade the control plane AND all nodepools to version 1.30.4. Continue? (y/N): y
28+
Upgrading the AKSArc cluster. This operation might take a while...
29+
{
30+
"extendedLocation": {
31+
"name": "/subscriptions/resourceGroups/Bellevue/providers/Microsoft.ExtendedLocation/customLocations/bel-CL",
32+
"type": "CustomLocation"
33+
},
34+
"id": "/subscriptions/fbaf508b-cb61-4383-9cda-a42bfa0c7bc9/resourceGroups/Bellevue/providers/Microsoft.Kubernetes/ConnectedClusters/Bel-cluster/providers/Microsoft.HybridContainerService/ProvisionedClusterInstances/default",
35+
"name": "default",
36+
"properties": {
37+
"kubernetesVersion": "1.30.4",
38+
"provisioningState": "Succeeded",
39+
"currentState": "Upgrading",
40+
"errorMessage": null,
41+
"operationStatus": null,
42+
"agentPoolProfiles": [
43+
{
44+
...
45+
```
46+
47+
## Possible causes and follow-ups
48+
49+
- The root cause is a change introduced in Azure Local version 2503. Under certain conditions, transient or intermittent failures during the Kubernetes upgrade process aren't correctly detected or recovered from, which can leave the cluster stuck in the 'Upgrading' state.
50+
- You're affected by this issue if the AKS Arc extension on your custom location (`hybridaksextension`) is version 2.1.211 or 2.1.223. Run the following commands to check the extension version on your cluster:
51+
52+
```azurecli
53+
az login --use-device-code --tenant <Azure tenant ID>
54+
az account set -s <subscription ID>
55+
$res=get-archcimgmt
56+
az k8s-extension show -g $res.HybridaksExtension.resourceGroup -c $res.ResourceBridge.name --cluster-type appliances --name hybridaksextension
57+
```
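
To compare the reported version against the two affected versions, you could parse the JSON that `az k8s-extension show` prints. The following Python helper is a minimal sketch, not part of the official guidance. The function name `is_affected` is hypothetical, and the sketch assumes the installed version appears in a top-level `currentVersion` field (with `version` as a fallback); confirm the field name against your own output.

```python
import json

# hybridaksextension versions that this article names as affected.
AFFECTED_VERSIONS = {"2.1.211", "2.1.223"}


def is_affected(extension_json: str) -> bool:
    """Return True if the extension JSON reports an affected version.

    Assumption: the installed version is in a top-level "currentVersion"
    field, with "version" used as a fallback when that field is empty.
    """
    ext = json.loads(extension_json)
    version = ext.get("currentVersion") or ext.get("version")
    return version in AFFECTED_VERSIONS
```

For example, you could save the command output to a file and pass the file contents to `is_affected`.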
58+
59+
## Mitigation
60+
61+
To resolve this issue, run the `az aksarc update` command, which retriggers the upgrade flow. You can run the command with placeholder parameters that don't affect the state of the cluster. For example, you can enable the NFS or SMB storage driver if it isn't already enabled. First, check whether either storage driver is already enabled:
62+
63+
```azurecli
64+
az login --use-device-code --tenant <Azure tenant ID>
65+
az account set -s <subscription ID>
66+
az aksarc show -g <resource_group_name> -n <cluster_name>
67+
```
68+
69+
Check the `storageProfile` section of the output:
70+
71+
```json
72+
"storageProfile": {
73+
"nfsCsiDriver": {
74+
"enabled": false
75+
},
76+
"smbCsiDriver": {
77+
78+
"enabled": true
79+
}
80+
}
81+
```
82+
83+
If one of the drivers is disabled, enable it by running the corresponding command:
84+
85+
```azurecli
86+
az aksarc update --enable-smb-driver -g <resource_group_name> -n <cluster_name>
87+
az aksarc update --enable-nfs-driver -g <resource_group_name> -n <cluster_name>
88+
```
89+
90+
Running the `az aksarc update` command should resolve the issue, and the **Current state** property of the cluster should then show 'Succeeded'. After the status updates, if you don't want to keep the driver enabled, you can revert the change by running the corresponding command:
91+
92+
```azurecli
93+
az aksarc update --disable-smb-driver -g <resource_group_name> -n <cluster_name>
94+
az aksarc update --disable-nfs-driver -g <resource_group_name> -n <cluster_name>
95+
```
96+
97+
If both drivers are already enabled on your cluster, you can disable the one that isn't in use. If you require both drivers to remain enabled, contact Microsoft Support for further assistance.
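
The choice of which driver to toggle can be sketched as a small decision helper. This Python example is illustrative only. The function name `pick_update_flag` is hypothetical, and the sketch assumes you pass it the `storageProfile` object from the `az aksarc show` output and that a missing driver entry means the driver is disabled.

```python
from typing import Optional


def pick_update_flag(storage_profile: dict) -> Optional[str]:
    """Pick an `az aksarc update` flag that enables a disabled driver.

    Returns None when both drivers are already enabled. In that case,
    disable the unused driver instead, or contact Microsoft Support.
    Assumption: a missing driver entry is treated as disabled.
    """
    smb = storage_profile.get("smbCsiDriver") or {}
    nfs = storage_profile.get("nfsCsiDriver") or {}
    if not smb.get("enabled", False):
        return "--enable-smb-driver"
    if not nfs.get("enabled", False):
        return "--enable-nfs-driver"
    return None
```

With the storage profile shown earlier (NFS disabled, SMB enabled), the helper returns `--enable-nfs-driver`.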
98+
99+
## Verification
100+
101+
Run the following command and check that the `currentState` property in the JSON output is set to `Succeeded`, which confirms that the Kubernetes version upgrade is complete.
102+
103+
```azurecli
104+
az aksarc show -g <resource_group> -n <cluster_name>
105+
106+
```
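
If you want to script this check, the following Python sketch parses the command output. It's a minimal example, not an official tool. The function name `upgrade_finished` is hypothetical, and the sketch assumes `currentState` sits under `properties`, as in the example output in the Symptoms section.

```python
import json


def upgrade_finished(cluster_json: str) -> bool:
    """Return True if the cluster JSON reports currentState 'Succeeded'.

    Assumption: the state is at properties.currentState, matching the
    example output shown in the Symptoms section of this article.
    """
    cluster = json.loads(cluster_json)
    properties = cluster.get("properties") or {}
    return properties.get("currentState") == "Succeeded"
```

The function returns `False` while the cluster still reports 'Upgrading', so you could call it again after rerunning `az aksarc show`.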
107+
108+
## Contact Microsoft Support
109+
110+
If the problem persists, collect [AKS cluster logs](get-on-demand-logs.md) before [creating a support request](aks-troubleshoot.md#open-a-support-request).
111+
112+
## Next steps
113+
114+
- [Use the diagnostic checker tool to identify common environment issues](aks-arc-diagnostic-checker.md)
115+
- [Review AKS on Azure Local architecture](cluster-architecture.md)
