|
| 1 | +--- |
| 2 | +title: Troubleshoot the issue where the cluster is stuck in Upgrading state |
| 3 | +description: Learn how to troubleshoot and mitigate the issue when an AKS enabled by Arc cluster is stuck in 'Upgrading' state. |
| 4 | +ms.topic: troubleshooting |
| 5 | +author: rcheeran |
| 6 | +ms.author: rcheeran |
| 7 | +ms.date: 06/25/2025 |
| 8 | +ms.reviewer: abha |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +# Troubleshoot the issue when the AKS Arc cluster is stuck in 'Upgrading' state |
| 13 | + |
| 14 | +This article describes how to fix the issue in which your Azure Kubernetes Service enabled by Arc (AKS Arc) cluster is stuck in the 'Upgrading' state. This issue typically occurs after you update Azure Local to version 2503 or 2504 and then try to upgrade the Kubernetes version on your cluster. |
| 15 | + |
| 16 | +## Symptoms |
| 17 | + |
| 18 | +When you try to upgrade an AKS Arc cluster, you notice that the **Current state** property of the cluster remains in the 'Upgrading' state. |
| 19 | + |
| 20 | +```output |
| 21 | +az aksarc upgrade --name "cluster-name" --resource-group "rg-name" |
| 22 | +
|
| 23 | +===> Kubernetes may be unavailable during cluster upgrades. |
| 24 | + Are you sure you want to perform this operation? (y/N): y |
| 25 | +The cluster is on version 1.28.9 and is not in a failed state. |
| 26 | +
|
| 27 | +===> This will upgrade the control plane AND all nodepools to version 1.30.4. Continue? (y/N): y |
| 28 | +Upgrading the AKSArc cluster. This operation might take a while... |
| 29 | +{ |
| 30 | + "extendedLocation": { |
| 31 | + "name": "/subscriptions/resourceGroups/Bellevue/providers/Microsoft.ExtendedLocation/customLocations/bel-CL", |
| 32 | + "type": "CustomLocation" |
| 33 | + }, |
| 34 | + "id": "/subscriptions/fbaf508b-cb61-4383-9cda-a42bfa0c7bc9/resourceGroups/Bellevue/providers/Microsoft.Kubernetes/ConnectedClusters/Bel-cluster/providers/Microsoft.HybridContainerService/ProvisionedClusterInstances/default", |
| 35 | + "name": "default", |
| 36 | + "properties": { |
| 37 | + "kubernetesVersion": "1.30.4", |
| 38 | + "provisioningState": "Succeeded", |
| 39 | + "currentState": "Upgrading", |
| 40 | + "errorMessage": null, |
| 41 | + "operationStatus": null, |
| 42 | + "agentPoolProfiles": [ |
| 43 | + { |
| 44 | + ... |
| 45 | +``` |
| 46 | + |
| 47 | +## Possible causes and follow-ups |
| 48 | + |
| 49 | +- The root cause is a recent change introduced in Azure Local version 2503. Under certain conditions, if there are transient or intermittent failures during the Kubernetes upgrade process, they're not correctly detected or recovered from. This can cause the cluster state to stay stuck in the 'Upgrading' state. |
| 50 | +- You can hit this issue if the version of the AKS Arc extension on your custom location (the `hybridaksextension` extension) is 2.1.211 or 2.1.223. Run the following command to check the extension version on your cluster: |
| 51 | + |
| 52 | +```azurecli |
| 53 | +az login --use-device-code --tenant <Azure tenant ID> |
| 54 | +az account set -s <subscription ID> |
| 55 | +$res=get-archcimgmt |
| 56 | +az k8s-extension show -g $res.HybridaksExtension.resourceGroup -c $res.ResourceBridge.name --cluster-type appliances --name hybridaksextension |
| 57 | +``` |
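
As a rough sketch, the version comparison described above could be scripted in a POSIX shell. The function name `is_affected_version` is hypothetical and isn't part of the Azure CLI; the two version numbers are the ones listed in this article.

```shell
# Hypothetical helper (not an Azure CLI command): succeeds when the given
# hybridaksextension version is one of the two versions named in this article.
is_affected_version() {
  case "$1" in
    2.1.211|2.1.223) return 0 ;;  # versions listed as affected in this article
    *) return 1 ;;                # any other version
  esac
}
```

You might pass it the version string reported by `az k8s-extension show`, for example `is_affected_version "2.1.211" && echo "affected"`.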
| 58 | + |
| 59 | +## Mitigation |
| 60 | + |
| 61 | +To resolve this issue, run the `aksarc update` command, which retriggers the upgrade flow. You can run `aksarc update` with placeholder parameters that don't affect the state of the cluster. For example, you can enable the NFS or SMB driver if it isn't already enabled. First, check whether either storage driver is already enabled: |
| 62 | + |
| 63 | +```azurecli |
| 64 | +az login --use-device-code --tenant <Azure tenant ID> |
| 65 | +az account set -s <subscription ID> |
| 66 | +az aksarc show -g <resource_group_name> -n <cluster_name> |
| 67 | +``` |
| 68 | + |
| 69 | +Check the storage profile section: |
| 70 | + |
| 71 | +```json |
| 72 | +"storageProfile": { |
| 73 | + "nfsCsiDriver": { |
| 74 | + "enabled": false |
| 75 | + }, |
| 76 | + "smbCsiDriver": { |
| 78 | + "enabled": true |
| 79 | + } |
| 80 | + } |
| 81 | +``` |
| 82 | + |
| 83 | +If one of the drivers is disabled, enable it by running the corresponding command: |
| 84 | + |
| 85 | +```azurecli |
| 86 | +az aksarc update --enable-smb-driver -g <resource_group_name> -n <cluster_name> |
| 87 | +az aksarc update --enable-nfs-driver -g <resource_group_name> -n <cluster_name> |
| 88 | +``` |
| 89 | + |
| 90 | +Running the `aksarc update` command should resolve the issue, and the **Current state** property of the cluster should then show 'Succeeded'. After the state is updated, if you don't want to keep the driver enabled, you can revert the change by running the corresponding command: |
| 91 | + |
| 92 | +```azurecli |
| 93 | +az aksarc update --disable-smb-driver -g <resource_group_name> -n <cluster_name> |
| 94 | +az aksarc update --disable-nfs-driver -g <resource_group_name> -n <cluster_name> |
| 95 | +``` |
| 96 | + |
| 97 | +If both drivers are already enabled on your cluster, you can disable the one that is not in use. If you require both drivers to remain enabled, contact Microsoft Support for further assistance. |
| 98 | + |
| 99 | +## Verification |
| 100 | + |
| 101 | +Run the following command and check that the **Current state** property (`currentState`) in the JSON output is set to 'Succeeded' to confirm that the Kubernetes version upgrade is complete. |
| 102 | + |
| 103 | +```azurecli |
| 104 | +az aksarc show -g <resource_group> -n <cluster_name> |
| 106 | +``` |
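
The verification step can also be sketched as a small polling loop in a POSIX shell. This is a minimal sketch, not an official procedure. The function names `get_current_state` and `wait_for_upgrade` are hypothetical, and the query path `properties.currentState` is an assumption based on the sample output shown in this article; adjust it if your output nests the field differently. `--query` and `-o tsv` are standard Azure CLI global arguments.

```shell
# Hypothetical helper: prints the cluster's currentState value.
# Assumes the field is at properties.currentState, as in the sample output above.
get_current_state() {
  az aksarc show -g "$1" -n "$2" --query "properties.currentState" -o tsv
}

# Hypothetical helper: polls until the state leaves 'Upgrading' or the
# attempts run out. Returns 0 only if the final state is 'Succeeded'.
# Arguments: resource group, cluster name, attempts (default 30),
# delay in seconds between attempts (default 60).
wait_for_upgrade() {
  rg="$1"
  cluster="$2"
  attempts="${3:-30}"
  delay="${4:-60}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(get_current_state "$rg" "$cluster")"
    echo "currentState: $state"
    if [ "$state" = "Succeeded" ]; then
      return 0
    elif [ "$state" != "Upgrading" ]; then
      return 1  # some other state; stop polling and report failure
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1  # still 'Upgrading' after all attempts
}
```

For example, `wait_for_upgrade <resource_group> <cluster_name> 10 30` checks up to 10 times, 30 seconds apart.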
| 107 | + |
| 108 | +## Contact Microsoft Support |
| 109 | + |
| 110 | +If the problem persists, collect [AKS cluster logs](get-on-demand-logs.md) before [creating a support request](aks-troubleshoot.md#open-a-support-request). |
| 111 | + |
| 112 | +## Next steps |
| 113 | + |
| 114 | +- [Use the diagnostic checker tool to identify common environment issues](aks-arc-diagnostic-checker.md) |
| 115 | +- [Review AKS on Azure Local architecture](cluster-architecture.md) |