|
| 1 | +--- |
| 2 | +title: Troubleshoot the issue where the cluster is stuck in Upgrading state |
| 3 | +description: Learn how to troubleshoot and mitigate the issue when an AKS enabled by Arc cluster is stuck in 'Upgrading' state. |
| 4 | +ms.topic: troubleshooting |
| 5 | +author: rcheeran |
| 6 | +ms.author: rcheeran |
| 7 | +ms.date: 06/25/2025 |
| 8 | +ms.reviewer: abha |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +# Troubleshoot the issue when the AKS Arc cluster is stuck in 'Upgrading' state |
| 13 | + |
| 14 | +This article describes how to fix an AKS Arc cluster that remains stuck in the 'Upgrading' state when you try to upgrade its Kubernetes version. This issue typically occurs after you update Azure Local to version 2503 or 2504. |
| 15 | + |
| 16 | +## Symptoms |
| 17 | + |
| 18 | +When you try to upgrade an AKS Arc cluster, the **Current state** property of the cluster continues to show 'Upgrading', as in the following output: |
| 19 | + |
| 20 | +```output |
| 21 | +az aksarc upgrade --name "cluster-name" --resource-group "rg-name" |
| 22 | + |
| 23 | +===> Kubernetes may be unavailable during cluster upgrades. |
| 24 | + Are you sure you want to perform this operation? (y/N): y |
| 25 | +The cluster is on version 1.28.9 and is not in a failed state. |
| 26 | + |
| 27 | +===> This will upgrade the control plane AND all nodepools to version 1.30.4. Continue? (y/N): y |
| 28 | +Upgrading the AKSArc cluster. This operation might take a while... |
| 29 | +{ |
| 30 | + "extendedLocation": { |
| 31 | + "name": "/subscriptions/resourceGroups/Bellevue/providers/Microsoft.ExtendedLocation/customLocations/bel-CL", |
| 32 | + "type": "CustomLocation" |
| 33 | + }, |
| 34 | + "id": "/subscriptions/fbaf508b-cb61-4383-9cda-a42bfa0c7bc9/resourceGroups/Bellevue/providers/Microsoft.Kubernetes/ConnectedClusters/Bel-cluster/providers/Microsoft.HybridContainerService/ProvisionedClusterInstances/default", |
| 35 | + "name": "default", |
| 36 | + "properties": { |
| 37 | + "kubernetesVersion": "1.30.4", |
| 38 | + "provisioningState": "Succeeded", |
| 39 | + "currentState": "Upgrading", |
| 40 | + "errorMessage": null, |
| 41 | + "operationStatus": null, |
| 42 | + "agentPoolProfiles": [ |
| 43 | + { |
| 44 | + ... |
| 45 | +``` |
| 46 | + |
| 47 | + |
| 48 | +## Possible causes and follow-ups |
| 49 | + |
| 50 | +- The root cause is a change introduced in Azure Local version 2503. Under certain conditions, transient or intermittent failures during the Kubernetes upgrade process aren't correctly detected or recovered from, which leaves the cluster indefinitely in the 'Upgrading' state. |
| 51 | +- This issue occurs if the AKS Arc extension on your custom location (the `hybridaksextension` extension) is version 2.1.211 or 2.1.223. Run the following commands to check the extension version on your cluster: |
| 52 | + |
| 53 | +```azurecli |
| 54 | +az login --use-device-code --tenant <Azure tenant ID> |
| 55 | +az account set -s <subscription ID> |
| 56 | +$res=get-archcimgmt |
| 57 | +az k8s-extension show -g $res.HybridaksExtension.resourceGroup -c $res.ResourceBridge.name --cluster-type appliances --name hybridaksextension |
| 58 | +``` |
| 59 | + |
| 60 | + |
| 61 | +## Mitigation |
| 62 | +To resolve this issue, invoke the AKS Arc update call, which also retriggers the upgrade flow. You can run the `aksarc update` command with a placeholder parameter; for example, you can enable the NFS or SMB driver if that feature isn't already enabled. First, check whether either driver is already enabled: |
| 63 | + |
| 64 | +```azurecli |
| 65 | +az login --use-device-code --tenant <Azure tenant ID> |
| 66 | +az account set -s <subscription ID> |
| 67 | +az aksarc show -g <resource_group_name> -n <cluster_name> |
| 68 | +``` |
| 69 | +Check the storage profile section of the output: |
| 70 | +```json |
| 71 | +"storageProfile": { |
| 72 | + "nfsCsiDriver": { |
| 73 | + "enabled": false |
| 74 | + }, |
| 75 | + "smbCsiDriver": { |
| 76 | + "enabled": true |
| 77 | + } |
| 78 | + } |
| 79 | +``` |
| 80 | + |
| 81 | +If one of the drivers is disabled, enable it by running the corresponding command. Run only the command for the driver that's currently disabled: |
| 82 | + |
| 83 | +```azurecli |
| 84 | +az aksarc update --enable-smb-driver -g <resource_group_name> -n <cluster_name> |
| 85 | +az aksarc update --enable-nfs-driver -g <resource_group_name> -n <cluster_name> |
| 86 | +``` |
| 87 | + |
| 88 | +Running the `aksarc update` command should resolve the issue, and the **Current state** property of the cluster should now show 'Succeeded'. After the state is updated, if you don't want to keep the driver enabled, you can revert the change by running the corresponding command: |
| 89 | + |
| 90 | +```azurecli |
| 91 | +az aksarc update --disable-smb-driver -g <resource_group_name> -n <cluster_name> |
| 92 | +az aksarc update --disable-nfs-driver -g <resource_group_name> -n <cluster_name> |
| 93 | +``` |
| 94 | +If both drivers are already enabled on your cluster, you can disable the driver you aren't using. If you use both drivers, contact the support team for further instructions. |
| 95 | + |
| 96 | +## Verification |
| 97 | +To confirm that the Kubernetes version upgrade completed and the state moved to 'Succeeded', run the following command and check the **Current state** property (`currentState` in the JSON output): |
| 98 | + |
| 99 | +```azurecli |
| 100 | +az aksarc show -g <resource_group> -n <cluster_name> |
| 101 | + |
| 102 | +``` |
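To read only the state instead of scanning the full JSON, a small wrapper along the following lines may help. This is a minimal sketch, assuming a Bash shell and that the state is exposed at `properties.currentState`, as in the sample output in the Symptoms section. The function name `get_current_state` is hypothetical and isn't part of the Azure CLI; the sketch relies only on the global `--query` (JMESPath) and `-o tsv` arguments of `az`.

```bash
# Hypothetical helper: print only the currentState of an AKS Arc cluster.
# Usage: get_current_state <resource_group> <cluster_name>
get_current_state() {
  local resource_group="$1"
  local cluster_name="$2"
  # --query takes a JMESPath expression. The path below assumes the JSON
  # layout shown in the Symptoms output (properties.currentState).
  az aksarc show -g "$resource_group" -n "$cluster_name" \
    --query "properties.currentState" -o tsv
}
```

Under these assumptions, calling `get_current_state` with your resource group and cluster name should print `Upgrading` while the cluster is still stuck and `Succeeded` once the upgrade has completed.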
| 103 | + |
| 104 | + |
| 105 | +## Contact Microsoft Support |
| 106 | + |
| 107 | +If the problem persists, collect [AKS cluster logs](get-on-demand-logs.md) before [creating a support request](aks-troubleshoot.md#open-a-support-request). |
| 108 | + |
| 109 | +## Next steps |
| 110 | + |
| 111 | +- [Use the diagnostic checker tool to identify common environment issues](aks-arc-diagnostic-checker.md) |
| 112 | +- [Review AKS on Azure Local architecture](cluster-architecture.md) |