|
| 1 | +--- |
| 2 | +title: Troubleshoot Azure Kubernetes Service backup |
| 3 | +description: Symptoms, causes, and resolutions of Azure Kubernetes Service backup and restore. |
| 4 | +ms.topic: troubleshooting |
| 5 | +ms.date: 03/14/2023 |
| 6 | +ms.service: backup |
| 7 | +author: jyothisuri |
| 8 | +ms.author: jsuri |
| 9 | +--- |
| 10 | + |
| 11 | +# Troubleshoot Azure Kubernetes Service backup and restore (preview) |
| 12 | + |
| 13 | +This article provides troubleshooting steps that help you resolve Azure Kubernetes Service (AKS) backup, restore, and management errors. |
| 14 | + |
| 15 | +## AKS Backup Extension installation error resolutions |
| 16 | + |
| 17 | +### Scenario 1 |
| 18 | + |
| 19 | +**Error message**: |
| 20 | + |
| 21 | + ```Error |
| 22 | + {Helm installation from path [] for release [azure-aks-backup] failed with the following error: err [release azure-aks-backup failed, and has been uninstalled due to atomic being set: failed post-install: timed out waiting for the condition]} occurred while doing the operation: {Installing the extension} on the config |
| 23 | + ``` |
| 24 | + |
| 25 | + |
| 26 | +**Cause**: The extension has been installed successfully, but the pods aren't spawning. This happens because the required compute and memory aren't available for the pods. |
| 27 | + |
| 28 | +**Resolution**: To resolve the issue, increase the number of nodes in the cluster so that sufficient compute and memory are available for the pods to spawn. |
| 29 | +To scale the node pool in the Azure portal, follow these steps: |
| 30 | + |
| 31 | +1. In the Azure portal, open the *AKS cluster*. |
| 32 | +1. Go to **Node pools** under **Settings**. |
| 33 | +1. Select **Scale node pool**, and then update the *minimum* and *maximum* values in the **Node count range**. |
| 34 | +1. Select **Apply**. |
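The portal steps above can also be sketched with the Azure CLI. This is a minimal sketch, assuming the cluster autoscaler is already enabled on the node pool (which is what a node count range implies); the function name `scale_node_pool_range` and all argument values are hypothetical placeholders, not part of the original article.

```shell
# Hypothetical helper: update the autoscaler range of one node pool.
# Arguments: resource group, cluster name, node pool name, min count, max count.
scale_node_pool_range() {
  resource_group="$1"; cluster_name="$2"; node_pool="$3"
  min_count="$4"; max_count="$5"

  # Reject an inverted range before calling Azure.
  if [ "$min_count" -gt "$max_count" ]; then
    echo "minimum node count must not exceed maximum node count" >&2
    return 1
  fi

  # Assumes the cluster autoscaler is already enabled on this node pool.
  az aks nodepool update \
    --resource-group "$resource_group" \
    --cluster-name "$cluster_name" \
    --name "$node_pool" \
    --update-cluster-autoscaler \
    --min-count "$min_count" \
    --max-count "$max_count"
}
```

For example, `scale_node_pool_range myResourceGroup myAKSCluster nodepool1 3 5` would request a range of three to five nodes. If the node pool uses a fixed node count instead of a range, `az aks nodepool scale --node-count` is the likely equivalent.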
| 35 | + |
| 36 | +### Scenario 2 |
| 37 | + |
| 38 | +**Error message**: |
| 39 | + |
| 40 | + ```Error |
| 41 | + BackupStorageLocation "default" is unavailable: rpc error: code = Unknown desc = azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/e30af180-aa96-4d81-981a-b67570b0d615/resourceGroups/AzureBackupRG_westeurope_1/providers/Microsoft.Storage/storageAccounts/devhayyabackup/listKeys?%24expand=kerb&api-version=2019-06-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body: no azure identity found for request clientID 4e95##### REDACTED #####0777 |
| 42 | + |
| 43 | + Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=4e95dcc5-a769-4745-b2d9- |
| 44 | + ``` |
| 45 | + |
| 46 | +**Cause**: When you enable pod-managed identity on your AKS cluster, an *AzurePodIdentityException* named *aks-addon-exception* is added to the *kube-system* namespace. An *AzurePodIdentityException* allows pods with certain labels to access the Azure Instance Metadata Service (IMDS) endpoint without being intercepted by the NMI server. |
| 47 | + |
| 48 | +The extension pods aren't exempt, and require the Azure Active Directory (Azure AD) pod identity to be enabled manually. |
| 49 | + |
| 50 | +**Resolution**: Create a *pod-identity* exception in the AKS cluster. The exception must apply to the *dataprotection-microsoft* namespace, not to *kube-system*. [Learn more](/cli/azure/aks/pod-identity/exception?view=azure-cli-latest&preserve-view=true#az-aks-pod-identity-exception-add). |
| 51 | + |
| 52 | +1. Run the following command: |
| 53 | + |
| 54 | + |
| 55 | + ```azurepowershell-interactive |
| 56 | + az aks pod-identity exception add --resource-group shracrg --cluster-name shractestcluster --namespace dataprotection-microsoft --pod-labels app.kubernetes.io/name=dataprotection-microsoft-kubernetes |
| 57 | + ``` |
| 58 | + |
| 59 | +2. To verify the *AzurePodIdentityException* resources in the cluster, run the following command: |
| 60 | + |
| 61 | + ```azurepowershell-interactive |
| 62 | + kubectl get Azurepodidentityexceptions --all-namespaces |
| 63 | + ``` |
| 64 | + |
| 65 | +3. To assign the *Storage Account Contributor* role to the extension identity, run the following command: |
| 66 | + |
| 67 | + ```azurepowershell-interactive |
| 68 | + az role assignment create --assignee-object-id $(az k8s-extension show --name azure-aks-backup --cluster-name aksclustername --resource-group aksclusterresourcegroup --cluster-type managedClusters --query aksAssignedIdentity.principalId --output tsv) --role 'Storage Account Contributor' --scope /subscriptions/subscriptionid/resourceGroups/storageaccountresourcegroup/providers/Microsoft.Storage/storageAccounts/storageaccountname |
| 69 | + ``` |
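The command in step 3 packs several placeholders into one line. As a hedged sketch, it can be split into two small functions so that each placeholder is passed explicitly; the function names `storage_account_scope` and `grant_extension_storage_access` are hypothetical, and the `az` calls are the same ones used in step 3.

```shell
# Build the storage account scope from subscription ID, resource group,
# and storage account name (all three are placeholders you supply).
storage_account_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' \
    "$1" "$2" "$3"
}

# Hypothetical wrapper around the two commands from step 3.
# Arguments: AKS cluster name, AKS cluster resource group, storage account scope.
grant_extension_storage_access() {
  aks_cluster="$1"; aks_resource_group="$2"; scope="$3"

  # Look up the principal ID of the Backup Extension identity.
  principal_id=$(az k8s-extension show \
    --name azure-aks-backup \
    --cluster-name "$aks_cluster" \
    --resource-group "$aks_resource_group" \
    --cluster-type managedClusters \
    --query aksAssignedIdentity.principalId \
    --output tsv) || return 1

  # Stop early if the extension has no identity to assign the role to.
  [ -n "$principal_id" ] || { echo "extension identity not found" >&2; return 1; }

  az role assignment create \
    --assignee-object-id "$principal_id" \
    --role 'Storage Account Contributor' \
    --scope "$scope"
}
```

A call might look like `grant_extension_storage_access aksclustername aksclusterresourcegroup "$(storage_account_scope subscriptionid storageaccountresourcegroup storageaccountname)"`, using the same placeholder names as step 3.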
| 70 | + |
| 71 | +### Scenario 3 |
| 72 | + |
| 73 | +**Error message**: |
| 74 | + |
| 75 | + ```Error |
| 76 | + {"Message":"Error in the getting the Configurations: error {Post \https://centralus.dp.kubernetesconfiguration.azure.com/subscriptions/ subscriptionid /resourceGroups/ aksclusterresourcegroup /provider/managedclusters/clusters/ aksclustername /configurations/getPendingConfigs?api-version=2021-11-01\: dial tcp: lookup centralus.dp.kubernetesconfiguration.azure.com on 10.63.136.10:53: no such host}","LogType":"ConfigAgentTrace","LogLevel":"Error","Environment":"prod","Role":"ClusterConfigAgent","Location":"centralus","ArmId":"/subscriptions/ subscriptionid /resourceGroups/ aksclusterresourcegroup /providers/Microsoft.ContainerService/managedclusters/ aksclustername ","CorrelationId":"","AgentName":"ConfigAgent","AgentVersion":"1.8.14","AgentTimestamp":"2023/01/19 20:24:16"} |
| 77 | + ``` |
| 78 | +**Cause**: Specific FQDN/application rules are required to use cluster extensions in the AKS clusters. [Learn more](/azure/aks/limit-egress-traffic#cluster-extensions). |
| 79 | + |
| 80 | +This error appears when these FQDN rules are absent, so the configuration information from the Cluster Extensions service isn't available. |
| 81 | + |
| 82 | +**Resolution**: To resolve the issue, you need to create a *CoreDNS-custom override* for the *DP* endpoint to pass through the public network. |
| 83 | + |
| 84 | +1. To fetch the existing *CoreDNS-custom* YAML from your cluster (save it locally for later reference), run the following command: |
| 85 | + |
| 86 | + ```azurepowershell-interactive |
| 87 | + kubectl get configmap coredns-custom -n kube-system -o yaml |
| 88 | + ``` |
| 89 | + |
| 90 | +2. To override the mapping for the *Central US DP* endpoint to a public IP (download the attached YAML file), run the following command: |
| 91 | + |
| 92 | + ```azurepowershell-interactive |
| 93 | + kubectl apply -f corednsms.yaml |
| 94 | + ``` |
| 95 | + |
| 96 | +3. To force a reload of the `coredns` pods, run the following command: |
| 97 | + |
| 98 | + ```azurepowershell-interactive |
| 99 | + kubectl delete pod --namespace kube-system -l k8s-app=kube-dns |
| 100 | + ``` |
| 101 | + |
| 102 | +4. To run `nslookup` from the *ExtensionAgent* pod and check whether *coreDNS-custom* is working, run the following command: |
| 103 | + |
| 104 | + ```azurepowershell-interactive |
| 105 | + kubectl exec -i -t pod/extension-agent-<pod guid that's there in your cluster> -n kube-system -- nslookup centralus.dp.kubernetesconfiguration.azure.com |
| 106 | + ``` |
| 107 | + |
| 108 | +5. To check logs of the *ExtensionAgent* pod, run the following command: |
| 109 | + |
| 110 | + ```azurepowershell-interactive |
| 111 | + kubectl logs pod/extension-agent-<pod guid that's there in your cluster> -n kube-system --tail=200 |
| 112 | + ``` |
| 113 | + |
| 114 | +6. Delete and reinstall the Backup Extension to initiate backup. |
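The three scenarios in this section can be told apart by a distinctive fragment of each error message. The following is a minimal triage sketch, not an official tool: the function name `classify_backup_extension_error` and the `scenario-N` labels are hypothetical, and the matched fragments are taken from the error messages quoted above.

```shell
# Hypothetical helper: map an error message to the scenario it resembles.
# Argument: the full error text as a single string.
classify_backup_extension_error() {
  case "$1" in
    # Scenario 1: Helm post-install timed out (pods could not spawn).
    *"timed out waiting for the condition"*)
      echo "scenario-1" ;;
    # Scenario 2: pod identity lookup failed for the extension pods.
    *"no azure identity found for request clientID"*)
      echo "scenario-2" ;;
    # Scenario 3: DNS lookup of the DP endpoint failed.
    *"dp.kubernetesconfiguration.azure.com"*"no such host"*)
      echo "scenario-3" ;;
    *)
      echo "unknown" ;;
  esac
}
```

Calling the function with a captured message prints the matching label, or `unknown` when none of the fragments is present, in which case the message likely belongs to a case this article doesn't cover.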
| 115 | + |
| 116 | +## Next steps |
| 117 | + |
| 118 | +- [About Azure Kubernetes Service (AKS) backup (preview)](azure-kubernetes-service-backup-overview.md) |