Commit 48f8002

Merge pull request #126195 from abarqawi/patch-7
Update azure-kubernetes-service-backup-troubleshoot.md
2 parents 4f3c6b7 + 26f5667 commit 48f8002

1 file changed: +35 -0 lines changed

articles/backup/azure-kubernetes-service-backup-troubleshoot.md

Lines changed: 35 additions & 0 deletions
@@ -129,6 +129,41 @@ This error appears due to absence of these FQDN rules because of which configura

6. Delete and reinstall Backup Extension to initiate backup.

### Scenario 4

**Error message**:

```Error
"message": "Error: [ InnerError: [Helm installation failed : Unable to create/update Kubernetes resources for the extension : Recommendation Please check that there are no policies blocking the resource creation/update for the extension : InnerError [release azure-aks-backup failed, and has been uninstalled due to atomic being set: failed pre-install: job failed: BackoffLimitExceeded]]] occurred while doing the operation : [Create] on the config, For general troubleshooting visit: https://aka.ms/k8s-extensions-TSG, For more application specific troubleshooting visit: Facing trouble? Common errors and potential fixes are detailed in the Kubernetes Backup Troubleshooting Guide, available at https://www.aka.ms/aksclusterbackup",
```
This error indicates that the pre-install job that upgrades the CRDs is failing in the cluster.

**Cause**: Pods are unable to communicate with the Kubernetes API server.

**Debug**

1. Check the cluster events for issues related to pod creation.
```azurecli-interactive
kubectl events -n dataprotection-microsoft
```
2. Find the pods created by the CRD upgrade job.
```azurecli-interactive
kubectl get pods -A | grep "dataprotection-microsoft-kubernetes-agent-upgrade-crds"
```
3. Check the pod logs.
```azurecli-interactive
kubectl logs -f --all-containers=true --timestamps=true -n dataprotection-microsoft <pod-name-from-prev-command>
```
Example log messages:
```Error
2024-08-09T06:21:37.712646207Z Unable to connect to the server: dial tcp: lookup aks-test.hcp.westeurope.azmk8s.io: i/o timeout
2024-10-01T11:26:17.498523756Z Unable to connect to the server: dial tcp 10.146.34.10:443: i/o timeout
```
**Resolution**:

In this case, a Kubernetes network policy, a Calico policy, or a network security group (NSG) is blocking pods in the `dataprotection-microsoft` namespace from communicating with the API server.

Allow traffic from the `dataprotection-microsoft` namespace to the API server, and then reinstall the extension.

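If the block comes from a standard Kubernetes `NetworkPolicy`, one possible way to allow the namespace is an additional policy that permits egress from its pods. The sketch below only prints such a manifest. The function name `emit_allow_egress_policy` and the policy name `allow-all-egress` are hypothetical, and the policy allows all egress, which is broader than the API server alone. It is an assumption that this fits your cluster: a Calico policy with an explicit deny rule or an NSG rule must be changed in that system instead.

```shell
#!/usr/bin/env bash
# Print a Kubernetes NetworkPolicy that allows all egress from every pod in a namespace.
# The policy name "allow-all-egress" is a hypothetical placeholder, not an Azure default.
emit_allow_egress_policy() {
  local namespace="${1:-dataprotection-microsoft}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - {}
EOF
}
```

After reviewing the printed manifest, it could be applied with `emit_allow_egress_policy | kubectl apply -f -`. Narrowing the `egress` rule to the API server address and DNS is the tighter choice once connectivity is confirmed.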
## Backup Extension post installation related errors

These error codes appear due to issues with the Backup Extension installed in the AKS cluster.
