Commit d878aa8

Merge pull request #267535 from AbhishekMallick-MS/main
AKS backup - monitor warning and troubleshoot
2 parents 7270639 + 7d8eb18 commit d878aa8

4 files changed: +100 -2 lines changed

articles/backup/azure-kubernetes-service-backup-troubleshoot.md

Lines changed: 83 additions & 1 deletion
@@ -2,7 +2,7 @@
22
title: Troubleshoot Azure Kubernetes Service backup
33
description: Symptoms, causes, and resolutions of the Azure Kubernetes Service backup and restore operations.
44
ms.topic: troubleshooting
5-
ms.date: 12/28/2023
5+
ms.date: 02/28/2024
66
ms.service: backup
77
ms.custom:
88
- ignite-2023
@@ -222,6 +222,88 @@ This error code can appear while you enable AKS backup to store backups in a vau
222222

223223
3. Create a backup policy for operational tier backup (only snapshots for the AKS cluster).
224224

225+
## AKS backup and restore jobs completed with warnings
226+
227+
### UserErrorPVSnapshotDisallowedByPolicy
228+
229+
**Error code**: UserErrorPVSnapshotDisallowedByPolicy
230+
231+
**Cause**: An Azure Policy assigned to the subscription prevents the CSI driver from taking the volume snapshot.
232+
233+
**Recommended action**: Remove the Azure Policy that blocks the disk snapshot operation, and then run an on-demand backup.
234+
235+
### UserErrorPVSnapshotLimitReached
236+
237+
**Error code**: UserErrorPVSnapshotLimitReached
238+
239+
**Cause**: Only a limited number of snapshots can exist for a Persistent Volume at any point in time. For Azure Disk-based Persistent Volumes, the limit is *500 snapshots*. This error appears when snapshots for specific Persistent Volumes aren't taken because the number of existing snapshots has reached the supported limit.
240+
241+
**Recommended action**: Update the Backup Policy to reduce the retention duration and wait for older recovery points to be deleted by the Backup vault.
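
As a rough aid for choosing a shorter retention duration, the following Python sketch estimates how many snapshots a policy keeps per Persistent Volume. It assumes one snapshot per volume per backup and a fixed backup interval. The function names and the example policy values are hypothetical and aren't part of AKS backup.

```python
import math

# Limit stated above for Azure Disk-based Persistent Volumes.
AZURE_DISK_SNAPSHOT_LIMIT = 500


def retained_snapshots(retention_days: int, backup_interval_hours: int) -> int:
    """Estimate snapshots retained per Persistent Volume at steady state.

    Assumption: each backup takes exactly one snapshot per volume.
    """
    backups_per_day = 24 / backup_interval_hours
    return math.ceil(retention_days * backups_per_day)


def max_retention_days(backup_interval_hours: int,
                       limit: int = AZURE_DISK_SNAPSHOT_LIMIT) -> int:
    """Longest retention in whole days that keeps the estimate within the limit."""
    backups_per_day = 24 / backup_interval_hours
    return math.floor(limit / backups_per_day)


# Hypothetical policy: a backup every 4 hours, retained for 90 days.
print(retained_snapshots(90, 4))  # 540, which exceeds the 500-snapshot limit
print(max_retention_days(4))      # 83
```

Under these assumptions, a 4-hour backup interval stays within the limit only if retention is 83 days or less.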
242+
243+
### CSISnapshottingTimedOut
244+
245+
**Error code**: CSISnapshottingTimedOut
246+
247+
**Cause**: The snapshot failed because the CSI driver timed out while fetching the snapshot handle.
248+
249+
**Recommended action**: Review the logs, and then retry the operation by running an on-demand backup, or wait for the next scheduled backup.
250+
251+
### UserErrorHookExecutionFailed
252+
253+
**Error code**: UserErrorHookExecutionFailed
254+
255+
**Cause**: The hooks configured to run with backup or restore operations encountered an error and weren't applied successfully.
256+
257+
**Recommended action**: Review the logs, update the hooks, and then retry the backup or restore operation.
258+
259+
### UserErrorNamespaceNotFound
260+
261+
**Error code**: UserErrorNamespaceNotFound
262+
263+
**Cause**: A namespace specified in the backup configuration is missing when the backup runs. Either the namespace name was entered incorrectly or the namespace has been deleted.
264+
265+
**Recommended action**: Check that the namespaces to be backed up are specified correctly in the backup configuration.
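
One way to run this check is to compare the configured namespaces against the cluster's namespace list. The Python sketch below assumes you have saved the output of `kubectl get namespaces -o json` as a string. The function name and the sample namespace names are hypothetical.

```python
import json


def missing_namespaces(configured, namespace_list_json):
    """Return configured namespaces that are absent from the cluster listing.

    `namespace_list_json` is assumed to be the JSON text printed by
    `kubectl get namespaces -o json` (a List object with an `items` array).
    """
    listing = json.loads(namespace_list_json)
    existing = {item["metadata"]["name"] for item in listing.get("items", [])}
    return sorted(set(configured) - existing)


# Hypothetical sample: the cluster has `default` and `web`; the backup
# configuration also names `payments`, which was misspelled or deleted.
sample = json.dumps({
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {"metadata": {"name": "default"}},
        {"metadata": {"name": "web"}},
    ],
})
print(missing_namespaces(["web", "payments"], sample))  # ['payments']
```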
266+
267+
### UserErrorPVCHasNoVolume
268+
269+
**Error code**: UserErrorPVCHasNoVolume
270+
271+
**Cause**: The Persistent Volume Claim (PVC) in context doesn't have a Persistent Volume attached to it, so the PVC isn't backed up.
272+
273+
**Recommended action**: Attach a volume to the PVC, if it needs to be backed up.
274+
275+
### UserErrorPVCNotBoundToVolume
276+
277+
**Error code**: UserErrorPVCNotBoundToVolume
278+
279+
**Cause**: The PVC in context is in the *Pending* state and doesn't have a Persistent Volume attached to it, so the PVC isn't backed up.
280+
281+
**Recommended action**: Attach a volume to the PVC, if it needs to be backed up.
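
To find PVCs affected by this warning or by `UserErrorPVCHasNoVolume`, you can inspect the PVC list for claims that aren't bound or have no volume name. The Python sketch below assumes the input is the JSON text printed by `kubectl get pvc --all-namespaces -o json`. The function name and sample data are hypothetical.

```python
import json


def unbound_pvcs(pvc_list_json):
    """Return (namespace, name, phase) for PVCs that have no bound volume.

    A PVC is reported when `status.phase` isn't "Bound" or when
    `spec.volumeName` is empty or absent.
    """
    listing = json.loads(pvc_list_json)
    result = []
    for item in listing.get("items", []):
        meta = item.get("metadata", {})
        phase = item.get("status", {}).get("phase")
        volume = item.get("spec", {}).get("volumeName")
        if phase != "Bound" or not volume:
            result.append((meta.get("namespace"), meta.get("name"), phase))
    return result


# Hypothetical sample: one bound PVC and one pending PVC.
sample = json.dumps({
    "items": [
        {"metadata": {"namespace": "web", "name": "data"},
         "spec": {"volumeName": "pvc-123"},
         "status": {"phase": "Bound"}},
        {"metadata": {"namespace": "web", "name": "cache"},
         "spec": {},
         "status": {"phase": "Pending"}},
    ],
})
print(unbound_pvcs(sample))  # [('web', 'cache', 'Pending')]
```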
282+
283+
### UserErrorPVNotFound
284+
285+
**Error code**: UserErrorPVNotFound
286+
287+
**Cause**: The underlying storage medium for the Persistent Volume is missing.
288+
289+
**Recommended action**: Check the Persistent Volume, and then attach a new Persistent Volume that is backed by an actual storage medium.
290+
291+
### UserErrorStorageClassMissingForPVC
292+
293+
**Error code**: UserErrorStorageClassMissingForPVC
294+
295+
**Cause**: AKS backup checks the storage class that the PVC uses and skips taking snapshots of the Persistent Volume because the storage class is unavailable.
296+
297+
**Recommended action**: Update the PVC specifications with the storage class used.
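
To locate the PVCs that need this update, you can compare each PVC's `spec.storageClassName` with the storage classes that exist in the cluster. The Python sketch below assumes the inputs are the JSON text printed by `kubectl get pvc --all-namespaces -o json` and `kubectl get storageclass -o json`. The function name and sample data are hypothetical, and the check is only an approximation of what AKS backup evaluates.

```python
import json


def pvcs_with_unavailable_storage_class(pvc_list_json, storage_class_list_json):
    """Return (namespace, name, storageClassName) for PVCs whose storage
    class is unset or doesn't match any storage class in the cluster."""
    pvcs = json.loads(pvc_list_json).get("items", [])
    classes = json.loads(storage_class_list_json).get("items", [])
    available = {sc["metadata"]["name"] for sc in classes}
    flagged = []
    for pvc in pvcs:
        class_name = pvc.get("spec", {}).get("storageClassName")
        if not class_name or class_name not in available:
            meta = pvc["metadata"]
            flagged.append((meta["namespace"], meta["name"], class_name))
    return flagged


# Hypothetical sample: `logs` references a class that no longer exists.
pvc_sample = json.dumps({"items": [
    {"metadata": {"namespace": "web", "name": "data"},
     "spec": {"storageClassName": "managed-csi"}},
    {"metadata": {"namespace": "web", "name": "logs"},
     "spec": {"storageClassName": "old-class"}},
]})
sc_sample = json.dumps({"items": [{"metadata": {"name": "managed-csi"}}]})
print(pvcs_with_unavailable_storage_class(pvc_sample, sc_sample))
# [('web', 'logs', 'old-class')]
```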
298+
299+
### UserErrorSourceandTargetClusterCRDVersionMismatch
300+
301+
**Error code**: UserErrorSourceandTargetClusterCRDVersionMismatch
302+
303+
**Cause**: During restore, the source AKS cluster and the target AKS cluster have different versions of the *FlowSchema* and *PriorityLevelConfigurations* custom resources (CRs). Some Kubernetes resources aren't restored because of the mismatch in cluster versions.
304+
305+
**Recommended action**: Use the same cluster version for the target cluster as for the source cluster, or manually apply the CRs.
306+
225307
## Next steps
226308

227309
- [About Azure Kubernetes Service (AKS) backup](azure-kubernetes-service-backup-overview.md)

articles/backup/azure-kubernetes-service-cluster-manage-backups.md

Lines changed: 17 additions & 1 deletion
@@ -6,7 +6,7 @@ ms.service: backup
66
ms.custom:
77
- devx-track-azurecli
88
- ignite-2023
9-
ms.date: 02/27/2024
9+
ms.date: 02/28/2024
1010
author: AbhishekMallick-MS
1111
ms.author: v-abhmallick
1212
---
@@ -139,6 +139,22 @@ To enable Trusted Access between Backup vault and AKS cluster, use the following
139139

140140
Learn more about [other commands related to Trusted Access](../aks/trusted-access-feature.md#trusted-access-feature-overview).
141141

142+
## Monitor AKS backup jobs completed with warnings
143+
144+
When a scheduled or on-demand backup or restore operation runs, a corresponding job is created to track its progress. If the operation fails, the job lets you identify error codes and fix the issues so that a later job can succeed.
145+
146+
For AKS backup, backup and restore jobs can show the status **Completed with Warnings**. This status appears when the backup or restore operation isn't fully successful because of issues in user-defined configurations or in the internal state of the workload.
147+
148+
:::image type="content" source="./media/azure-kubernetes-service-cluster-manage-backups/backup-restore-jobs-completed-with-warnings.png" alt-text="Screenshot shows the backup and restore jobs completed with warnings." lightbox="./media/azure-kubernetes-service-cluster-manage-backups/backup-restore-jobs-completed-with-warnings.png":::
149+
150+
For example, if a backup job for an AKS cluster completes with the status **Completed with Warnings**, a restore point is created, but it might not include all the resources in the cluster that the backup configuration specifies. The job shows warning details that list the *issues* and the *resources* affected during the operation.
151+
152+
To view these warnings, select **View Details** next to **Warning Details**.
153+
154+
:::image type="content" source="./media/azure-kubernetes-service-cluster-manage-backups/example-backup-job-with-warning-details.png" alt-text="Screenshot shows the job warning details." lightbox="./media/azure-kubernetes-service-cluster-manage-backups/example-backup-job-with-warning-details.png":::
155+
156+
Learn [how to identify and resolve the error](azure-kubernetes-service-backup-troubleshoot.md#aks-backup-extension-installation-error-resolutions).
157+
142158
## Next steps
143159

144160
- [Back up Azure Kubernetes Service cluster](azure-kubernetes-service-cluster-backup.md)