
Commit 1788a06

authored
Merge pull request #296024 from rajats22/backupupdate-26022025
updates on aks backup
2 parents db7b7cc + 006f437 commit 1788a06

7 files changed

+78
-21
lines changed

articles/backup/azure-kubernetes-service-backup-troubleshoot.md

Lines changed: 32 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -167,21 +167,36 @@ These error codes appear due to issues on the Backup Extension installed in the
167167

168168
### BackupPluginPodRestartedDuringBackupError
169169

170-
**Cause**: Backup Extension Pod (dataprotection-microsoft-kubernetes-agent) in your AKS cluster experiencing instability due to insufficient CPU/Memory resources on its current node, leading to OOM (Out of Memory) kill incidents. This could be because of either lower compute requested by the backup extension pod or a large number of resources of a particular type are being backed up or restored.
170+
**Cause**: Azure Backup for AKS relies on pods deployed within the AKS cluster as part of the backup extension under the namespace `dataprotection-microsoft`. To perform backup and restore operations, these pods have specific CPU and memory requirements.
171171

172-
**Recommended action**: To address this, first check the backup logs stored in the blob container provided as input in extension installation to verify if the issue is due to large number of resources. If thats the issue then exclude these resources from the backup configuration and reattempt the operation. Otherwise we recommend increasing the compute values allocated to this pod. By doing so, it will be automatically provisioned on a different node within your AKS cluster with ample compute resources available.
172+
```
173+
1. Memory: requests - 128Mi, limits - 1280Mi
174+
2. CPU: requests - 500m, limits - 1000m
175+
```
173176

174-
The current value of compute for this pod is:
177+
However, if the number of resources in the cluster exceeds 1000, the pods may require more CPU and memory than the default reservation provides. If the required resources exceed the allocated limits, the pod can be OOMKilled (terminated for running out of memory) during a backup job, which surfaces as a BackupPluginPodRestarted error.
175178

176-
resources.requests.cpu is 500m
177-
resources.requests.memory is 128Mi
178-
Kindly modify the memory allocation to 512Mi by updating the 'resources.requests.memory' parameter. If the issue persists, it is advisable to increase the 'resources.requests.cpu' parameter to 900m, post the memory allocation. You can increase the values for the parameters by following below steps:
179+
**Recommended action**: To ensure successful backup and restore operations, manually update the resource settings for the extension pods by following these steps:
179180

180-
1. Navigate to the AKS cluster blade in the Azure portal.
181-
2. Click on "Extensions+Applications" and select "azure-aks-backup" extension.
182-
3. Update the configuration settings in the portal by adding the following key-value pair.
183-
resources.requests.cpu 900m
184-
resources.requests.memory 512Mi
181+
1. Open the AKS cluster in the Azure portal.
182+
183+
![Screenshot shows AKS cluster in Azure portal.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster.png)
184+
185+
1. Navigate to Extensions + Applications under Settings in the left-hand pane.
186+
187+
![Screenshot shows how to select Extensions + Applications.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-applications.png)
188+
189+
1. Click on the extension titled "azure-aks-backup".
190+
191+
![Screenshot shows how to open Backup extension settings.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-azure-aks-backup.png)
192+
193+
1. Scroll down, add the following value under configuration settings, and then click Save.
194+
195+
`resources.limits.memory : 4400Mi`
196+
197+
![Screenshot shows how to add values under configuration settings.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-azure-aks-backup-configuration-update.png)
198+
199+
After applying the changes, either wait for a scheduled backup to run or initiate an on-demand backup. If you still experience an OOMKilled failure, repeat the steps above and gradually increase the memory limit. If the failure persists, also increase the `resources.limits.cpu` parameter.
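The "gradually increase" guidance can be sketched as a small helper that proposes the next `resources.limits.memory` value to try. This is a minimal illustration only: the function names, the 512Mi step, and the 8192Mi ceiling are assumptions chosen for the example, not values documented for the extension.

```python
import re

# Binary suffixes used by Kubernetes memory quantities, expressed in MiB.
_UNITS_IN_MIB = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def parse_memory_mib(quantity: str) -> float:
    """Convert a quantity such as '1280Mi' or '2Gi' to MiB."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    value, unit = match.groups()
    return int(value) * _UNITS_IN_MIB[unit]


def next_memory_limit(current: str, step_mib: int = 512, ceiling_mib: int = 8192) -> str:
    """Return the next limit to try, raised by step_mib and capped at ceiling_mib.

    step_mib and ceiling_mib are illustrative assumptions, not documented values.
    """
    proposed = min(int(parse_memory_mib(current)) + step_mib, ceiling_mib)
    return f"{proposed}Mi"
```

The returned string can be entered as the new `resources.limits.memory` value in the configuration settings before retrying the backup. Pick a step and ceiling that fit the memory actually available on your nodes.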
185200

186201
### BackupPluginDeleteBackupOperationFailed
187202

@@ -301,23 +316,23 @@ These error codes can appear while you enable AKS backup to store backups in a v
301316

302317
**Cause**: The namespaces provided in the backup configuration are missing while performing backups. Either a namespace was provided incorrectly or it has been deleted.
303318

304-
**Recommended action**: Check if the Namespaces to be backed up are correctly provided.
319+
**Recommended action**: Check if the Namespaces to be backed-up are correctly provided.
305320

306321
### UserErrorPVCHasNoVolume
307322

308323
**Error code**: UserErrorPVCHasNoVolume
309324

310-
**Cause**: The Persistent Volume Claim (PVC) in context doesn't have a Persistent Volume attached to it. So, the PVC won't be backed up.
325+
**Cause**: The Persistent Volume Claim (PVC) in context doesn't have a Persistent Volume attached to it. So, the PVC won't be backed-up.
311326

312-
**Recommended action**: Attach a volume to the PVC, if it needs to be backed up.
327+
**Recommended action**: Attach a volume to the PVC, if it needs to be backed-up.
313328

314329
### UserErrorPVCNotBoundToVolume
315330

316331
**Error code**: UserErrorPVCNotBoundToVolume
317332

318-
**Cause**: The PVC in context is in *Pending* state and doesn't have a Persistent Volume attached to it. So, the PVC won't be backed up.
333+
**Cause**: The PVC in context is in *Pending* state and doesn't have a Persistent Volume attached to it. So, the PVC won't be backed-up.
319334

320-
**Recommended action**: Attach a volume to the PVC, if it needs to be backed up.
335+
**Recommended action**: Attach a volume to the PVC, if it needs to be backed-up.
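To find the PVCs that would trigger UserErrorPVCHasNoVolume or UserErrorPVCNotBoundToVolume before running a backup, one option is to inspect the JSON printed by `kubectl get pvc --all-namespaces -o json`. The sketch below assumes that output is passed in as a string; the function name `unbound_pvcs` is hypothetical and not part of any Azure tooling.

```python
import json


def unbound_pvcs(pvc_list_json: str) -> list[str]:
    """Return 'namespace/name' for each PVC that is not bound to a volume."""
    items = json.loads(pvc_list_json).get("items", [])
    flagged = []
    for pvc in items:
        phase = pvc.get("status", {}).get("phase")
        volume = pvc.get("spec", {}).get("volumeName")
        # A PVC is skipped by backup if it is not Bound or has no volume attached.
        if phase != "Bound" or not volume:
            meta = pvc["metadata"]
            flagged.append(f"{meta['namespace']}/{meta['name']}")
    return flagged
```

Any PVC listed in the result is either still in *Pending* state or has no Persistent Volume attached, so it needs a volume before it can be included in a backup.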
321336

322337
### UserErrorPVNotFound
323338

@@ -347,7 +362,7 @@ These error codes can appear while you enable AKS backup to store backups in a v
347362

348363
**Error code**: LinkedAuthorizationFailed
349364

350-
**Cause**: To perform a restore operation, user needs to have a **read** permission over the backed up AKS cluster.
365+
**Cause**: To perform a restore operation, user needs to have a **read** permission over the backed-up AKS cluster.
351366

352367
**Recommended action**: Assign Reader role on the source AKS cluster and then proceed to perform the restore operation.
353368

articles/backup/azure-kubernetes-service-cluster-backup-support-matrix.md

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ You can use [Azure Backup](./backup-overview.md) to help protect Azure Kubernete
3434

3535
- Provide a new and empty blob container as input while installing backup extension in an AKS cluster for the first time. Don't use same blob container for more than one AKS cluster.
3636

37-
- AKS backups don't support in-tree volumes. You can back up only CSI driver-based volumes. You can [migrate from tree volumes to CSI driver-based persistent volumes](/azure/aks/csi-migrate-in-tree-volumes).
37+
- AKS backups do not support in-tree volumes. You can back up only CSI driver-based volumes. You can [migrate from tree volumes to CSI driver-based persistent volumes](/azure/aks/csi-migrate-in-tree-volumes).
3838

3939
- Currently, an AKS backup supports only the backup of Azure disk-based persistent volumes (enabled by the CSI driver). The supported Azure Disk SKUs are Standard HDD, Standard SSD, and Premium SSD. The disks belonging to Premium SSD v2 and Ultra Disk SKU aren't supported. Both static and dynamically provisioned volumes are supported. For backup of static disks, the persistent volumes specification should have the *storage class* defined in the **YAML** file, otherwise such persistent volumes are skipped from the backup operation.
4040

@@ -48,6 +48,8 @@ You can use [Azure Backup](./backup-overview.md) to help protect Azure Kubernete
4848

4949
- You can't install Backup Extension in AKS Cluster with Arm64 based agent nodes irrespective of Operating System (Ubuntu/Azure Linux/Windows) running on these nodes.
5050

51+
- Azure Backup for AKS is currently not supported for Network Isolated AKS clusters.
52+
5153
- Don't install AKS Backup Extension along with Velero or other Velero-based backup services. This could lead to disruption of backup service during any future Velero upgrades driven by you or AKS backup.
5254

5355
- You must install the backup extension in the AKS cluster. If you're using Azure CLI to install the backup extension, ensure that the version is 2.41 or later. Use `az upgrade` command to upgrade the Azure CLI.
@@ -81,6 +83,7 @@ You can use [Azure Backup](./backup-overview.md) to help protect Azure Kubernete
8183
| Number of allowed restores per backup instance in a day | 10 |
8284

8385
- Configuration of a storage account with private endpoint is supported.
86+
8487
- To enable Azure Backup for AKS via Terraform, its version should be >= 3.99.
8588

8689
### Other limitations for Vaulted backup and Cross Region Restore
@@ -89,7 +92,7 @@ You can use [Azure Backup](./backup-overview.md) to help protect Azure Kubernete
8992

9093
- Currently, backup instances with <= 100 disks attached as persistent volumes are supported. Backup and restore operations might fail if the number of disks is higher than the limit.
9194

92-
- Only Azure Disks with public access enabled from all networks are eligible to be moved to the Vault Tier; if their are disks with network access apart from public access, tiering operation will fail.
95+
- Only Azure Disks with public access enabled from all networks are eligible to be moved to the Vault Tier; if there are disks with network access apart from public access, tiering operation will fail.
9396

9497
- *Disaster Recovery* feature is only available between Azure Paired Regions (if backup is configured in a Geo Redundant Backup vault). The backup data is only available in an Azure paired region. For example, if you have an AKS cluster in East US that is backed up in a Geo Redundant Backup vault, the backup data is also available in West US for restore.
9598

articles/backup/azure-kubernetes-service-cluster-manage-backups.md

Lines changed: 41 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -106,6 +106,45 @@ Learn more about [other commands related to Trusted Access](/azure/aks/trusted-a
106106

107107
This section describes several Azure Backup supported management operations that make it easy to manage Azure Kubernetes Service cluster backups.
108108

109+
### Adjusting CPU and Memory for Azure Backup for AKS
110+
111+
Azure Backup for AKS relies on pods deployed within the AKS cluster as part of the backup extension under the namespace `dataprotection-microsoft`. To perform backup and restore operations, these pods have specific CPU and memory requirements.
112+
113+
#### Default Resource Reservations
114+
115+
```
116+
1. Memory: requests - 128Mi, limits - 1280Mi
117+
2. CPU: requests - 500m, limits - 1000m
118+
```
119+
120+
However, if the number of resources in the cluster exceeds 1000, the pods may require more CPU and memory than the default reservation provides. If the required resources exceed the allocated limits, the pod can be OOMKilled (terminated for running out of memory) during a backup job, which surfaces as a BackupPluginPodRestarted error.
121+
122+
#### Resolving OOMKilled Errors by Increasing CPU and Memory
123+
124+
To ensure successful backup and restore operations, manually update the resource settings for the extension pods by following these steps:
125+
126+
1. Open the AKS cluster in the Azure portal.
127+
128+
![Screenshot shows AKS cluster in Azure portal.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster.png)
129+
130+
1. Navigate to Extensions + Applications under Settings in the left-hand pane.
131+
132+
![Screenshot shows how to select Extensions + Applications.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-applications.png)
133+
134+
1. Click on the extension titled "azure-aks-backup".
135+
136+
![Screenshot shows how to open Backup extension settings.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-azure-aks-backup.png)
137+
138+
1. Scroll down, add the following value under configuration settings, and then click Save.
139+
140+
`resources.limits.memory : 4400Mi`
141+
142+
![Screenshot shows how to add values under configuration settings.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-azure-aks-backup-configuration-update.png)
143+
144+
#### Verifying the Changes
145+
146+
After applying the changes, either wait for a scheduled backup to run or initiate an on-demand backup. If you still experience an OOMKilled failure, repeat the steps above and gradually increase the memory limit. If the failure persists, also increase the `resources.limits.cpu` parameter.
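The portal steps in this section could also be scripted. The sketch below only builds an argument list for `az k8s-extension update`; it does not run anything. It assumes the Azure CLI `k8s-extension` extension is installed and that your CLI version accepts `--configuration-settings` on `update` (check with `az k8s-extension update --help`). The function name and the cluster and resource group values are placeholders for illustration.

```python
import shlex


def build_update_command(
    cluster_name: str, resource_group: str, settings: dict[str, str]
) -> list[str]:
    """Build (but do not run) a command that updates the backup extension settings."""
    command = [
        "az", "k8s-extension", "update",
        "--name", "azure-aks-backup",
        "--cluster-type", "managedClusters",
        "--cluster-name", cluster_name,
        "--resource-group", resource_group,
        # Assumption: key=value pairs follow this flag, separated by spaces.
        "--configuration-settings",
    ]
    command.extend(f"{key}={value}" for key, value in settings.items())
    return command


if __name__ == "__main__":
    # Placeholder names; replace with your own cluster and resource group.
    cmd = build_update_command(
        "my-aks-cluster", "my-resource-group", {"resources.limits.memory": "4400Mi"}
    )
    print(shlex.join(cmd))
```

Printing the command first makes it easy to review before running it. To also raise the CPU limit, add a `resources.limits.cpu` entry to the `settings` dictionary.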
147+
109148
### Monitor a backup operation
110149

111150
The Azure Backup service creates a job for scheduled backups or if you trigger on-demand backup operation for tracking. To view the backup job status:
@@ -155,7 +194,7 @@ For AKS backup, backup and restore jobs can show the status **Completed with War
155194

156195
:::image type="content" source="./media/azure-kubernetes-service-cluster-manage-backups/backup-restore-jobs-completed-with-warnings.png" alt-text="Screenshot shows the backup and restore jobs completed with warnings." lightbox="./media/azure-kubernetes-service-cluster-manage-backups/backup-restore-jobs-completed-with-warnings.png":::
157196

158-
For example, if a backup job for an AKS cluster completes with the status **Completed with Warnings**, a restore point is created, but it does not have all the resources in the cluster backed up as per the backup configuration. The job shows warning details, providing the *issues* and *resources* that were impacted during the operation.
197+
For example, if a backup job for an AKS cluster completes with the status **Completed with Warnings**, a restore point is created, but it does not have all the resources in the cluster backed-up as per the backup configuration. The job shows warning details, providing the *issues* and *resources* that were impacted during the operation.
159198

160199
To view these warnings, select **View Details** next to **Warning Details**.
161200

@@ -186,7 +225,7 @@ You can change the associated policy with a backup instance.
186225

187226
There are three ways by which you can stop protecting an Azure Disk:
188227

189-
- **Stop Protection and Retain Data (Retain forever)**: This option helps you stop all future backup jobs from protecting your cluster. However, Azure Backup service retains the recovery points that are backed up forever. You need to pay to keep the recovery points in the vault (see [Azure Backup pricing](https://azure.microsoft.com/pricing/details/backup/) for details). You are able to restore the disk, if needed. To resume cluster protection, use the **Resume backup** option.
228+
- **Stop Protection and Retain Data (Retain forever)**: This option helps you stop all future backup jobs from protecting your cluster. However, Azure Backup service retains the recovery points that are backed-up forever. You need to pay to keep the recovery points in the vault (see [Azure Backup pricing](https://azure.microsoft.com/pricing/details/backup/) for details). You are able to restore the disk, if needed. To resume cluster protection, use the **Resume backup** option.
190229

191230
- **Stop Protection and Retain Data (Retain as per Policy)**: This option helps you stop all future backup jobs from protecting your cluster. The recovery points are retained as per policy and will be chargeable according to [Azure Backup pricing](https://azure.microsoft.com/pricing/details/backup/). However, the latest recovery point is retained forever.
192231
