articles/backup/azure-kubernetes-service-backup-troubleshoot.md
+32 -17 (32 additions & 17 deletions)
@@ -167,21 +167,36 @@ These error codes appear due to issues on the Backup Extension installed in the
 
 ### BackupPluginPodRestartedDuringBackupError
 
-**Cause**: Backup Extension Pod (dataprotection-microsoft-kubernetes-agent) in your AKS cluster experiencing instability due to insufficient CPU/Memory resources on its current node, leading to OOM (Out of Memory) kill incidents. This could be because of either lower compute requested by the backup extension pod or a large number of resources of a particular type are being backed up or restored.
+**Cause**: Azure Backup for AKS relies on pods deployed within the AKS cluster as part of the backup extension under the namespace `dataprotection-microsoft`. To perform backup and restore operations, these pods have specific CPU and memory requirements.
 
-**Recommended action**: To address this, first check the backup logs stored in the blob container provided as input in extension installation to verify if the issue is due to large number of resources. If thats the issue then exclude these resources from the backup configuration and reattempt the operation. Otherwise we recommend increasing the compute values allocated to this pod. By doing so, it will be automatically provisioned on a different node within your AKS cluster with ample compute resources available.
+```
+1. Memory: requests - 128Mi, limits - 1280Mi
+2. CPU: requests - 500m, limits - 1000m
+```
 
-The current value of compute for this pod is:
+However, if the number of resources in the cluster exceeds 1000, the pods may require additional CPU and memory beyond the default reservation. If the required resources exceed the allocated limits, you might encounter a BackupPluginPodRestarted error caused by an OOMKilled (out of memory) failure during backup jobs.
 
-resources.requests.cpu is 500m
-resources.requests.memory is 128Mi
-Kindly modify the memory allocation to 512Mi by updating the 'resources.requests.memory' parameter. If the issue persists, it is advisable to increase the 'resources.requests.cpu' parameter to 900m, post the memory allocation. You can increase the values for the parameters by following below steps:
+**Recommended action**: To ensure successful backup and restore operations, manually update the resource settings for the extension pods by following these steps:
 
-1. Navigate to the AKS cluster blade in the Azure portal.
-2. Click on "Extensions+Applications" and select "azure-aks-backup" extension.
-3. Update the configuration settings in the portal by adding the following key-value pair.
-resources.requests.cpu 900m
-resources.requests.memory 512Mi
+1. Open the AKS cluster in the Azure portal.
+
+![Screenshot that shows AKS cluster in Azure portal.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster.png)
+
+1. Navigate to Extensions + Applications under Settings in the left-hand pane.
+
+![Screenshot that shows how to select Extensions + Applications.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-applications.png)
+
+1. Click on the extension titled "azure-aks-backup".
+
+![Screenshot that shows how to open Backup extension settings.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-azure-aks-backup.png)
+
+1. Scroll down, add a new value under configuration settings, and then click Save.
+
+`resources.limits.memory : 4400Mi`
+
+![Screenshot that shows how to add values under configuration settings.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-azure-aks-backup-update.png)
+
+After applying the changes, either wait for a scheduled backup to run or initiate an on-demand backup. If you still experience an OOMKilled failure, repeat the steps above and gradually increase the memory limit; if the failure persists, also increase the `resources.limits.cpu` parameter.
 
 ### BackupPluginDeleteBackupOperationFailed
 
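The portal steps in this hunk add one key-value pair, `resources.limits.memory`, to the `azure-aks-backup` extension. For scripted environments, the same setting could presumably be passed through the Azure CLI. The following is a minimal Python sketch that only assembles the command line, assuming `az k8s-extension update` accepts the key through `--configuration-settings`; the resource group and cluster names are hypothetical placeholders, and nothing is executed.

```python
import shlex


def build_update_command(resource_group, cluster_name, settings):
    """Assemble (but do not run) an Azure CLI call that updates extension settings.

    Assumption: the 'azure-aks-backup' extension accepts the same keys through
    --configuration-settings that the portal steps enter by hand.
    """
    argv = [
        "az", "k8s-extension", "update",
        "--name", "azure-aks-backup",
        "--cluster-type", "managedClusters",
        "--cluster-name", cluster_name,
        "--resource-group", resource_group,
        "--configuration-settings",
    ]
    # Each setting is passed as key=value; sorting keeps the output stable.
    argv += [f"{key}={value}" for key, value in sorted(settings.items())]
    return argv


# "my-resource-group" and "my-aks-cluster" are hypothetical names.
command = build_update_command(
    "my-resource-group",
    "my-aks-cluster",
    {"resources.limits.memory": "4400Mi"},
)
print(shlex.join(command))
```

Building an argument list rather than a single string avoids quoting problems if the list is later handed to `subprocess.run`.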
@@ -301,23 +316,23 @@ These error codes can appear while you enable AKS backup to store backups in a v
 
 **Cause**: Namespaces provided in Backup Configuration are missing while performing backups. Either the namespace was wrongly provided or has been deleted.
 
-**Recommended action**: Check if the Namespaces to be backedup are correctly provided.
+**Recommended action**: Check if the Namespaces to be backed up are correctly provided.
 
 ### UserErrorPVCHasNoVolume
 
 **Error code**: UserErrorPVCHasNoVolume
 
-**Cause**: The Persistent Volume Claim (PVC) in context doesn't have a Persistent Volume attached to it. So, the PVC won't be backedup.
+**Cause**: The Persistent Volume Claim (PVC) in context doesn't have a Persistent Volume attached to it. So, the PVC won't be backed up.
 
-**Recommended action**: Attach a volume to the PVC, if it needs to be backedup.
+**Recommended action**: Attach a volume to the PVC, if it needs to be backed up.
 
 ### UserErrorPVCNotBoundToVolume
 
 **Error code**: UserErrorPVCNotBoundToVolume
 
-**Cause**: The PVC in context is in *Pending* state and doesn't have a Persistent Volume attached to it. So, the PVC won't be backedup.
+**Cause**: The PVC in context is in *Pending* state and doesn't have a Persistent Volume attached to it. So, the PVC won't be backed up.
 
-**Recommended action**: Attach a volume to the PVC, if it needs to be backedup.
+**Recommended action**: Attach a volume to the PVC, if it needs to be backed up.
 
 ### UserErrorPVNotFound
 
@@ -347,7 +362,7 @@ These error codes can appear while you enable AKS backup to store backups in a v
 
 **Error code**: LinkedAuthorizationFailed
 
-**Cause**: To perform a restore operation, user needs to have a **read** permission over the backedup AKS cluster.
+**Cause**: To perform a restore operation, user needs to have a **read** permission over the backed-up AKS cluster.
 
 **Recommended action**: Assign Reader role on the source AKS cluster and then proceed to perform the restore operation.
articles/backup/azure-kubernetes-service-cluster-backup-support-matrix.md
+5 -2 (5 additions & 2 deletions)
@@ -34,7 +34,7 @@ You can use [Azure Backup](./backup-overview.md) to help protect Azure Kubernete
 
 - Provide a new and empty blob container as input while installing backup extension in an AKS cluster for the first time. Don't use same blob container for more than one AKS cluster.
 
-- AKS backups don't support in-tree volumes. You can back up only CSI driver-based volumes. You can [migrate from tree volumes to CSI driver-based persistent volumes](/azure/aks/csi-migrate-in-tree-volumes).
+- AKS backups do not support in-tree volumes. You can back up only CSI driver-based volumes. You can [migrate from tree volumes to CSI driver-based persistent volumes](/azure/aks/csi-migrate-in-tree-volumes).
 
 - Currently, an AKS backup supports only the backup of Azure disk-based persistent volumes (enabled by the CSI driver). The supported Azure Disk SKUs are Standard HDD, Standard SSD, and Premium SSD. The disks belonging to Premium SSD v2 and Ultra Disk SKU aren't supported. Both static and dynamically provisioned volumes are supported. For backup of static disks, the persistent volumes specification should have the *storage class* defined in the **YAML** file, otherwise such persistent volumes are skipped from the backup operation.
 
@@ -48,6 +48,8 @@ You can use [Azure Backup](./backup-overview.md) to help protect Azure Kubernete
 
 - You can't install Backup Extension in AKS Cluster with Arm64 based agent nodes irrespective of Operating System (Ubuntu/Azure Linux/Windows) running on these nodes.
 
+- Azure Backup for AKS is currently not supported for Network Isolated AKS clusters.
+
 - Don't install AKS Backup Extension along with Velero or other Velero-based backup services. This could lead to disruption of backup service during any future Velero upgrades driven by you or AKS backup.
 
 - You must install the backup extension in the AKS cluster. If you're using Azure CLI to install the backup extension, ensure that the version is 2.41 or later. Use `az upgrade` command to upgrade the Azure CLI.
@@ -81,6 +83,7 @@ You can use [Azure Backup](./backup-overview.md) to help protect Azure Kubernete
 | Number of allowed restores per backup instance in a day | 10 |
 
 - Configuration of a storage account with private endpoint is supported.
+
 - To enable Azure Backup for AKS via Terraform, its version should be >= 3.99.
 
 ### Other limitations for Vaulted backup and Cross Region Restore
@@ -89,7 +92,7 @@ You can use [Azure Backup](./backup-overview.md) to help protect Azure Kubernete
 
 - Currently, backup instances with <= 100 disks attached as persistent volume are supported. Backup and restore operations might fail if the number of disks is higher than the limit.
 
-- Only Azure Disks with public access enabled from all networks are eligible to be moved to the Vault Tier; if their are disks with network access apart from public access, tiering operation will fail.
+- Only Azure Disks with public access enabled from all networks are eligible to be moved to the Vault Tier; if there are disks with network access apart from public access, the tiering operation will fail.
 
 - *Disaster Recovery* feature is only available between Azure Paired Regions (if backup is configured in a Geo Redundant Backup vault). The backup data is only available in an Azure paired region. For example, if you have an AKS cluster in East US that is backed up in a Geo Redundant Backup vault, the backup data is also available in West US for restore.
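The support matrix lines in this file state two checks that can be made before configuring a backup: the supported Azure Disk SKUs, and the limit of 100 disks per backup instance listed for vaulted backup. A minimal Python sketch of such a pre-check follows. The disk records, their `name` and `sku` fields, and the SKU strings (display names copied from the text, not Azure API identifiers) are assumptions made for illustration.

```python
# Values taken from the support matrix text; the SKU strings are display
# names from the document, not identifiers returned by any Azure API.
SUPPORTED_DISK_SKUS = {"Standard HDD", "Standard SSD", "Premium SSD"}
MAX_DISKS_PER_BACKUP_INSTANCE = 100


def find_backup_blockers(disks):
    """Return a list of reasons why the given disks may not be backed up."""
    blockers = []
    if len(disks) > MAX_DISKS_PER_BACKUP_INSTANCE:
        blockers.append(
            f"{len(disks)} disks exceed the limit of {MAX_DISKS_PER_BACKUP_INSTANCE}"
        )
    for disk in disks:
        if disk["sku"] not in SUPPORTED_DISK_SKUS:
            blockers.append(f"{disk['name']}: SKU {disk['sku']!r} is not supported")
    return blockers


# Hypothetical inventory of persistent-volume disks.
example_disks = [
    {"name": "data-0", "sku": "Premium SSD"},
    {"name": "data-1", "sku": "Ultra Disk"},
]
blockers = find_backup_blockers(example_disks)
print(blockers)
```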
articles/backup/azure-kubernetes-service-cluster-manage-backups.md
+41 -2 (41 additions & 2 deletions)
@@ -106,6 +106,45 @@ Learn more about [other commands related to Trusted Access](/azure/aks/trusted-a
 
 This section describes several Azure Backup supported management operations that make it easy to manage Azure Kubernetes Service cluster backups.
 
+### Adjusting CPU and Memory for Azure Backup for AKS
+
+Azure Backup for AKS relies on pods deployed within the AKS cluster as part of the backup extension under the namespace `dataprotection-microsoft`. To perform backup and restore operations, these pods have specific CPU and memory requirements.
+
+#### Default Resource Reservations
+
+```
+1. Memory: requests - 128Mi, limits - 1280Mi
+2. CPU: requests - 500m, limits - 1000m
+```
+
+However, if the number of resources in the cluster exceeds 1000, the pods may require additional CPU and memory beyond the default reservation. If the required resources exceed the allocated limits, you might encounter a BackupPluginPodRestarted error caused by an OOMKilled (out of memory) failure during backup jobs.
+
+#### Resolving OOMKilled Errors by Increasing CPU and Memory
+
+To ensure successful backup and restore operations, manually update the resource settings for the extension pods by following these steps:
+
+1. Open the AKS cluster in the Azure portal.
+
+![Screenshot that shows AKS cluster in Azure portal.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster.png)
+
+1. Navigate to Extensions + Applications under Settings in the left-hand pane.
+
+![Screenshot that shows how to select Extensions + Applications.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-applications.png)
+
+1. Click on the extension titled "azure-aks-backup".
+
+![Screenshot that shows how to open Backup extension settings.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-azure-aks-backup.png)
+
+1. Scroll down, add a new value under configuration settings, and then click Save.
+
+`resources.limits.memory : 4400Mi`
+
+![Screenshot that shows how to add values under configuration settings.](./media/azure-kubernetes-service-cluster-manage-backups/aks-cluster-extension-azure-aks-backup-update.png)
+
+#### Verifying the Changes
+
+After applying the changes, either wait for a scheduled backup to run or initiate an on-demand backup. If you still experience an OOMKilled failure, repeat the steps above and gradually increase the memory limit; if the failure persists, also increase the `resources.limits.cpu` parameter.
+
 ### Monitor a backup operation
 
 The Azure Backup service creates a job for scheduled backups or if you trigger on-demand backup operation for tracking. To view the backup job status:
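The added text recommends increasing the memory limit gradually after a repeated OOMKilled failure but does not say by how much. One way to make the step size explicit is sketched below in Python. The growth factor of 1.5 and the 8192Mi ceiling are hypothetical values chosen for illustration and do not come from the documentation; only quantities with the `Mi` suffix used in the text are handled.

```python
def parse_mebibytes(quantity):
    """Convert a quantity such as '4400Mi' to an integer number of mebibytes."""
    if not quantity.endswith("Mi"):
        raise ValueError(f"expected a quantity ending in 'Mi', got {quantity!r}")
    return int(quantity[:-2])


def next_memory_limit(current, factor=1.5, ceiling="8192Mi"):
    """Propose the next value for resources.limits.memory.

    'factor' and 'ceiling' are hypothetical defaults; pick values that fit
    the memory actually available on the cluster's nodes.
    """
    proposed = int(parse_mebibytes(current) * factor)
    capped = min(proposed, parse_mebibytes(ceiling))
    return f"{capped}Mi"


# Starting from the value used in the steps above.
first_retry = next_memory_limit("4400Mi")
second_retry = next_memory_limit(first_retry)
print(first_retry, second_retry)
```

If the proposed value reaches the ceiling and the failure continues, the text's other suggestion applies: raise `resources.limits.cpu` as well.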
@@ -155,7 +194,7 @@ For AKS backup, backup and restore jobs can show the status **Completed with War
 
 :::image type="content" source="./media/azure-kubernetes-service-cluster-manage-backups/backup-restore-jobs-completed-with-warnings.png" alt-text="Screenshot shows the backup and restore jobs completed with warnings." lightbox="./media/azure-kubernetes-service-cluster-manage-backups/backup-restore-jobs-completed-with-warnings.png":::
 
-For example, if a backup job for an AKS cluster completes with the status **Completed with Warnings**, a restore point is created, but it does not have all the resources in the cluster backedup as per the backup configuration. The job shows warning details, providing the *issues* and *resources* that were impacted during the operation.
+For example, if a backup job for an AKS cluster completes with the status **Completed with Warnings**, a restore point is created, but it does not have all the resources in the cluster backed up as per the backup configuration. The job shows warning details, providing the *issues* and *resources* that were impacted during the operation.
 
 To view these warnings, select **View Details** next to **Warning Details**.
 
@@ -186,7 +225,7 @@ You can change the associated policy with a backup instance.
 
 There are three ways by which you can stop protecting an Azure Disk:
 
-- **Stop Protection and Retain Data (Retain forever)**: This option helps you stop all future backup jobs from protecting your cluster. However, Azure Backup service retains the recovery points that are backedup forever. You need to pay to keep the recovery points in the vault (see [Azure Backup pricing](https://azure.microsoft.com/pricing/details/backup/) for details). You are able to restore the disk, if needed. To resume cluster protection, use the **Resume backup** option.
+- **Stop Protection and Retain Data (Retain forever)**: This option helps you stop all future backup jobs from protecting your cluster. However, Azure Backup service retains the recovery points that are backed up forever. You need to pay to keep the recovery points in the vault (see [Azure Backup pricing](https://azure.microsoft.com/pricing/details/backup/) for details). You are able to restore the disk, if needed. To resume cluster protection, use the **Resume backup** option.
 
 - **Stop Protection and Retain Data (Retain as per Policy)**: This option helps you stop all future backup jobs from protecting your cluster. The recovery points are retained as per policy and will be chargeable according to [Azure Backup pricing](https://azure.microsoft.com/pricing/details/backup/). However, the latest recovery point is retained forever.