You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
# Upgrade an Azure Kubernetes Service (AKS) cluster
10
10
11
-
Part of the AKS cluster lifecycle involves performing periodic upgrades to the latest Kubernetes version. It’s important you apply the latest security releases, or upgrade to get the latest features. This article shows you how to check for, configure, and apply upgrades to your AKS cluster.
11
+
Part of the AKS cluster lifecycle involves performing periodic upgrades to the latest Kubernetes version. It's important you apply the latest security releases, or upgrade to get the latest features. This article shows you how to check for, configure, and apply upgrades to your AKS cluster.
12
12
13
-
For AKS clusters that use multiple node pools or Windows Server nodes, see [Upgrade a node pool in AKS][nodepool-upgrade]. To upgrade a specific node pool without doing a Kubernetes cluster upgrade, see [Upgrade a specific node pool][specific-nodepool].
13
+
For AKS clusters that use multiple node pools or Windows Server nodes, see [Upgrade a node pool in AKS][nodepool-upgrade]. To upgrade a specific node pool without performing a Kubernetes cluster upgrade, see [Upgrade a specific node pool][specific-nodepool].
14
14
15
15
## Kubernetes version upgrades
16
16
@@ -19,7 +19,10 @@ When you upgrade a supported AKS cluster, Kubernetes minor versions can't be ski
19
19
Skipping multiple versions can only be done when upgrading from an *unsupported version* back to a *supported version*. For example, an upgrade from an unsupported *1.10.x* -> a supported *1.15.x* can be completed if available. When performing an upgrade from an *unsupported version* that skips two or more minor versions, the upgrade is performed without any guarantee of functionality and is excluded from the service-level agreements and limited warranty. If your version is significantly out of date, we recommend you recreate your cluster.
20
20
21
21
> [!NOTE]
22
-
> Any upgrade operation, whether performed manually or automatically, will upgrade the node image version if not already on the latest. The latest version is contingent on a full AKS release and can be determined by visiting the [AKS release tracker][release-tracker].
22
+
> Any upgrade operation, whether performed manually or automatically, upgrades the node image version if not already using the latest version. The latest version is contingent on a full AKS release and can be determined by visiting the [AKS release tracker][release-tracker].
23
+
24
+
> [!IMPORTANT]
25
+
> An upgrade operation might fail if you made customizations to AKS agent nodes. For more information see our [Support policy][support-policy-user-customizations-agent-nodes].
23
26
24
27
## Before you begin
25
28
@@ -188,7 +191,7 @@ During the cluster upgrade process, AKS performs the following operations:
188
191
5. In **Kubernetes version**, select your desired version and then select **Save**.
189
192
6. Navigate to your AKS cluster **Overview** page, and select the **Kubernetes version** to confirm the upgrade was successful.
190
193
191
-
The Azure portal highlights all the deprecated APIs between your current version and newer, available versions you intend to migrate to. For more information, see [the Kubernetes API removal and deprecation process][k8s-deprecation].
194
+
The Azure portal highlights all the deprecated APIs between your current and newer version, and available versions you intend to migrate to. For more information, see [the Kubernetes API removal and deprecation process][k8s-deprecation].
192
195
193
196
:::image type="content" source="./media/upgrade-cluster/portal-upgrade.png" alt-text="The screenshot of the upgrade blade for an AKS cluster in the Azure portal. The automatic upgrade field shows 'patch' selected, and several APIs deprecated between the selected Kubernetes version and the cluster's current version are described.":::
194
197
@@ -234,6 +237,7 @@ All of the following criteria must be met in order for the stop to occur:
234
237
* If performed via REST, the upgrade operation uses a preview API version of `2023-01-02-preview` or later.
235
238
* If performed via Azure CLI, the `aks-preview` CLI extension 0.5.134 or later must be installed.
236
239
* The last seen usage of deprecated APIs for the targeted version you're upgrading to must occur within 12 hours before the upgrade operation. AKS records usage hourly, so any usage of deprecated APIs within one hour isn't guaranteed to appear in the detection.
240
+
* Even API usage that is actually watching for deprecated resources is covered here. Look at the [Verb][k8s-api] for the distinction.
237
241
238
242
### Mitigating stopped upgrade operations
239
243
@@ -253,15 +257,15 @@ After receiving the error message, you have two options to mitigate the issue. Y
253
257
254
258
1. In the Azure portal, navigate to your cluster's overview page, and select **Diagnose and solve problems**.
255
259
256
-
2. Navigate to the **Known Issues, Availability and Performance** category, and select **Selected Kubernetes API deprecations**.
260
+
2. Navigate to the **Create, Upgrade, Delete and Scale** category, and select **Kubernetes API deprecations**.
257
261
258
-
:::image type="content" source="./media/upgrade-cluster/applens-api-detection-inline.png" lightbox="./media/upgrade-cluster/applens-api-detection-full.png" alt-text="A screenshot of the Azure portal showing the 'Selected Kubernetes API deprecations' section.":::
262
+
:::image type="content" source="./media/upgrade-cluster/applens-api-detection-full-v2.png" alt-text="A screenshot of the Azure portal showing the 'Selected Kubernetes API deprecations' section.":::
259
263
260
-
3. Wait 12 hours from the time the last deprecated API usage was seen.
264
+
3. Wait 12 hours from the time the last deprecated API usage was seen. Check the verb in the deprecated api usage to know if it is a [watch][k8s-api].
261
265
262
266
4. Retry your cluster upgrade.
263
267
264
-
You can alsocheck past API usage by enabling [Container Insights][container-insights] and exploring kube audit logs.
268
+
You can alsocheck past API usage by enabling [Container Insights][container-insights] and exploring kube audit logs. Check the verb in the deprecated api usage to understand, if it is a [watch][k8s-api] use case.
265
269
266
270
### Bypass validation to ignore API changes
267
271
@@ -315,7 +319,15 @@ You can set an auto-upgrade channel on your cluster. For more information, see [
315
319
316
320
AKS uses best-effort zone balancing in node groups. During an upgrade surge, the zones for the surge nodes in Virtual Machine Scale Sets are unknown ahead of time, which can temporarily cause an unbalanced zone configuration during an upgrade. However, AKS deletes surge nodes once the upgrade completes and preserves the original zone balance. If you want to keep your zones balanced during upgrades, you can increase the surge to a multiple of *three nodes*, and Virtual Machine Scale Sets balances your nodes across availability zones with best-effort zone balancing.
317
321
318
-
If you have PVCs backed by Azure LRS Disks, they’ll be bound to a particular zone. They may fail to recover immediately if the surge node doesn’t match the zone of the PVC. This could cause downtime on your application when the upgrade operation continues to drain nodes but the PVs are bound to a zone. To handle this case and maintain high availability, configure a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) on your application to allow Kubernetes to respect your availability requirements during the drain operation.
322
+
If you have PVCs backed by Azure LRS Disks, they'll be bound to a particular zone. They may fail to recover immediately if the surge node doesn't match the zone of the PVC. This could cause downtime on your application when the upgrade operation continues to drain nodes but the PVs are bound to a zone. To handle this case and maintain high availability, configure a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) on your application to allow Kubernetes to respect your availability requirements during the drain operation.
323
+
324
+
## Optimize upgrades to improve performance and minimize disruptions
325
+
326
+
The combination of [Planned Maintenance Window][planned-maintenance], [Max Surge](#customize-node-surge-upgrade), and [Pod Disruption Budget][pdb-spec] can significantly increase the likelihood of node upgrades completing successfully by the end of the maintenance window while also minimizing disruptions.
327
+
328
+
*[Planned Maintenance Window][planned-maintenance] enables service teams to schedule auto-upgrade during a pre-defined window, typically a low-traffic period, to minimize workload impact. A window duration of at least 4 hours is recommended.
329
+
* Max Surge on the node pool allows requesting additional quota during the upgrade process and limits the number of nodes selected for upgrade simultaneously. A higher max surge results in a faster upgrade process. However, setting it at 100% is not recommended as it would upgrade all nodes simultaneously, potentially causing disruptions to running applications. A max surge quota of 33% for production node pools is recommended.
330
+
*[Pod Disruption Budget][pdb-spec] is set for service applications and limits the number of pods that can be down during voluntary disruptions, such as AKS-controlled node upgrades. It can be configured as `minAvailable` replicas, indicating the minimum number of application pods that need to be active, or `maxUnavailable` replicas, indicating the maximum number of application pods that can be terminated, ensuring high availability for the application. Refer to the guidance provided for configuring [Pod Disruption Budgets (PDBs)][pdb-concepts]. PDB values should be validated to determine the settings that work best for your specific service.
319
331
320
332
## Next steps
321
333
@@ -326,6 +338,8 @@ This article showed you how to upgrade an existing AKS cluster. To learn more ab
0 commit comments