Skip to content

Commit 42f36fb

Browse files
committed
Updated doc per feedback
1 parent b73b623 commit 42f36fb

File tree

1 file changed

+15
-28
lines changed

1 file changed

+15
-28
lines changed

articles/aks/best-practices-app-cluster-reliability.md

Lines changed: 15 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ title: Deployment and cluster reliability best practices for Azure Kubernetes Se
33
titleSuffix: Azure Kubernetes Service
44
description: Learn the best practices for deployment and cluster reliability for Azure Kubernetes Service (AKS) workloads.
55
ms.topic: conceptual
6-
ms.date: 02/21/2024
6+
ms.date: 02/22/2024
77
---
88

99
# Deployment and cluster reliability best practices for Azure Kubernetes Service (AKS)
@@ -15,7 +15,7 @@ The best practices in this article are organized into the following categories:
1515
| Category | Best practices |
1616
| -------- | -------------- |
1717
| [Deployment level best practices](#deployment-level-best-practices) |[Pod Disruption Budgets (PDBs)](#pod-disruption-budgets-pdbs) <br/> • [Pod CPU and memory limits](#pod-cpu-and-memory-limits) <br/> • [Pre-stop hooks](#pre-stop-hooks) <br/> • [maxUnavailable](#maxunavailable) <br/> • [Pod anti-affinity](#pod-anti-affinity) <br/> • [Readiness and liveness probes](#readiness-and-liveness-probes) <br/> • [Multi-replica applications](#multi-replica-applications) |
18-
| [Cluster and node pool level best practices](#cluster-and-node-pool-level-best-practices) |[Availability zones](#availability-zones) <br/> • [Cluster autoscaling](#cluster-autoscaling) <br/> • [Scale-down mode](#scale-down-mode) <br/> • [Standard Load Balancer](#standard-load-balancer) <br/> • [System node pools](#system-node-pools) <br/> • [Accelerated Networking](#accelerated-networking) <br/> • [Image versions](#image-versions) <br/> • [Azure CNI for dynamic IP allocation](#azure-cni-for-dynamic-ip-allocation) <br/> • [v5 SKU VMs](#v5-sku-vms) <br/> • [Do *not* use B series VMs](#do-not-use-b-series-vms) <br/> • [Premium Disks](#premium-disks) <br/> • [Container Insights](#container-insights) <br/> • [Azure Policy](#azure-policy) |
18+
| [Cluster and node pool level best practices](#cluster-and-node-pool-level-best-practices) |[Availability zones](#availability-zones) <br/> • [Cluster autoscaling](#cluster-autoscaling) <br/> • [Standard Load Balancer](#standard-load-balancer) <br/> • [System node pools](#system-node-pools) <br/> • [Accelerated Networking](#accelerated-networking) <br/> • [Image versions](#image-versions) <br/> • [Azure CNI for dynamic IP allocation](#azure-cni-for-dynamic-ip-allocation) <br/> • [v5 SKU VMs](#v5-sku-vms) <br/> • [Do *not* use B series VMs](#do-not-use-b-series-vms) <br/> • [Premium Disks](#premium-disks) <br/> • [Container Insights](#container-insights) <br/> • [Azure Policy](#azure-policy) |
1919

2020
## Deployment level best practices
2121

@@ -109,11 +109,11 @@ For more information, see [Assign CPU Resources to Containers and Pods](https://
109109

110110
> **Best practice guidance**
111111
>
112-
> Use pre-stop hooks to ensure graceful termination during SIGTERM.
112+
> Use pre-stop hooks to ensure graceful termination of a container.
113113

114-
A `PreStop` hook is called immediately before a container is terminated due to an API request or management event, such as a liveness probe failure. The pod's termination grace period countdown begins before the `PreStop` hook is executed, so the container eventually terminates within the termination grace period.
114+
A `PreStop` hook is called immediately before a container is terminated due to an API request or management event, such as preemption, resource contention, or a liveness/startup probe failure. A call to the `PreStop` hook fails if the container is already in a terminated or completed state, and the hook must complete before the TERM signal to stop the container is sent. The pod's termination grace period countdown begins before the `PreStop` hook is executed, so the container eventually terminates within the termination grace period.
115115

116-
The following example pod definition file shows how to use a `PreStop` hook to ensure graceful termination during SIGTERM:
116+
The following example pod definition file shows how to use a `PreStop` hook to ensure graceful termination of a container:
117117

118118
```yaml
119119
apiVersion: v1
@@ -219,7 +219,11 @@ For more information, see [Affinity and anti-affinity in Kubernetes](https://kub
219219
> [!TIP]
220220
> Use pod anti-affinity across availability zones to ensure that pods are spread across availability zones for zone-down scenarios.
221221
>
222-
> When you deploy your application across multiple availability zones, you can use pod anti-affinity to ensure that pods are spread across availability zones. This practice helps ensure that your application remains available in the event of a zone-down scenario. For more information, see [Best practices for multiple zones](https://kubernetes.io/docs/setup/best-practices/multiple-zones/) and [Overview of availability zones for AKS clusters](./availability-zones.md#overview-of-availability-zones-for-aks-clusters).
222+
> You can think of availability zones as backups for your application. If one zone goes down, your application can continue to run in another zone. You use affinity and anti-affinity rules to schedule specific pods on specific nodes. For example, let's say you have a memory/CPU-intensive pod, you might want to schedule it on a larger VM SKU to give the pod the capacity it needs to run.
223+
>
224+
> When you deploy your application across multiple availability zones, you can use pod anti-affinity to ensure that pods are spread across availability zones. This practice helps ensure that your application remains available in the event of a zone-down scenario.
225+
>
226+
> For more information, see [Best practices for multiple zones](https://kubernetes.io/docs/setup/best-practices/multiple-zones/) and [Overview of availability zones for AKS clusters](./availability-zones.md#overview-of-availability-zones-for-aks-clusters).
223227

224228
### Readiness and liveness probes
225229

@@ -332,26 +336,6 @@ You can also enable the cluster autoscaler on an existing node pool and configur
332336

333337
For more information, see [Use the cluster autoscaler in AKS](./cluster-autoscaler.md).
334338

335-
### Scale-down mode
336-
337-
> **Best practice guidance**
338-
>
339-
> Use scale-down mode to control the delete and deallocate behavior of nodes in your AKS cluster upon scaling down.
340-
341-
By default, scale up operations performed manually or by the cluster autoscaler require the allocation and provisioning of new nodes, and scale down operations delete nodes. Scale-down mode allows you to decide whether you want to delete or deallocate the nodes in your AKS clusters upon scaling down.
342-
343-
You can use the `--scale-down-mode` parameter to set the scale-down mode to `Deallocate` or `Delete`, as shown in the following examples:
344-
345-
```azurecli-interactive
346-
# Set the scale-down mode to Deallocate
347-
az aks nodepool add --node-count 20 --scale-down-mode Deallocate --node-osdisk-type Managed --max-pods 10 --name nodepool2 --cluster-name myAKSCluster --resource-group myResourceGroup
348-
349-
# Set the scale-down mode to Delete
350-
az aks nodepool add --enable-cluster-autoscaler --min-count 1 --max-count 10 --max-pods 10 --node-osdisk-type Managed --scale-down-mode Delete --name nodepool3 --cluster-name myAKSCluster --resource-group myResourceGroup
351-
```
352-
353-
For more information, see [Use scale-down mode to delete or deallocate nodes in AKS](./scale-down-mode.md).
354-
355339
### Standard Load Balancer
356340

357341
> **Best practice guidance**
@@ -360,6 +344,9 @@ For more information, see [Use scale-down mode to delete or deallocate nodes in
360344

361345
In Azure, the [Standard Load Balancer](../load-balancer/skus.md) SKU is designed to be equipped for load balancing network layer traffic when high performance and low latency are needed. The Standard Load Balancer routes traffic within and across regions and to availability zones for high resiliency. The Standard SKU is the recommended and default SKU to use when creating an AKS cluster.
362346

347+
> [!IMPORTANT]
348+
> On September 30, 2025, Basic Load Balancer will be retired. For more information, see the [official announcement](https://azure.microsoft.com/updates/azure-basic-load-balancer-will-be-retired-on-30-september-2025-upgrade-to-standard-load-balancer/). We recommend that you use the Standard Load Balancer for new deployments and upgrade existing deployments to the Standard Load Balancer. For more information, see [Upgrading from Basic Load Balancer](../load-balancer/load-balancer-basic-upgrade-guidance.md).
349+
363350
The following example shows a `LoadBalancer` service manifest that uses the Standard Load Balancer:
364351

365352
```yaml
@@ -437,7 +424,7 @@ AKS provides multiple auto-upgrade channels for node OS image upgrades. You can
437424
>
438425
> Use the standard tier for product workloads for greater cluster reliability and resources, support for up to 5,000 nodes in a cluster, and Uptime SLA enabled by default.
439426

440-
The standard tier for Azure Kubernetes Service (AKS) provides a financially backed 99.9% uptime service-level agreement (SLA) for your production workloads. The standard tier also provides greater cluster reliability and resources, support for up to 5,000 nodes in a cluster, and Uptime SLA enabled by default. For more information, see [Standard pricing tier for AKS cluster management](./free-standard-pricing-tiers.md).
427+
The standard tier for Azure Kubernetes Service (AKS) provides a financially backed 99.9% uptime [service-level agreement (SLA)](https://www.azure.cn/en-us/support/sla/kubernetes-service/) for your production workloads. The standard tier also provides greater cluster reliability and resources, support for up to 5,000 nodes in a cluster, and Uptime SLA enabled by default. For more information, see [Standard pricing tier for AKS cluster management](./free-standard-pricing-tiers.md).
441428

442429
### Azure CNI for dynamic IP allocation
443430

@@ -461,7 +448,7 @@ For more information, see [Configure Azure CNI networking for dynamic allocation
461448
>
462449
> Use v5 VM SKUs for improved performance during and after updates, less overall impact, and a more reliable connection for your applications.
463450

464-
For system node pools in AKS, use v5 SKU VMs or an equivalent core/memory VM SKU with ephemeral OS disks to provide sufficient compute resources for kube-system pods. For more information, see [Best practices for creating and running AKS clusters at scale](./operator-best-practices-run-at-scale.md) and [Best practices for performance and scaling for large workloads in AKS](./best-practices-performance-scale-large.md).
451+
For node pools in AKS, use v5 SKU VMs with ephemeral OS disks to provide sufficient compute resources for kube-system pods. For more information, see [Best practices for creating and running AKS clusters at scale](./operator-best-practices-run-at-scale.md) and [Best practices for performance and scaling for large workloads in AKS](./best-practices-performance-scale-large.md).
465452

466453
### Do *not* use B series VMs
467454

0 commit comments

Comments
 (0)