| 1 | +--- |
| 2 | +title: Reliability in Azure Kubernetes Service (AKS) |
| 3 | +description: Find out about reliability in Azure Kubernetes Service (AKS), including availability zones and multi-region deployments. |
| 4 | +author: schaffererin |
| 5 | +ms.author: schaffererin |
| 6 | +ms.topic: reliability-article |
| 7 | +ms.custom: subject-reliability, references_regions #Required - use references_regions if specific regions are mentioned. |
| 8 | +ms.service: azure-kubernetes-service |
| 9 | +ms.date: 01/30/2025 |
| 10 | +#Customer intent: As an engineer responsible for business continuity, I want to understand the details of how AKS works from a reliability perspective so that I can plan disaster recovery strategies in alignment with the exact processes that Azure services follow during different kinds of situations. |
| 11 | +--- |
| 12 | + |
| 13 | +# Reliability in Azure Kubernetes Service (AKS) |
| 14 | + |
| 15 | +This article describes reliability support in Azure Kubernetes Service (AKS), covering intra-regional resiliency via [availability zones](#availability-zone-support) and cross-region resiliency via [multi-region deployments](#multi-region-support). |
| 16 | + |
| 17 | +Resiliency is a shared responsibility between you and Microsoft, so this article also covers ways for you to create a resilient solution that meets your needs. |
| 18 | + |
| 19 | +When you create an AKS cluster, the Azure platform automatically creates and configures a control plane. AKS offers three pricing tiers for cluster management: **Free**, **Standard**, and **Premium**. For more information, see [Free, Standard, and Premium pricing tiers for Azure Kubernetes Service (AKS) cluster management](/azure/aks/free-standard-pricing-tiers). |
| 20 | + |
| 21 | +## Production deployment recommendations |
| 22 | + |
| 23 | +- [Deployment and cluster reliability best practices for Azure Kubernetes Service (AKS)](/azure/aks/best-practices-app-cluster-reliability) |
| 24 | +- [High availability and disaster recovery overview for Azure Kubernetes Service (AKS)](/azure/aks/ha-dr-overview) |
| 25 | +- [Zone resiliency considerations for Azure Kubernetes Service (AKS)](/azure/aks/aks-zone-resiliency) |
| 26 | + |
| 27 | +## Redundancy |
| 28 | + |
| 29 | +## Transient faults |
| 30 | + |
| 31 | +## Availability zone support |
| 32 | + |
| 33 | +[!INCLUDE [AZ support description](includes/reliability-availability-zone-description-include.md)] |
| 34 | + |
| 35 | +You can configure AKS to be *zone redundant*, which means your resources are spread across multiple availability zones. Zone redundancy helps you achieve resiliency and reliability for your production workloads. |
| 36 | + |
| 37 | +### Region support |
| 38 | + |
| 39 | +You can deploy zone-redundant AKS resources into any [Azure region that supports availability zones](./availability-zones-region-support.md). |
| 40 | + |
| 41 | +### Requirements |
| 42 | + |
| 43 | +### Considerations |
| 44 | + |
| 45 | +When using availability zones in AKS, consider the following: |
| 46 | + |
| 47 | +- You can only define availability zones during creation of the cluster or node pool. |
| 48 | +- You can't update an existing cluster that was created without availability zones to use availability zones. |
| 49 | +- The node size (VM SKU) you choose must be available in every availability zone you select. |
| 50 | +- Clusters with availability zones enabled require an Azure Standard Load Balancer for distribution across zones. You can only define this load balancer type when you create the cluster. For more information and the limitations of the standard load balancer, see [Azure load balancer standard SKU limitations](/azure/aks/load-balancer-standard#limitations). |
| 51 | +- When implementing **availability zones with the [cluster autoscaler](/azure/aks/cluster-autoscaler-overview)**, we recommend using a single node pool for each zone. You can set the `--balance-similar-node-groups` parameter to `true` to maintain a balanced distribution of nodes across zones for your workloads during scale up operations. When this approach isn't implemented, scale down operations can disrupt the balance of nodes across zones. This configuration doesn't guarantee that similar node groups will have the same number of nodes: |
| 52 | + - Currently, balancing happens during scale up operations only. The cluster autoscaler scales down underutilized nodes regardless of the relative sizes of the node groups. |
| 53 | + - The cluster autoscaler only adds as many nodes as required to run all existing pods. Some groups might have more nodes than others if they have more pods scheduled. |
| 54 | + - The cluster autoscaler only balances between node groups that can support the same set of pending pods. |
| 55 | +- You can use Azure zone-redundant storage (ZRS) disks to replicate your storage across three availability zones in the region you select. A ZRS disk lets you recover from availability zone failure without data loss. For more information, see [ZRS for managed disks](/azure/virtual-machines/disks-redundancy#zone-redundant-storage-for-managed-disks). |
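
The balancing behavior described in the cluster autoscaler bullets above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the cluster autoscaler's actual implementation: the function names `balanced_scale_up` and `scale_down_underutilized`, the pool names, and the "add each node to the smallest group" rule are all hypothetical simplifications chosen for illustration.

```python
from typing import Dict


def balanced_scale_up(group_sizes: Dict[str, int], nodes_to_add: int) -> Dict[str, int]:
    """Hypothetical model of a balanced scale-up: each new node goes to
    whichever similar node group is currently smallest (ties broken by name)."""
    sizes = dict(group_sizes)
    for _ in range(nodes_to_add):
        smallest = min(sizes, key=lambda name: (sizes[name], name))
        sizes[smallest] += 1
    return sizes


def scale_down_underutilized(group_sizes: Dict[str, int],
                             underutilized: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical model of scale-down: underutilized nodes are removed
    without regard to the relative sizes of the node groups."""
    sizes = dict(group_sizes)
    for name, count in underutilized.items():
        sizes[name] = max(0, sizes[name] - count)
    return sizes


# One node pool per zone (assumed names), two nodes each.
pools = {"zone1": 2, "zone2": 2, "zone3": 2}

after_scale_up = balanced_scale_up(pools, 4)
after_scale_down = scale_down_underutilized(after_scale_up, {"zone3": 2})

print(after_scale_up)    # sizes stay within one node of each other
print(after_scale_down)  # balance across zones is no longer maintained
```

In this model, the scale-up step keeps the per-zone pools within one node of each other, while the scale-down step removes nodes from a single zone and leaves the pools uneven, which mirrors the caveat that balancing currently happens during scale-up operations only.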
| 56 | + |
| 57 | +### Cost |
| 58 | + |
| 59 | +### Configure availability zone support |
| 60 | + |
| 61 | +[Create an Azure Kubernetes Service (AKS) cluster that uses availability zones](/azure/aks/availability-zones) |
| 62 | + |
| 63 | +### Capacity planning and management |
| 64 | + |
| 65 | +### Traffic routing between zones |
| 66 | + |
| 67 | +### Data replication between zones |
| 68 | + |
| 69 | +### Zone-down experience |
| 70 | + |
| 71 | +### Failback |
| 72 | + |
| 73 | +### Testing for zone failures |
| 74 | + |
| 75 | +## Multi-region support |
| 76 | + |
| 77 | +To provide multi-region support for your AKS workloads, you can use Azure Kubernetes Fleet Manager. For more information, see the [Azure Kubernetes Fleet Manager documentation](/azure/kubernetes-fleet/overview). |
| 78 | + |
| 79 | +## Backups |
| 80 | + |
| 81 | +Azure Backup supports backing up AKS cluster resources and persistent volumes attached to the cluster using a backup extension. The Backup vault communicates with the AKS cluster through the extension to perform backup and restore operations. |
| 82 | + |
| 83 | +For more information, see the following articles: |
| 84 | + |
| 85 | +- [About AKS backup using Azure Backup (preview)](/azure/backup/azure-kubernetes-service-backup-overview) |
| 86 | +- [Back up AKS using Azure Backup (preview)](/azure/backup/azure-kubernetes-service-cluster-backup) |
| 87 | + |
| 88 | +For most solutions, you shouldn't rely exclusively on backups. Instead, use the other capabilities described in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't. For more information, see [link to article about how backups contribute to a resiliency strategy]. |
| 89 | + |
| 90 | +## Service-level agreement |
| 91 | + |
| 92 | +The service-level agreement (SLA) for AKS describes the expected availability of the service, and the conditions that must be met to achieve that availability expectation. For more information, see [link to SLA for [service-name]]. |
| 93 | + |
| 94 | +## Related content |
| 95 | + |
| 96 | +<!-- 10.Related content --------------------------------------------------------------------- |
| 97 | +Required: Include any related content that points to a relevant task to accomplish, |
| 98 | +or to a related topic. |
| 99 | + |
| 100 | +- [Reliability in Azure](/azure/availability-zones/overview.md) |
| 101 | +--> |