You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/reliability/reliability-aks.md
+18-14Lines changed: 18 additions & 14 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,18 +1,20 @@
1
1
---
2
2
title: Reliability in Azure Kubernetes Service (AKS)
3
-
description: Learn about how to deploy reliable workloads in Azure Kubernetes Service (AKS), including availability zones and multiple-region deployments.
3
+
description: Find out about how to deploy reliable workloads in Azure Kubernetes Service (AKS), including availability zones and multi-region deployments.
4
4
author: schaffererin
5
5
ms.author: schaffererin
6
6
ms.topic: reliability-article
7
-
ms.custom: subject-reliability, references_regions #Required - use references_regions if specific regions are mentioned.
7
+
ms.custom: subject-reliability
8
8
ms.service: azure-kubernetes-service
9
9
ms.date: 03/18/2025
10
-
#Customer intent: As an engineer who manages business continuity, I want to understand how AKS works from a reliability perspective and plan disaster recovery strategies that align with the processes that Azure services follow in different scenarios.
10
+
#Customer intent: As an engineer responsible for business continuity, I want to understand the details of how AKS works from a reliability perspective and plan disaster recovery strategies in alignment with the exact processes that Azure services follow during different kinds of situations.
11
11
---
12
12
13
13
# Reliability in Azure Kubernetes Service (AKS)
14
14
15
-
This article describes reliability support in [Azure Kubernetes Service (AKS)](/azure/aks/what-is-aks). It addresses zone resiliency, [availability zones](./availability-zones-overview.md), and multiple-region deployments.
15
+
This article describes reliability support in [Azure Kubernetes Service (AKS)](/azure/aks/what-is-aks), covering intra-regional resiliency via [availability zones](#availability-zone-support) and [multi-region deployments](#multi-region-support).
16
+
17
+
Resiliency is a shared responsibility between you and Microsoft and so this article also covers ways for you to create a resilient solution that meets your needs.
16
18
17
19
## Production deployment recommendations
18
20
@@ -107,15 +109,17 @@ There's no extra charge to enable availability zone support in AKS. You pay for
107
109
108
110
- You can't disable availability zone support after you create a cluster. Instead, you need to create a new cluster with availability zone support disabled and delete the old one.
109
111
110
-
### Normal operations
112
+
### Traffic routing between zones
113
+
114
+
When you deploy an AKS cluster that uses availability zones, it's important to ensure that your networking components are also zone resilient. Depending on the load balancers and other networking components that you use, you might need to explicitly configure components to route traffic to the correct nodes in the correct zones and to respond to zone outages. For more information, see [Zone resiliency considerations for AKS](/azure/aks/aks-zone-resiliency).
111
115
112
-
-**Traffic routing between zones:** When you deploy an AKS cluster that uses availability zones, it's important to ensure that your networking components are also zone resilient. Depending on the load balancers and other networking components that you use, you might need to explicitly configure components to route traffic to the correct nodes in the correct zones and to respond to zone outages. For more information, see [Zone resiliency considerations for AKS](/azure/aks/aks-zone-resiliency).
116
+
### Data replication between zones
113
117
114
-
-**Data replication between zones:**If you run a stateless workload, you should use managed Azure services, such [Azure databases](https://azure.microsoft.com/products/category/databases/), [Azure Cache for Redis](/azure/azure-cache-for-redis/cache-overview), or [Azure Storage](https://azure.microsoft.com/products/category/storage/) to store the application data. By using these services, you can ensure that your traffic can be moved across nodes and zones without risking data loss or affecting the user experience. You can use Kubernetes [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [Services](https://kubernetes.io/docs/concepts/services-networking/service/), and [Health Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) to manage stateless pods and ensure even distribution across zones.
118
+
If you run a stateless workload, you should use managed Azure services, such [Azure databases](https://azure.microsoft.com/products/category/databases/), [Azure Cache for Redis](/azure/azure-cache-for-redis/cache-overview), or [Azure Storage](https://azure.microsoft.com/products/category/storage/) to store the application data. By using these services, you can ensure that your traffic can be moved across nodes and zones without risking data loss or affecting the user experience. You can use Kubernetes [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [Services](https://kubernetes.io/docs/concepts/services-networking/service/), and [Health Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) to manage stateless pods and ensure even distribution across zones.
115
119
116
120
If you need to store state within your cluster by using Azure disks, use Azure zone-redundant storage to ensure that your data is replicated across multiple availability zones. For more information, see [Choose the right disk type based on application needs](/azure/aks/aks-zone-resiliency#make-your-storage-disk-decision).
117
121
118
-
### Zone-down scenarios
122
+
### Zone-down experience
119
123
120
124
-**Detection and response:** When a zone outage occurs, the control plane automatically fails over. If your node pools use availability zones and follow [zone resiliency best practices](#considerations), you can expect AKS to bring up nodes and replicas in the zones that are up and running. AKS does this task automatically when you use managed solutions like cluster autoscaler or NAP. Without autoscaling, nodes and replicas remain in the *Pending* state and wait for manual intervention to scale up the node pool.
121
125
@@ -152,11 +156,11 @@ You can test your resiliency to availability zone failures by using the followin
152
156
-[Cordon and drain nodes in a single availability zone](/azure/aks/aks-zone-resiliency#method-1-cordon-and-drain-nodes-in-a-single-az)
153
157
-[Simulate an availability zone failure by using Azure Chaos Studio](/azure/aks/aks-zone-resiliency#method-2-simulate-an-az-failure-using-azure-chaos-studio)
154
158
155
-
## Multiple-region support
159
+
## Multi-region support
156
160
157
161
AKS clusters are single-region resources. If the region is unavailable, your AKS cluster is also unavailable.
158
162
159
-
### Alternative multiple-region approaches
163
+
### Alternative multi-region approaches
160
164
161
165
If you need to deploy your Kubernetes workload to multiple Azure regions, you have two options to manage the orchestration of these clusters.
162
166
@@ -181,13 +185,13 @@ For more information, see the following articles:
181
185
-[What is Azure Kubernetes Service backup?](/azure/backup/azure-kubernetes-service-backup-overview)
182
186
-[Back up AKS by using Azure Backup](/azure/backup/azure-kubernetes-service-cluster-backup)
183
187
184
-
For most solutions, you shouldn't rely exclusively on backups. Instead, use the other capabilities described in this guide to support your resiliency requirements. Strive to use stateless clusters that minimize the need for backup. Store data in external storage systems and databases instead of within your cluster.
188
+
For most solutions, you shouldn't rely exclusively on backups. Instead, use the other capabilities described in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't. For more information, see [Redundancy, replication, and backup](concept-redundancy-replication-backup.md).
185
189
186
-
If you maintain state in your cluster, backups protect against some risks that other approaches don't. For more information, see [Redundancy, replication, and backup](concept-redundancy-replication-backup.md).
190
+
Strive to use stateless clusters that minimize the need for backup. Store data in external storage systems and databases instead of within your cluster.
187
191
188
-
## Service-level agreement and pricing tiers
192
+
## Service-level agreement
189
193
190
-
The service-level agreement (SLA) for Azure Logic Apps describes the expected availability of the service. This agreement also describes the conditions to meet to achieve this expectation. To understand these conditions, review[SLAs for online services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services).
194
+
The service-level agreement (SLA) for Azure Logic Apps describes the expected availability of the service and the conditions that must be met to achieve that availability expectation. For more information, see[SLAs for online services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services).
191
195
192
196
AKS offers three pricing tiers for cluster management: **Free**, **Standard**, and **Premium**. For more information, see [Free, Standard, and Premium pricing tiers for AKS cluster management](/azure/aks/free-standard-pricing-tiers). The Free tier enables you to use AKS to test your workloads. The Standard and Premium tiers are designed for production workloads. When you deploy an AKS cluster that has availability zones enabled, the uptime percentage defined in the SLA increases. However, the SLA applies only if you deploy a cluster in the Standard or Premium pricing tier.
0 commit comments