`articles/reliability/reliability-aks.md` (13 additions, 21 deletions)

@@ -12,7 +12,7 @@ ms.date: 03/18/2025

# Reliability in Azure Kubernetes Service (AKS)

This article describes reliability support in [Azure Kubernetes Service (AKS)](/azure/aks/what-is-aks), covering intra-regional resiliency via [availability zones](#availability-zone-support) and [multi-region deployments](#multi-region-support).

Resiliency is a shared responsibility between you and Microsoft. This article covers ways for you to create a resilient solution that meets your needs.

@@ -38,11 +38,13 @@ After this initial node pool setup is complete, you can [add or delete node pool

Resiliency is a shared responsibility between you and Microsoft. As a compute service, AKS manages some aspects of your cluster's reliability, but you're responsible for managing other aspects.

- **Microsoft manages** the control plane and other managed components of AKS.
- **It's your responsibility to**:
  - *Define how components, including node pools and load balancers that attach to services, should be configured to meet your reliability requirements.* After you define the components, Microsoft then deploys and manages them on your behalf. (See the sketch after this list.)
  - *Manage any components outside of the AKS cluster, including storage and databases.* Verify that these components meet your reliability requirements. When you deploy your workloads, ensure that other Azure components are also configured for resiliency by following the best practices for those services.
## Transient faults

@@ -77,25 +79,15 @@ You can deploy zone-resilient AKS clusters into any region that supports availab

To enhance the reliability and resiliency of AKS production workloads in a region, you need to configure AKS for zone redundancy by making the following configurations:

- **Deploy multiple replicas.** Kubernetes spreads your pods across nodes based on node labels. To spread your workload across zones, you need to deploy multiple replicas of your pod. For instance, if you configure the node pool with three zones but only deploy a single replica of your pod, your deployment isn't zone resilient. (See the manifest sketch after this list.)
- **Enable automatic scaling.** Kubernetes node pools provide manual and automatic scaling options. By using manual scaling, you can add or delete nodes as needed, and pending pods wait until you scale up a node pool. AKS-managed scaling uses the [cluster autoscaler](/azure/aks/cluster-autoscaler) or [node autoprovisioning (NAP)](/azure/aks/node-autoprovision). AKS scales the node pool up or down based on your pods' requirements, within your subscription's SKU quota and capacity. This method helps ensure that your pods are scheduled on available nodes in the availability zones. (See the CLI sketch after this list.)
- **Set pod topology constraints.** Use pod topology spread constraints to control how pods are spread across different nodes or zones. Constraints help you achieve high availability, resiliency, and efficient resource usage. If you prefer to spread pods strictly across zones, you can set constraints that force a pod into a pending state to maintain the balance of pods across zones. For more information, see [Pod topology spread constraints](/azure/aks/best-practices-app-cluster-reliability#pod-topology-spread-constraints). (The manifest sketch after this list includes an example constraint.)
- **Configure zone-resilient networking.** If your pods serve external traffic, configure your cluster network architecture by using services like [Azure Application Gateway](../application-gateway/overview-v2.md), [Azure Load Balancer](../load-balancer/load-balancer-overview.md), or [Azure Front Door](../frontdoor/front-door-overview.md).
- **Ensure that dependencies are zone resilient.** Most AKS applications use other services for storage, security, or networking. Make sure that you review the zone resiliency recommendations for those services.
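
To illustrate the replica and topology guidance, here's a minimal manifest sketch, assuming a hypothetical `my-app` workload and a placeholder image: three replicas plus a zone spread constraint that keeps them balanced across the node pool's zones.

```bash
# Hypothetical Deployment: three replicas, spread evenly across zones.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                         # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule   # keep a pod pending rather than unbalance zones
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: nginx                       # placeholder image
EOF
```

With `whenUnsatisfiable: DoNotSchedule`, a replica that can't land in an underrepresented zone stays pending; use `ScheduleAnyway` if you prefer best-effort spreading.
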
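And a sketch of enabling AKS-managed scaling on an existing node pool (names are hypothetical placeholders); the cluster autoscaler then adds nodes, within your quota, whenever pods are pending:

```bash
# Enable the cluster autoscaler on an existing node pool.
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 6
```
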
### Cost

@@ -154,11 +146,11 @@ You can test your resiliency to availability zone failures by using the followi

- [Cordon and drain nodes in a single availability zone](/azure/aks/aks-zone-resiliency#method-1-cordon-and-drain-nodes-in-a-single-az)
- [Simulate an availability zone failure by using Azure Chaos Studio](/azure/aks/aks-zone-resiliency#method-2-simulate-an-az-failure-using-azure-chaos-studio)
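
As a sketch of the first method, you can cordon and drain every node in a single zone and watch how the workload rebalances; the zone label value here is hypothetical and takes the form `<region>-<zone>` on Azure:

```bash
# Stop scheduling onto one zone, then evict its pods.
kubectl cordon -l topology.kubernetes.io/zone=eastus2-1
kubectl drain -l topology.kubernetes.io/zone=eastus2-1 \
  --ignore-daemonsets --delete-emptydir-data

# Restore the zone when the test is complete.
kubectl uncordon -l topology.kubernetes.io/zone=eastus2-1
```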

## Multi-region support

AKS clusters are single-region resources. If the region is unavailable, your AKS cluster is also unavailable.

### Alternative multi-region approaches

If you need to deploy your Kubernetes workload to multiple Azure regions, you have two options to manage the orchestration of these clusters.