articles/aks/operator-best-practices-multi-region.md
Lines changed: 20 additions & 17 deletions
@@ -1,9 +1,8 @@
 ---
-title: Best practices for AKS business continuity and disaster recovery
-description: Learn a cluster operator's best practices to achieve maximum uptime for your applications, providing high availability and preparing for disaster recovery in Azure Kubernetes Service (AKS).
+title: Best practices for business continuity and disaster recovery in Azure Kubernetes Service (AKS)
+description: Best practices for a cluster operator to achieve maximum uptime for your applications and to provide high availability and prepare for disaster recovery in Azure Kubernetes Service (AKS).
 ms.topic: conceptual
-ms.date: 03/11/2021
-ms.author: thfalgou
+ms.date: 03/08/2023
 ms.custom: fasttrack-edit
 #Customer intent: As an AKS cluster operator, I want to plan for business continuity or disaster recovery to help protect my cluster from region problems.
 ---
@@ -15,8 +14,9 @@ As you manage clusters in Azure Kubernetes Service (AKS), application uptime bec
 This article focuses on how to plan for business continuity and disaster recovery in AKS. You learn how to:
 
 > [!div class="checklist"]
+
 > * Plan for AKS clusters in multiple regions.
-> * Route traffic across multiple clusters by using Azure Traffic Manager.
+> * Route traffic across multiple clusters using Azure Traffic Manager.
 > * Use geo-replication for your container image registries.
 > * Plan for application state across multiple clusters.
 > * Replicate storage across multiple regions.
@@ -30,15 +30,17 @@ This article focuses on how to plan for business continuity and disaster recover
 An AKS cluster is deployed into a single region. To protect your system from region failure, deploy your application into multiple AKS clusters across different regions. When planning where to deploy your AKS cluster, consider:
 
 * [**AKS region availability**](./quotas-skus-regions.md#region-availability)
 * For your geographic area, choose two regions paired together.
-* AKS platform updates (planned maintenance) are serialized with a delay of at least 24 hours between paired regions.
-* Recovery efforts for paired regions are prioritized where needed.
+* AKS platform updates (planned maintenance) are serialized with a delay of at least 24 hours between paired regions.
+* Recovery efforts for paired regions are prioritized where needed.
+
 * **Service availability**
 * Decide whether your paired regions should be hot/hot, hot/warm, or hot/cold.
-* Do you want to run both regions at the same time, with one region *ready* to start serving traffic? Or,
+* Do you want to run both regions at the same time, with one region *ready* to start serving traffic? *or*
 * Do you want to give one region time to get ready to serve traffic?
 
 AKS region availability and paired regions are a joint consideration. Deploy your AKS clusters into paired regions designed to manage region disaster recovery together. For example, AKS is available in East US and West US. These regions are paired. Choose these two regions when you're creating an AKS BC/DR strategy.
@@ -66,11 +68,12 @@ For information on how to set up endpoints and routing, see [Configure priority
 ### Application routing with Azure Front Door Service
 
 Using split TCP-based anycast protocol, [Azure Front Door Service](../frontdoor/front-door-overview.md) promptly connects your end users to the nearest Front Door POP (Point of Presence). More features of Azure Front Door Service:
+
 * TLS termination
 * Custom domain
 * Web application firewall
 * URL Rewrite
-* Session affinity
+* Session affinity
 
 Review the needs of your application traffic to understand which solution is the most suitable.
 
@@ -83,18 +86,16 @@ Before peering virtual networks with running AKS clusters, use the standard Load
 ## Enable geo-replication for container images
 
 > **Best practice**
->
+>
 > Store your container images in Azure Container Registry and geo-replicate the registry to each AKS region.
 
-To deploy and run your applications in AKS, you need a way to store and pull the container images. Container Registry integrates with AKS, so it can securely store your container images or Helm charts. Container Registry supports multimaster geo-replication to automatically replicate your images to Azure regions around the world.
+To deploy and run your applications in AKS, you need a way to store and pull the container images. Container Registry integrates with AKS, so it can securely store your container images or Helm charts. Container Registry supports multimaster geo-replication to automatically replicate your images to Azure regions around the world.
 
-To improve performance and availability:
-1. Use Container Registry geo-replication to create a registry in each region where you have an AKS cluster.
-1. Each AKS cluster then pulls container images from the local container registry in the same region:
+To improve performance and availability, use Container Registry geo-replication to create a registry in each region where you have an AKS cluster. Each AKS cluster will then pull container images from the local container registry in the same region.
 
 
 
-When you use Container Registry geo-replication to pull images from the same region, the results are:
+Using Container Registry geo-replication to pull images from the same region has the following benefits:
 
 * **Faster**: Pull images from high-speed, low-latency network connections within the same Azure region.
 * **More reliable**: If a region is unavailable, your AKS cluster pulls the images from an available container registry.
@@ -105,14 +106,15 @@ Geo-replication is a *Premium* SKU container registry feature. For information o
 ## Remove service state from inside containers
 
 > **Best practice**
->
+>
 > Avoid storing service state inside the container. Instead, use an Azure platform as a service (PaaS) that supports multi-region replication.
 
 *Service state* refers to the in-memory or on-disk data required by a service to function. State includes the data structures and member variables that the service reads and writes. Depending on how the service is architected, the state might also include files or other resources stored on the disk. For example, the state might include the files a database uses to store data and transaction logs.
 
 State can be either externalized or co-located with the code that manipulates the state. Typically, you externalize state by using a database or other data store that runs on different machines over the network or that runs out of process on the same machine.
 
 Containers and microservices are most resilient when the processes that run inside them don't retain state. Since applications almost always contain some state, use a PaaS solution, such as:
+
 * Azure Cosmos DB
 * Azure Database for PostgreSQL
 * Azure Database for MySQL
@@ -139,6 +141,7 @@ Your applications might use Azure Storage for their data. If so, your applicatio
 Your applications might require persistent storage even after a pod is deleted. In Kubernetes, you can use persistent volumes to persist data storage. Persistent volumes are mounted to a node VM and then exposed to the pods. Persistent volumes follow pods even if the pods are moved to a different node inside the same cluster.
 
 The replication strategy you use depends on your storage solution. The following common storage solutions provide their own guidance about disaster recovery and replication: