Commit fdc09b4

Merge pull request #229991 from schaffererin/recovery-best-practices
Freshness pass
2 parents 497b8b3 + 3b24cf5 commit fdc09b4

1 file changed: +20 -17 lines changed

articles/aks/operator-best-practices-multi-region.md

Lines changed: 20 additions & 17 deletions
@@ -1,9 +1,8 @@
 ---
-title: Best practices for AKS business continuity and disaster recovery
-description: Learn a cluster operator's best practices to achieve maximum uptime for your applications, providing high availability and preparing for disaster recovery in Azure Kubernetes Service (AKS).
+title: Best practices for business continuity and disaster recovery in Azure Kubernetes Service (AKS)
+description: Best practices for a cluster operator to achieve maximum uptime for your applications and to provide high availability and prepare for disaster recovery in Azure Kubernetes Service (AKS).
 ms.topic: conceptual
-ms.date: 03/11/2021
-ms.author: thfalgou
+ms.date: 03/08/2023
 ms.custom: fasttrack-edit
 #Customer intent: As an AKS cluster operator, I want to plan for business continuity or disaster recovery to help protect my cluster from region problems.
 ---
@@ -15,8 +14,9 @@ As you manage clusters in Azure Kubernetes Service (AKS), application uptime bec
 This article focuses on how to plan for business continuity and disaster recovery in AKS. You learn how to:
 
 > [!div class="checklist"]
+
 > * Plan for AKS clusters in multiple regions.
-> * Route traffic across multiple clusters by using Azure Traffic Manager.
+> * Route traffic across multiple clusters using Azure Traffic Manager.
 > * Use geo-replication for your container image registries.
 > * Plan for application state across multiple clusters.
 > * Replicate storage across multiple regions.
@@ -30,15 +30,17 @@ This article focuses on how to plan for business continuity and disaster recover
 An AKS cluster is deployed into a single region. To protect your system from region failure, deploy your application into multiple AKS clusters across different regions. When planning where to deploy your AKS cluster, consider:
 
 * [**AKS region availability**](./quotas-skus-regions.md#region-availability)
-* Choose regions close to your users.
+* Choose regions close to your users.
 * AKS continually expands into new regions.
+
 * [**Azure paired regions**](../availability-zones/cross-region-replication-azure.md)
 * For your geographic area, choose two regions paired together.
-* AKS platform updates (planned maintenance) are serialized with a delay of at least 24 hours between paired regions.
-* Recovery efforts for paired regions are prioritized where needed.
+* AKS platform updates (planned maintenance) are serialized with a delay of at least 24 hours between paired regions.
+* Recovery efforts for paired regions are prioritized where needed.
+
 * **Service availability**
 * Decide whether your paired regions should be hot/hot, hot/warm, or hot/cold.
-* Do you want to run both regions at the same time, with one region *ready* to start serving traffic? Or,
+* Do you want to run both regions at the same time, with one region *ready* to start serving traffic? *or*
 * Do you want to give one region time to get ready to serve traffic?
 
 AKS region availability and paired regions are a joint consideration. Deploy your AKS clusters into paired regions designed to manage region disaster recovery together. For example, AKS is available in East US and West US. These regions are paired. Choose these two regions when you're creating an AKS BC/DR strategy.
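
Illustration (not part of the commit): the hot/warm choice in the hunk above can be pictured as a priority-based failover decision across the paired regions. The following is a minimal Python sketch under stated assumptions. The `Endpoint` class, its `priority` and `healthy` fields, and `pick_endpoint` are hypothetical names invented for this example; they are not an Azure API, and the article itself does not prescribe this logic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical model of one regional AKS entry point."""
    region: str
    priority: int   # assumption: lower number means preferred
    healthy: bool   # assumption: result of some external health probe


def pick_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


# Hot/warm pair from the article's example: East US serves, West US stands by.
pair = [
    Endpoint("eastus", priority=1, healthy=True),
    Endpoint("westus", priority=2, healthy=True),
]
```

With both regions healthy, `pick_endpoint(pair)` selects the East US entry; if that entry is marked unhealthy, the West US entry is selected instead. A hot/hot setup would distribute traffic across both regions rather than rank them, which this sketch does not model.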
@@ -66,11 +68,12 @@ For information on how to set up endpoints and routing, see [Configure priority
 ### Application routing with Azure Front Door Service
 
 Using split TCP-based anycast protocol, [Azure Front Door Service](../frontdoor/front-door-overview.md) promptly connects your end users to the nearest Front Door POP (Point of Presence). More features of Azure Front Door Service:
+
 * TLS termination
 * Custom domain
 * Web application firewall
 * URL Rewrite
-* Session affinity
+* Session affinity
 
 Review the needs of your application traffic to understand which solution is the most suitable.
 
@@ -83,18 +86,16 @@ Before peering virtual networks with running AKS clusters, use the standard Load
 ## Enable geo-replication for container images
 
 > **Best practice**
->
+>
 > Store your container images in Azure Container Registry and geo-replicate the registry to each AKS region.
 
-To deploy and run your applications in AKS, you need a way to store and pull the container images. Container Registry integrates with AKS, so it can securely store your container images or Helm charts. Container Registry supports multimaster geo-replication to automatically replicate your images to Azure regions around the world.
+To deploy and run your applications in AKS, you need a way to store and pull the container images. Container Registry integrates with AKS, so it can securely store your container images or Helm charts. Container Registry supports multimaster geo-replication to automatically replicate your images to Azure regions around the world.
 
-To improve performance and availability:
-1. Use Container Registry geo-replication to create a registry in each region where you have an AKS cluster.
-1. Each AKS cluster then pulls container images from the local container registry in the same region:
+To improve performance and availability, use Container Registry geo-replication to create a registry in each region where you have an AKS cluster. Each AKS cluster will then pull container images from the local container registry in the same region.
 
 ![Container Registry geo-replication for container images](media/operator-best-practices-bc-dr/acr-geo-replication.png)
 
-When you use Container Registry geo-replication to pull images from the same region, the results are:
+Using Container Registry geo-replication to pull images from the same region has the following benefits:
 
 * **Faster**: Pull images from high-speed, low-latency network connections within the same Azure region.
 * **More reliable**: If a region is unavailable, your AKS cluster pulls the images from an available container registry.
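
Illustration (not part of the commit): the "Faster" and "More reliable" behavior described in the hunk above amounts to "prefer the replica in the cluster's own region, otherwise use any replica that is still available". The Python sketch below only models that preference. The function `pick_registry` and the region-to-availability mapping are hypothetical and invented for this example; this is not how Container Registry is configured or called.

```python
from typing import Mapping, Optional


def pick_registry(cluster_region: str, replicas: Mapping[str, bool]) -> Optional[str]:
    """Choose the region whose registry replica an image pull would use.

    `replicas` maps a region name to whether its replica is available
    (an assumption made for this illustration).
    """
    # Prefer the replica in the same region as the cluster (the "Faster" case).
    if replicas.get(cluster_region, False):
        return cluster_region
    # Otherwise fall back to the first available replica (the "More reliable" case).
    for region, available in replicas.items():
        if available:
            return region
    return None
```

For example, `pick_registry("eastus", {"eastus": False, "westus": True})` falls back to `"westus"`, while the same call with East US available stays in region.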
@@ -105,14 +106,15 @@ Geo-replication is a *Premium* SKU container registry feature. For information o
 ## Remove service state from inside containers
 
 > **Best practice**
->
+>
 > Avoid storing service state inside the container. Instead, use an Azure platform as a service (PaaS) that supports multi-region replication.
 
 *Service state* refers to the in-memory or on-disk data required by a service to function. State includes the data structures and member variables that the service reads and writes. Depending on how the service is architected, the state might also include files or other resources stored on the disk. For example, the state might include the files a database uses to store data and transaction logs.
 
 State can be either externalized or co-located with the code that manipulates the state. Typically, you externalize state by using a database or other data store that runs on different machines over the network or that runs out of process on the same machine.
 
 Containers and microservices are most resilient when the processes that run inside them don't retain state. Since applications almost always contain some state, use a PaaS solution, such as:
+
 * Azure Cosmos DB
 * Azure Database for PostgreSQL
 * Azure Database for MySQL
@@ -139,6 +141,7 @@ Your applications might use Azure Storage for their data. If so, your applicatio
 Your applications might require persistent storage even after a pod is deleted. In Kubernetes, you can use persistent volumes to persist data storage. Persistent volumes are mounted to a node VM and then exposed to the pods. Persistent volumes follow pods even if the pods are moved to a different node inside the same cluster.
 
 The replication strategy you use depends on your storage solution. The following common storage solutions provide their own guidance about disaster recovery and replication:
+
 * [Gluster](https://docs.gluster.org/en/latest/Administrator-Guide/Geo-Replication/)
 * [Ceph](https://docs.ceph.com/docs/master/cephfs/disaster-recovery/)
 * [Rook](https://rook.io/docs/rook/v1.2/ceph-disaster-recovery.html)
