Merge pull request #229991 from schaffererin/recovery-best-practices

Stacyrch140 · web-flow · commit fdc09b4f74aa · 2023-03-08T19:02:16.000-05:00
Freshness pass
diff --git a/articles/aks/operator-best-practices-multi-region.md b/articles/aks/operator-best-practices-multi-region.md
@@ -1,9 +1,8 @@
 ---
-title: Best practices for AKS business continuity and disaster recovery
-description: Learn a cluster operator's best practices to achieve maximum uptime for your applications, providing high availability and preparing for disaster recovery in Azure Kubernetes Service (AKS).
+title: Best practices for business continuity and disaster recovery in Azure Kubernetes Service (AKS)
+description: Best practices for a cluster operator to achieve maximum uptime for your applications and to provide high availability and prepare for disaster recovery in Azure Kubernetes Service (AKS).
 ms.topic: conceptual
-ms.date: 03/11/2021
-ms.author: thfalgou
+ms.date: 03/08/2023
 ms.custom: fasttrack-edit
 #Customer intent: As an AKS cluster operator, I want to plan for business continuity or disaster recovery to help protect my cluster from region problems.
 ---
@@ -15,8 +14,9 @@ As you manage clusters in Azure Kubernetes Service (AKS), application uptime bec
 This article focuses on how to plan for business continuity and disaster recovery in AKS. You learn how to:
 
 > [!div class="checklist"]
+
 > * Plan for AKS clusters in multiple regions.
-> * Route traffic across multiple clusters by using Azure Traffic Manager.
+> * Route traffic across multiple clusters using Azure Traffic Manager.
 > * Use geo-replication for your container image registries.
 > * Plan for application state across multiple clusters.
 > * Replicate storage across multiple regions.
@@ -30,15 +30,17 @@ This article focuses on how to plan for business continuity and disaster recover
 An AKS cluster is deployed into a single region. To protect your system from region failure, deploy your application into multiple AKS clusters across different regions. When planning where to deploy your AKS cluster, consider:
 
 * [**AKS region availability**](./quotas-skus-regions.md#region-availability)
-    * Choose regions close to your users. 
+    * Choose regions close to your users.
     * AKS continually expands into new regions.
+
 * [**Azure paired regions**](../availability-zones/cross-region-replication-azure.md)
     * For your geographic area, choose two regions paired together.
-    * AKS platform updates (planned maintenance) are serialized with a delay of at least 24 hours between paired regions. 
-    * Recovery efforts for paired regions are prioritized where needed. 
+    * AKS platform updates (planned maintenance) are serialized with a delay of at least 24 hours between paired regions.
+    * Recovery efforts for paired regions are prioritized where needed.
+
 * **Service availability**
     * Decide whether your paired regions should be hot/hot, hot/warm, or hot/cold.
-    * Do you want to run both regions at the same time, with one region *ready* to start serving traffic? Or,
+    * Do you want to run both regions at the same time, with one region *ready* to start serving traffic? *or*
     * Do you want to give one region time to get ready to serve traffic?
 
 AKS region availability and paired regions are a joint consideration. Deploy your AKS clusters into paired regions designed to manage region disaster recovery together. For example, AKS is available in East US and West US. These regions are paired. Choose these two regions when you're creating an AKS BC/DR strategy.
@@ -66,11 +68,12 @@ For information on how to set up endpoints and routing, see [Configure priority
 ### Application routing with Azure Front Door Service
 
 Using split TCP-based anycast protocol, [Azure Front Door Service](../frontdoor/front-door-overview.md) promptly connects your end users to the nearest Front Door POP (Point of Presence). More features of Azure Front Door Service:
+
 * TLS termination
 * Custom domain
 * Web application firewall
 * URL Rewrite
-* Session affinity 
+* Session affinity
 
 Review the needs of your application traffic to understand which solution is the most suitable.
 
@@ -83,18 +86,16 @@ Before peering virtual networks with running AKS clusters, use the standard Load
 ## Enable geo-replication for container images
 
 > **Best practice**
-> 
+>
 > Store your container images in Azure Container Registry and geo-replicate the registry to each AKS region.
 
-To deploy and run your applications in AKS, you need a way to store and pull the container images. Container Registry integrates with AKS, so it can securely store your container images or Helm charts. Container Registry supports multimaster geo-replication to automatically replicate your images to Azure regions around the world. 
+To deploy and run your applications in AKS, you need a way to store and pull the container images. Container Registry integrates with AKS, so it can securely store your container images or Helm charts. Container Registry supports multimaster geo-replication to automatically replicate your images to Azure regions around the world.
 
-To improve performance and availability:
-1. Use Container Registry geo-replication to create a registry in each region where you have an AKS cluster. 
-1. Each AKS cluster then pulls container images from the local container registry in the same region:
+To improve performance and availability, use Container Registry geo-replication to create a registry in each region where you have an AKS cluster. Each AKS cluster will then pull container images from the local container registry in the same region.
 
 ![Container Registry geo-replication for container images](media/operator-best-practices-bc-dr/acr-geo-replication.png)
 
-When you use Container Registry geo-replication to pull images from the same region, the results are:
+Using Container Registry geo-replication to pull images from the same region has the following benefits:
 
 * **Faster**: Pull images from high-speed, low-latency network connections within the same Azure region.
 * **More reliable**: If a region is unavailable, your AKS cluster pulls the images from an available container registry.
@@ -105,14 +106,15 @@ Geo-replication is a *Premium* SKU container registry feature. For information o
 ## Remove service state from inside containers
 
 > **Best practice**
-> 
+>
 > Avoid storing service state inside the container. Instead, use an Azure platform as a service (PaaS) that supports multi-region replication.
 
 *Service state* refers to the in-memory or on-disk data required by a service to function. State includes the data structures and member variables that the service reads and writes. Depending on how the service is architected, the state might also include files or other resources stored on the disk. For example, the state might include the files a database uses to store data and transaction logs.
 
 State can be either externalized or co-located with the code that manipulates the state. Typically, you externalize state by using a database or other data store that runs on different machines over the network or that runs out of process on the same machine.
 
 Containers and microservices are most resilient when the processes that run inside them don't retain state. Since applications almost always contain some state, use a PaaS solution, such as:
+
 * Azure Cosmos DB
 * Azure Database for PostgreSQL
 * Azure Database for MySQL
@@ -139,6 +141,7 @@ Your applications might use Azure Storage for their data. If so, your applicatio
 Your applications might require persistent storage even after a pod is deleted. In Kubernetes, you can use persistent volumes to persist data storage. Persistent volumes are mounted to a node VM and then exposed to the pods. Persistent volumes follow pods even if the pods are moved to a different node inside the same cluster.
 
 The replication strategy you use depends on your storage solution. The following common storage solutions provide their own guidance about disaster recovery and replication:
+
 * [Gluster](https://docs.gluster.org/en/latest/Administrator-Guide/Geo-Replication/)
 * [Ceph](https://docs.ceph.com/docs/master/cephfs/disaster-recovery/)
 * [Rook](https://rook.io/docs/rook/v1.2/ceph-disaster-recovery.html)