Skip to content

Commit a6a79c5

Browse files
authored
Update designing-cloud-solutions-for-disaster-recovery.md
1 parent f33acf3 commit a6a79c5

File tree

1 file changed

+11
-11
lines changed

1 file changed

+11
-11
lines changed

articles/azure-sql/database/designing-cloud-solutions-for-disaster-recovery.md

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -30,23 +30,23 @@ In this scenario, the applications have the following characteristics:
3030
* Web tier and data tier must be collocated to reduce latency and traffic cost
3131
* Fundamentally, downtime is a higher business risk for these applications than data loss
3232

33-
In this case, the application deployment topology is optimized for handling regional disasters when all application components need to failover together. The diagram below shows this topology. For geographic redundancy, the application’s resources are deployed to Region A and B. However, the resources in Region B are not utilized until Region A fails. A failover group is configured between the two regions to manage database connectivity, replication and failover. The web service in both regions is configured to access the database via the read-write listener **<failover-group-name>.database.windows.net** (1). Traffic manager is set up to use [priority routing method](../../traffic-manager/traffic-manager-configure-priority-routing-method.md) (2).  
33+
In this case, the application deployment topology is optimized for handling regional disasters when all application components need to fail over together. The diagram below shows this topology. For geographic redundancy, the application’s resources are deployed to Region A and B. However, the resources in Region B are not utilized until Region A fails. A failover group is configured between the two regions to manage database connectivity, replication and failover. The web service in both regions is configured to access the database via the read-write listener **<failover-group-name>.database.windows.net** (1). Azure Traffic Manager is set up to use [priority routing method](../../traffic-manager/traffic-manager-configure-priority-routing-method.md) (2).  
3434

3535
> [!NOTE]
36-
> [Azure traffic manager](../../traffic-manager/traffic-manager-overview.md) is used throughout this article for illustration purposes only. You can use any load-balancing solution that supports priority routing method.
36+
> [Azure Traffic Manager](../../traffic-manager/traffic-manager-overview.md) is used throughout this article for illustration purposes only. You can use any load-balancing solution that supports priority routing method.
3737
3838
The following diagram shows this configuration before an outage:
3939

4040
![Scenario 1. Configuration before the outage.](./media/designing-cloud-solutions-for-disaster-recovery/scenario1-a.png)
4141

42-
After an outage in the primary region, SQL Database detects that the primary database is not accessible and triggers failover to the secondary region based on the parameters of the automatic failover policy (1). Depending on your application SLA, you can configure a grace period that controls the time between the detection of the outage and the failover itself. It is possible that traffic manager initiates the endpoint failover before the failover group triggers the failover of the database. In that case the web application cannot immediately reconnect to the database. But the reconnections will automatically succeed as soon as the database failover completes. When the failed region is restored and back online, the old primary automatically reconnects as a new secondary. The diagram below illustrates the configuration after failover.
42+
After an outage in the primary region, SQL Database detects that the primary database is not accessible and triggers failover to the secondary region based on the parameters of the automatic failover policy (1). Depending on your application SLA, you can configure a grace period that controls the time between the detection of the outage and the failover itself. It is possible that Traffic Manager initiates the endpoint failover before the failover group triggers the failover of the database. In that case the web application cannot immediately reconnect to the database. But the reconnections will automatically succeed as soon as the database failover completes. When the failed region is restored and back online, the old primary automatically reconnects as a new secondary. The diagram below illustrates the configuration after failover.
4343

4444
> [!NOTE]
4545
> All transactions committed after the failover are lost during the reconnection. After the failover is completed, the application in region B is able to reconnect and restart processing the user requests. Both the web application and the primary database are now in region B and remain co-located.
4646
4747
![Scenario 1. Configuration after failover](./media/designing-cloud-solutions-for-disaster-recovery/scenario1-b.png)
4848

49-
If an outage happens in region B, the replication process between the primary and the secondary database gets suspended but the link between the two remains intact (1). Traffic managed detects that connectivity to Region B is broken and marks the endpoint web app 2 as Degraded (2). The application's performance is not impacted in this case, but the database becomes exposed and therefore at higher risk of data loss in case region A fails in succession.
49+
If an outage happens in region B, the replication process between the primary and the secondary database gets suspended but the link between the two remains intact (1). Traffic Manager detects that connectivity to Region B is broken and marks the endpoint web app 2 as Degraded (2). The application's performance is not impacted in this case, but the database becomes exposed and therefore at higher risk of data loss in case region A fails in succession.
5050

5151
> [!NOTE]
5252
> For disaster recovery, we recommend the configuration with application deployment limited to two regions. This is because most of the Azure geographies have only two regions. This configuration does not protect your application from a simultaneous catastrophic failure of both regions. In an unlikely event of such a failure, you can recover your databases in a third region using [geo-restore operation](disaster-recovery-guidance.md#recover-using-geo-restore).
@@ -70,20 +70,20 @@ This option is best suited for applications with the following characteristics:
7070
* Any data loss is high business risk. The database failover can only be used as a last resort if the outage is caused by a catastrophic failure.
7171
* The application supports read-only and read-write modes of operations and can operate in "read-only mode" for a period of time.
7272

73-
In this pattern, the application switches to read-only mode when the read-write connections start getting time-out errors. The Web Application is deployed to both regions and include a connection to the read-write listener endpoint and different connection to the read-only listener endpoint (1). The Traffic manager profile should use [priority routing](../../traffic-manager/traffic-manager-configure-priority-routing-method.md). [End point monitoring](../../traffic-manager/traffic-manager-monitoring.md) should be enabled for the application endpoint in each region (2).
73+
In this pattern, the application switches to read-only mode when the read-write connections start getting time-out errors. The web application is deployed to both regions and includes a connection to the read-write listener endpoint and different connection to the read-only listener endpoint (1). The Traffic Manager profile should use [priority routing](../../traffic-manager/traffic-manager-configure-priority-routing-method.md). [End point monitoring](../../traffic-manager/traffic-manager-monitoring.md) should be enabled for the application endpoint in each region (2).
7474

7575
The following diagram illustrates this configuration before an outage:
7676

7777
![Scenario 2. Configuration before the outage.](./media/designing-cloud-solutions-for-disaster-recovery/scenario2-a.png)
7878

79-
When the traffic manager detects a connectivity failure to region A, it automatically switches user traffic to the application instance in region B. With this pattern, it is important that you set the grace period with data loss to a sufficiently high value, for example 24 hours. It ensures that data loss is prevented if the outage is mitigated within that time. When the Web application in region B is activated the read-write operations start failing. At that point, it should switch to the read-only mode (1). In this mode the requests are automatically routed to the secondary database. If the outage is caused by a catastrophic failure, most likely it cannot be mitigated within the grace period. When it expires the failover group triggers the failover. After that the read-write listener becomes available and the connections to it stop failing (2). The following diagram illustrates the two stages of the recovery process.
79+
When Traffic Manager detects a connectivity failure to region A, it automatically switches user traffic to the application instance in region B. With this pattern, it is important that you set the grace period with data loss to a sufficiently high value, for example 24 hours. It ensures that data loss is prevented if the outage is mitigated within that time. When the web application in region B is activated the read-write operations start failing. At that point, it should switch to the read-only mode (1). In this mode the requests are automatically routed to the secondary database. If the outage is caused by a catastrophic failure, most likely it cannot be mitigated within the grace period. When it expires the failover group triggers the failover. After that the read-write listener becomes available and the connections to it stop failing (2). The following diagram illustrates the two stages of the recovery process.
8080

8181
> [!NOTE]
82-
> If the outage in the primary region is mitigated within the grace period, traffic manager detects the restoration of connectivity in the primary region and switches user traffic back to the application instance in region A. That application instance resumes and operates in read-write mode using the primary database in region A as illustrated by the previous diagram.
82+
> If the outage in the primary region is mitigated within the grace period, Traffic Manager detects the restoration of connectivity in the primary region and switches user traffic back to the application instance in region A. That application instance resumes and operates in read-write mode using the primary database in region A as illustrated by the previous diagram.
8383
8484
![Scenario 2. Disaster recovery stages.](./media/designing-cloud-solutions-for-disaster-recovery/scenario2-b.png)
8585

86-
If an outage happens in region B, the traffic manager detects the failure of the end point web-app-2 in region B and marks it degraded (1). In the meantime, the failover group switches the read-only listener to region A (2). This outage does not impact the end user experience but the primary database is exposed during the outage. The following diagram illustrates a failure in the secondary region:
86+
If an outage happens in region B, Traffic Manager detects the failure of the end point web-app-2 in region B and marks it degraded (1). In the meantime, the failover group switches the read-only listener to region A (2). This outage does not impact the end user experience but the primary database is exposed during the outage. The following diagram illustrates a failure in the secondary region:
8787

8888
![Scenario 2. Outage of the secondary region.](./media/designing-cloud-solutions-for-disaster-recovery/scenario2-c.png)
8989

@@ -92,7 +92,7 @@ Once the outage is mitigated, the secondary database is immediately synchronized
9292
This design pattern has several **advantages**:
9393

9494
* It avoids data loss during the temporary outages.
95-
* Downtime depends only on how quickly traffic manager detects the connectivity failure, which is configurable.
95+
* Downtime depends only on how quickly Traffic Manager detects the connectivity failure, which is configurable.
9696

9797
The **tradeoff** is that the application must be able to operate in read-only mode.
9898

@@ -107,7 +107,7 @@ In this scenario the application has the following characteristics:
107107

108108
In order to meet these requirements you need to guarantee that the user device **always** connects to the application deployed in the same geography for the read-only operations, such as browsing data, analytics etc. Whereas, the OLTP operations are processed in the same geography **most of the time**. For example, during the day time OLTP operations are processed in the same geography, but during the off hours they could be processed in a different geography. If the end user activity mostly happens during the working hours, you can guarantee the optimal performance for most of the users most of the time. The following diagram shows this topology.
109109

110-
The application’s resources should be deployed in each geography where you have substantial usage demand. For example, if your application is actively used in the United States, European Union and South East Asia the application should be deployed to all of these geographies. The primary database should be dynamically switched from one geography to the next at the end of the working hours. This method is called “follow the sun”. The OLTP workload always connects to the database via the read-write listener **<failover-group-name>.database.windows.net** (1). The read-only workload connects to the local database directly using the databases server endpoint **<server-name>.database.windows.net** (2). Traffic manager is configured with the [performance routing method](../../traffic-manager/traffic-manager-configure-performance-routing-method.md). It ensures that the end user’s device is connected to the web service in the closest region. Traffic manager should be set up with end point monitoring enabled for each web service end point (3).
110+
The application’s resources should be deployed in each geography where you have substantial usage demand. For example, if your application is actively used in the United States, European Union and South East Asia the application should be deployed to all of these geographies. The primary database should be dynamically switched from one geography to the next at the end of the working hours. This method is called “follow the sun”. The OLTP workload always connects to the database via the read-write listener **<failover-group-name>.database.windows.net** (1). The read-only workload connects to the local database directly using the databases server endpoint **<server-name>.database.windows.net** (2). Traffic Manager is configured with the [performance routing method](../../traffic-manager/traffic-manager-configure-performance-routing-method.md). It ensures that the end user’s device is connected to the web service in the closest region. Traffic Manager should be set up with end point monitoring enabled for each web service end point (3).
111111

112112
> [!NOTE]
113113
> The failover group configuration defines which region is used for failover. Because the new primary is in a different geography the failover results in longer latency for both OLTP and read-only workloads until the impacted region is back online.
@@ -140,7 +140,7 @@ The key **benefits** of this design are:
140140

141141
But there are some **tradeoffs**:
142142

143-
* A regional outage results in the geography to be impacted by longer latency. Both read-write and read-only workloads is served by the application in a different geography.
143+
* A regional outage results in the geography to be impacted by longer latency. Both read-write and read-only workloads are served by the application in a different geography.
144144
* The read-only workloads must connect to a different end point in each region.
145145

146146
## Business continuity planning: Choose an application design for cloud disaster recovery

0 commit comments

Comments
 (0)