Commit 069dee2

Merge pull request #216190 from spelluru/sbuszone1027
Updates to Geo-DR article
2 parents 26ef934 + dc3f1db commit 069dee2

5 files changed: +26 -27 lines changed


articles/service-bus-messaging/service-bus-geo-dr.md

Lines changed: 26 additions & 27 deletions
@@ -2,26 +2,26 @@
 title: Azure Service Bus Geo-disaster recovery | Microsoft Docs
 description: How to use geographical regions to fail over and disaster recovery in Azure Service Bus
 ms.topic: article
-ms.date: 04/01/2022
+ms.date: 10/27/2022
 ---
 
 # Azure Service Bus Geo-disaster recovery
 
 Resilience against disastrous outages of data processing resources is a requirement for many enterprises and in some cases even required by industry regulations.
 
-Azure Service Bus already spreads the risk of catastrophic failures of individual machines or even complete racks across clusters that span multiple failure domains within a datacenter and it implements transparent failure detection and failover mechanisms such that the service will continue to operate within the assured service-levels and typically without noticeable interruptions when such failures occur. If a Service Bus namespace has been created with the enabled option for [availability zones](../availability-zones/az-overview.md), the outage risk is further spread across three physically separated facilities, and the service has enough capacity reserves to instantly cope with the complete, catastrophic loss of the entire facility.
+Azure Service Bus already spreads the risk of catastrophic failures of individual machines or even complete racks across clusters that span multiple failure domains within a datacenter and it implements transparent failure detection and failover mechanisms such that the service will continue to operate within the assured service-levels and typically without noticeable interruptions when such failures occur. A premium namespace can have two or more messaging units and these messaging units will be spread across multiple failure domains within a datacenter, supporting an all-active Service Bus cluster model.
 
-The all-active Azure Service Bus cluster model with availability zone support is superior to any on-premises message broker product in terms of resiliency against grave hardware failures and even catastrophic loss of entire datacenter facilities. Still, there might be grave situations with widespread physical destruction that even those measures can't sufficiently defend against.
+For a premium tier namespace, the outage risk is further spread across three physically separated facilities ([availability zones](#availability-zones)), and the service has enough capacity reserves to instantly cope with the complete, catastrophic loss of a datacenter. The all-active Azure Service Bus cluster model within a failure domain along with the availability zone support is superior to any on-premises message broker product in terms of resiliency against grave hardware failures and even catastrophic loss of entire datacenter facilities. Still, there might be grave situations with widespread physical destruction that even those measures can't sufficiently defend against.
 
 The Service Bus Geo-disaster recovery feature is designed to make it easier to recover from a disaster of this magnitude and abandon a failed Azure region for good and without having to change your application configurations. Abandoning an Azure region will typically involve several services and this feature primarily aims at helping to preserve the integrity of the composite application configuration. The feature is globally available for the Service Bus Premium SKU.
 
-The Geo-Disaster recovery feature ensures that the entire configuration of a namespace (Queues, Topics, Subscriptions, Filters) is continuously replicated from a primary namespace to a secondary namespace when paired, and it allows you to initiate a once-only failover move from the primary to the secondary at any time. The failover move will repoint the chosen alias name for the namespace to the secondary namespace and then break the pairing. The failover is nearly instantaneous once initiated.
+The Geo-Disaster recovery feature ensures that the entire configuration of a namespace (queues, topics, subscriptions, filters) is continuously replicated from a primary namespace to a secondary namespace when paired, and it allows you to initiate a once-only failover move from the primary to the secondary at any time. The failover move will repoint the chosen alias name for the namespace to the secondary namespace and then break the pairing. The failover is nearly instantaneous once initiated.
 
 ## Important points to consider
 
-- The feature enables instant continuity of operations with the same configuration, but **doesn't replicate the messages held in queues or topic subscriptions or dead-letter queues**. To preserve queue semantics, such a replication will require not only the replication of message data, but of every state change in the broker. For most Service Bus namespaces, the required replication traffic would far exceed the application traffic and with high-throughput queues, most messages would still replicate to the secondary while they are already being deleted from the primary, causing excessively wasteful traffic. For high-latency replication routes, which applies to many pairings you would choose for Geo-disaster recovery, it might also be impossible for the replication traffic to sustainably keep up with the application traffic due to latency-induced throttling effects.
+- The feature enables instant continuity of operations with the same configuration, but **doesn't replicate the messages held in queues or topic subscriptions or dead-letter queues**. To preserve queue semantics, such a replication will require not only the replication of message data, but of every state change in the broker. For most Service Bus namespaces, the required replication traffic would far exceed the application traffic and with high-throughput queues, most messages would still replicate to the secondary while they're already being deleted from the primary, causing excessively wasteful traffic. For high-latency replication routes, which applies to many pairings you would choose for Geo-disaster recovery, it might also be impossible for the replication traffic to sustainably keep up with the application traffic due to latency-induced throttling effects.
 - Azure Active Directory (Azure AD) role-based access control (RBAC) assignments to Service Bus entities in the primary namespace aren't replicated to the secondary namespace. Create role assignments manually in the secondary namespace to secure access to them.
-- The following configurations are not replicated.
+- The following configurations aren't replicated.
 - Virtual network configurations
 - Private endpoint connections
 - All networks access enabled
@@ -62,25 +62,25 @@ The following terms are used in this article:
 
 The following section is an overview to set up pairing between the namespaces.
 
-![1][]
+:::image type="content" source="./media/service-bus-geo-dr/geodr_setup_pairing.png" alt-text="Image showing how geo-disaster recovery works.":::
 
 You first create or use an existing primary namespace, and a new secondary namespace, then pair the two. This pairing gives you an alias that you can use to connect. Because you use an alias, you don't have to change connection strings. Only new namespaces can be added to your failover pairing.
 
-1. Create the primary namespace.
-1. Create the secondary namespace in a different region. This step is optional. You can create the secondary namespace while creating the pairing in the next step.
+1. Create the primary premium-tier namespace.
+1. Create the secondary premium-tier namespace in a different region. This step is optional. You can create the secondary namespace while creating the pairing in the next step.
 1. In the Azure portal, navigate to your primary namespace.
 1. Select **Geo-recovery** on the left menu, and select **Initiate pairing** on the toolbar.
 
-:::image type="content" source="./media/service-bus-geo-dr/primary-namspace-initiate-pairing-button.png" alt-text="Initiate pairing from the primary namespace":::
+:::image type="content" source="./media/service-bus-geo-dr/primary-namspace-initiate-pairing-button.png" alt-text="Screenshot showing the Geo-recovery page with Initiate pairing link selected.":::
 1. On the **Initiate pairing** page, follow these steps:
 1. Select an existing secondary namespace or create one in a different region. In this example, an existing namespace is used as the secondary namespace.
 1. For **Alias**, enter an alias for the geo-dr pairing.
 1. Then, select **Create**.
 
-:::image type="content" source="./media/service-bus-geo-dr/initiate-pairing-page.png" alt-text="Select the secondary namespace":::
+:::image type="content" source="./media/service-bus-geo-dr/initiate-pairing-page.png" alt-text="Screenshot showing the Initiate Pairing page in the Azure portal.":::
 1. You should see the **Service Bus Geo-DR Alias** page as shown in the following image. You can also navigate to the **Geo-DR Alias** page from the primary namespace page by selecting the **Geo-recovery** on the left menu.
 
-:::image type="content" source="./media/service-bus-geo-dr/service-bus-geo-dr-alias-page.png" alt-text="Service Bus Geo-DR Alias page":::
+:::image type="content" source="./media/service-bus-geo-dr/service-bus-geo-dr-alias-page.png" alt-text="Screenshot showing the Service Bus Geo-DR Alias page with primary and secondary namespaces.":::
 1. On the **Geo-DR Alias** page, select **Shared access policies** on the left menu to access the primary connection string for the alias. Use this connection string instead of using the connection string to the primary/secondary namespace directly. Initially, the alias points to the primary namespace.
 1. Switch to the **Overview** page. You can do the following actions:
 1. Break the pairing between primary and secondary namespaces. Select **Break pairing** on the toolbar.
@@ -90,28 +90,28 @@ You first create or use an existing primary namespace, and a new secondary names
 1. Turn ON the **Safe Failover** option to safely fail over to the secondary namespace. This feature makes sure that pending Geo-DR replications are completed before switching over to the secondary.
 1. Then, select **Failover**.
 
-:::image type="content" source="./media/service-bus-geo-dr/failover-page.png" alt-text="{alt-text}":::
+:::image type="content" source="./media/service-bus-geo-dr/failover-page.png" alt-text="Screenshot showing the Failover page.":::
 
 > [!IMPORTANT]
 > Failing over will activate the secondary namespace and remove the primary namespace from the Geo-Disaster Recovery pairing. Create another namespace to have a new geo-disaster recovery pair.
 
 1. Finally, you should add some monitoring to detect if a failover is necessary. In most cases, the service is one part of a large ecosystem, thus automatic failovers are rarely possible, as often failovers must be performed in sync with the remaining subsystem or infrastructure.
 
 ### Service Bus standard to premium
-If you have [migrated your Azure Service Bus Standard namespace to Azure Service Bus Premium](service-bus-migrate-standard-premium.md), then you must use the pre-existing alias (that is, your Service Bus Standard namespace connection string) to create the disaster recovery configuration through the **PS/CLI** or **REST API**.
+If you've [migrated your Azure Service Bus Standard namespace to Azure Service Bus Premium](service-bus-migrate-standard-premium.md), then you must use the pre-existing alias (that is, your Service Bus Standard namespace connection string) to create the disaster recovery configuration through the **PS/CLI** or **REST API**.
 
-It's because, during migration, your Azure Service Bus Standard namespace connection string/DNS name itself becomes an alias to your Azure Service Bus Premium namespace.
+It's because, during migration, your Azure Service Bus standard namespace connection string/DNS name itself becomes an alias to your Azure Service Bus premium namespace.
 
-Your client applications must utilize this alias (that is, the Azure Service Bus Standard namespace connection string) to connect to the Premium namespace where the disaster recovery pairing has been set up.
+Your client applications must utilize this alias (that is, the Azure Service Bus standard namespace connection string) to connect to the premium namespace where the disaster recovery pairing has been set up.
 
-If you use the Portal to set up the Disaster recovery configuration, then the portal will abstract this caveat from you.
+If you use the Azure portal to set up the disaster recovery configuration, the portal will abstract this caveat from you.
 
 
 ## Failover flow
 
 A failover is triggered manually by the customer (either explicitly through a command, or through client owned business logic that triggers the command) and never by Azure. It gives the customer full ownership and visibility for outage resolution on Azure's backbone.
 
-![4][]
+:::image type="content" source="./media/service-bus-geo-dr/geodr_failover_alias_update.png" alt-text="Image showing the flow of failover from primary to secondary namespace.":::
 
 After the failover is triggered -
 
@@ -132,11 +132,11 @@ Once the failover is initiated -
 
 You can automate failover either with monitoring systems, or with custom-built monitoring solutions. However, such automation takes extra planning and work, which is out of the scope of this article.
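
As a rough illustration of the custom-built monitoring mentioned above, the following Python sketch tracks how long the primary namespace has been unreachable and recommends a failover only once the outage passes a threshold. The class name `OutageTracker`, its methods, and the 15-minute default (borrowed from the 15 to 20 minute planning guidance elsewhere in this article) are assumptions for illustration, not part of any Service Bus SDK. How connectivity is probed and how the failover is actually triggered are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class OutageTracker:
    """Hypothetical helper that tracks how long the primary has been unreachable."""

    # Assumed threshold; the article suggests considering failover
    # after connectivity has been lost for 15 to 20 minutes.
    threshold: timedelta = timedelta(minutes=15)
    outage_started: Optional[datetime] = None

    def record_probe(self, reachable: bool, now: datetime) -> None:
        """Record the result of one connectivity probe taken at time `now`."""
        if reachable:
            # Any successful probe ends the current outage window.
            self.outage_started = None
        elif self.outage_started is None:
            # First failed probe marks the start of the outage.
            self.outage_started = now

    def should_fail_over(self, now: datetime) -> bool:
        """Return True once the outage has lasted at least `threshold`."""
        if self.outage_started is None:
            return False
        return now - self.outage_started >= self.threshold
```

Because the failover is a once-only move that breaks the pairing, a sketch like this would more plausibly raise an alert for an operator than trigger the failover by itself.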
 
-![2][]
+:::image type="content" source="./media/service-bus-geo-dr/geo2.png" alt-text="Image showing how you can automate failover.":::
 
 ## Management
 
-If you made a mistake; for example, you paired the wrong regions during the initial setup, you can break the pairing of the two namespaces at any time. If you want to use the paired namespaces as regular namespaces, delete the alias.
+If you made a mistake, for example, you paired the wrong regions during the initial setup, you can break the pairing of the two namespaces at any time. If you want to use the paired namespaces as regular namespaces, delete the alias.
 
 ## Use existing namespace as alias
 
@@ -156,26 +156,25 @@ The [samples on GitHub](https://github.com/Azure/azure-service-bus/tree/master/s
 
 Note the following considerations to keep in mind with this release:
 
-1. In your failover planning, you should also consider the time factor. For example, if you lose connectivity for longer than 15 to 20 minutes, you might decide to initiate the failover.
+- In your failover planning, you should also consider the time factor. For example, if you lose connectivity for longer than 15 to 20 minutes, you might decide to initiate the failover.
 
-2. The fact that no data is replicated means that currently active sessions aren't replicated. Additionally, duplicate detection and scheduled messages may not work. New sessions, new scheduled messages, and new duplicates will work.
+- The fact that no data is replicated means that currently active sessions aren't replicated. Additionally, duplicate detection and scheduled messages may not work. New sessions, new scheduled messages, and new duplicates will work.
 
-3. Failing over a complex distributed infrastructure should be [rehearsed](/azure/architecture/reliability/disaster-recovery#disaster-recovery-plan) at least once.
+- Failing over a complex distributed infrastructure should be [rehearsed](/azure/architecture/reliability/disaster-recovery#disaster-recovery-plan) at least once.
 
-4. Synchronizing entities can take some time, approximately 50-100 entities per minute. Subscriptions and rules also count as entities.
+- Synchronizing entities can take some time, approximately 50-100 entities per minute. Subscriptions and rules also count as entities.
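
To make the synchronization rate concrete, here is a small Python sketch that estimates how long the initial sync of a namespace might take. The function name and its parameters are assumptions for illustration. The 50 to 100 entities per minute range is the approximate figure quoted above, so the result is a rough planning estimate rather than a guarantee.

```python
import math


def estimate_sync_minutes(entity_count, slow_rate=50, fast_rate=100):
    """Return (best_case, worst_case) sync time in whole minutes.

    entity_count should include queues, topics, subscriptions, and rules,
    since subscriptions and rules also count as entities.
    slow_rate and fast_rate are the approximate entities-per-minute bounds.
    """
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    best_case = math.ceil(entity_count / fast_rate)
    worst_case = math.ceil(entity_count / slow_rate)
    return best_case, worst_case
```

For example, `estimate_sync_minutes(1200)` returns `(12, 24)`: a namespace with 1,200 entities would need roughly 12 to 24 minutes under these assumptions.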
 
 ## Availability Zones
 
-The Service Bus Premium SKU supports [Availability Zones](../availability-zones/az-overview.md), providing fault-isolated locations within the same Azure region. Service Bus manages three copies of messaging store (1 primary and 2 secondary). Service Bus keeps all the three copies in sync for data and management operations. If the primary copy fails, one of the secondary copies is promoted to primary with no perceived downtime. If the applications see transient disconnects from Service Bus, the retry logic in the SDK will automatically reconnect to Service Bus.
+The Service Bus Premium SKU supports [availability zones](../availability-zones/az-overview.md), providing fault-isolated locations within the same Azure region. Service Bus manages three copies of the messaging store (1 primary and 2 secondary). Service Bus keeps all three copies in sync for data and management operations. If the primary copy fails, one of the secondary copies is promoted to primary with no perceived downtime. If the applications see transient disconnects from Service Bus, the retry logic in the SDK will automatically reconnect to Service Bus.
 
 When you use availability zones, both metadata and data (messages) are replicated across data centers in the availability zone.
 
 > [!NOTE]
 > The Availability Zones support for Azure Service Bus Premium is only available in [Azure regions](../availability-zones/az-region.md) where availability zones are present.
 
-You can enable Availability Zones on new namespaces only, using the Azure portal. Service Bus does not support migration of existing namespaces. You cannot disable zone redundancy after enabling it on your namespace.
+When you create a premium tier namespace, the support for availability zones (if available in the selected region) is automatically enabled for the namespace. There's no additional cost for using this feature and you can't disable or enable this feature.
 
-![3][]
 
 ## Private endpoints
 This section provides more considerations when using Geo-disaster recovery with namespaces that use private endpoints. To learn about using private endpoints with Service Bus in general, see [Integrate Azure Service Bus with Azure Private Link](private-link-service.md).
