Commit 8d3ac90

committed
adding more details and fixing errors
1 parent bf3d849 commit 8d3ac90

2 files changed

+37
-19
lines changed

articles/event-hubs/event-hubs-business-continuity-outages-disasters.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -20,13 +20,13 @@ The Event Hubs Geo-Disaster Recovery and Geo-Replication features are designed t
2020

2121
## Definitions
2222

23-
It’s important to distinguish between the different scenarios where business continuity and disater recovery features may be used:
23+
It’s important to distinguish between the different scenarios where business continuity and disaster recovery features may be used:
2424

25-
- **Planned Maintenance :** A customer planned event where resources in the specific region are optimized to meet business goals. In these events, workflows may be adjusted to use a secondary region while the primary region is being optimized. For e.g. Blue-green deployments, database backups and recovery, data integrity checks.
25+
- **Planned Maintenance :** A customer planned event where resources in the specific region are optimized to meet business goals. In these events, workflows may be adjusted to use a secondary region while the primary region is being optimized. For example, Blue-green deployments, database backups and recovery, data integrity checks.
2626

2727
- **Outage:** A temporary unavailability of Event Hubs, which could affect individual partitions, the messaging store, or even the entire datacenter. Outages are typically resolved without data loss, and the service resumes normal operation once the underlying issue is fixed. Examples include hardware failures, software bugs, or short-term network issues.
2828

29-
- **Disaster:** The permanent or prolonged loss of an Event Hubs cluster, region, or datacenterThe region or datacenter might or might not become available again, or might be down for hours or days. Examples of such disasters are fire, flooding, or earthquake. A disaster that becomes permanent might cause the loss of some messages, events, or other data. However, in most cases there should be no data loss and messages can be recovered once the data center comes back up.
29+
- **Disaster:** The permanent or prolonged loss of an Event Hubs cluster, region, or datacenter. The region or datacenter may or may not become available again, or might be down for hours or days. Examples of such disasters are fire, flooding, or earthquake. While this is unlikely, a disaster that becomes permanent might cause the loss of some messages, events, or other data. However, in most cases there should be no data loss and messages can be recovered once the data center comes back up.
3030

3131

3232
## Protection Against Outages and Disasters
@@ -42,13 +42,13 @@ Event Hubs supports **availability zones** in select Azure regions. Data (metada
4242
4343
### Geo-Disaster Recovery (Geo-DR)
4444

45-
Event Hubs supports [Geo-Disaster Recovery (Geo-DR)](event-hubs-geo-dr.md) at the namespace level, which implements metadata disaster recovery between the primary and secondary namespace in different Azure regions. With Geo-disaster recovery, **only metadata** for entites is replicated between primary and secondary namespaces.
45+
Event Hubs supports [Geo-Disaster Recovery (Geo-DR)](event-hubs-geo-dr.md) at the namespace level, which implements metadata disaster recovery between the primary and secondary namespace in different Azure regions. With Geo-disaster recovery, **only metadata** for entities is replicated between primary and secondary namespaces.
4646

4747
### Geo-replication
4848

4949
Geo-replication ensures that metadata and data of a namespace is continuously replicated from a primary region to the secondary region. The namespace can be thought of as being virtually extended to more than one region, with one region being the primary and the other being the secondary.
5050

51-
At any time, the secondary region can be promoted to become a primary region. Promoting a secondary repoints the namespace to the selected secondary region, and the previous primary region is demoted to a secondary region.
51+
At any time, the secondary region can be promoted to become a primary region. Promoting a secondary repoints the namespace FQDN to the selected secondary region, and the previous primary region is demoted to a secondary region.
5252

5353
#### How does Geo-replication differ from Availability Zones
5454

articles/event-hubs/geo-replication.md

Lines changed: 32 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ The Event Hubs Geo-replication feature provides replication of both metadata (en
1818
1919
This feature ensures that metadata and data of a namespace is continuously replicated from a primary region to the secondary region. The namespace can be thought of as being virtually extended to more than one region, with one region being the primary and the other being the secondary.
2020

21-
At any time, the secondary region can be promoted to become a primary region. Promoting a secondary repoints the namespace to the selected secondary region, and the previous primary region is demoted to a secondary region.
21+
At any time, the secondary region can be promoted to become a primary region. Promoting a secondary repoints the namespace FQDN (fully qualified domain name) to the selected secondary region, and the previous primary region is demoted to a secondary region.
2222

2323
## Scenarios
2424

@@ -38,7 +38,7 @@ Geo-replication can also be used to facilitate data migration, maintenance, and
3838

3939
## Basic concepts
4040

41-
The Geo-Replication feature implements metadata and data replication in a primary-secondary replication model. At a given time there’s a single primary region, which is serving both producers and consumers. The secondaries act as hot stand-by regions, meaning that it isn't possible to interact with these secondary regions. However, they run in the same configuration as the primary region, allowing for fast promotion, and meaning they your workloads can immediately continue running after promotion has been completed.
41+
The Geo-Replication feature implements metadata and data replication in a primary-secondary replication model. At a given time there’s a single primary region, which is serving both producers and consumers. The secondary acts as a hot stand-by region, meaning that it isn't possible to interact with it directly. However, it runs in the same configuration as the primary region, which allows for fast promotion, so it is ready to take over as soon as promotion has completed.
4242

4343
Some of the key aspects of Geo-data Replication feature are:
4444

@@ -61,7 +61,7 @@ There are two replication consistency configurations, synchronous and asynchrono
6161

6262
### Asynchronous replication
6363

64-
Using asynchronous replication, all requests are committed on the primary, after which an acknowledgment is sent to the client. Replication to the secondary regions happens asynchronously. Users can configure the maximum acceptable amount of lag time. The lag time is the service side offset between the latest action on the primary and the secondary regions. The service continuously replicates the data and metadata, ensuring the lag remains as small as possible. If the lag for an active secondary grows beyond the user configured maximum replication lag, the primary starts throttling incoming requests.
64+
Using asynchronous replication, all requests are committed on the primary, after which an acknowledgment is sent to the client. Replication to the secondary regions happens asynchronously. Users can configure the maximum acceptable amount of lag time - the service side offset between the latest action on the primary and the secondary regions. The service continuously replicates the data and metadata, ensuring the lag remains as small as possible. If the lag for an active secondary grows beyond the user configured maximum replication lag, the primary starts throttling incoming requests.
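The throttling rule in the line above can be modelled in a few lines of Python. This is only an illustrative sketch: `ReplicationState`, `replication_lag_seconds`, and `should_throttle` are hypothetical names, and the actual service-side logic is not described in this change.

```python
from dataclasses import dataclass


@dataclass
class ReplicationState:
    """Hypothetical snapshot of replication progress (not a service API)."""
    primary_latest_ts: float    # time of the latest action committed on the primary, in seconds
    secondary_latest_ts: float  # time of the latest action replicated to the secondary, in seconds


def replication_lag_seconds(state: ReplicationState) -> float:
    """Lag is the offset between the latest action on the primary and on the secondary."""
    return max(0.0, state.primary_latest_ts - state.secondary_latest_ts)


def should_throttle(state: ReplicationState, max_lag_seconds: float) -> bool:
    """Throttle incoming requests once the lag grows beyond the configured maximum."""
    return replication_lag_seconds(state) > max_lag_seconds
```

Under this model, a secondary that is 70 seconds behind with a configured maximum of 60 seconds would cause the primary to throttle, while one that is 5 seconds behind would not.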
6565

6666
### Synchronous replication
6767

@@ -99,6 +99,10 @@ The replication mode can be changed after configuring Geo-Replication. You can g
9999
## Secondary region selection
100100
To enable the Geo-Replication feature, you need to use primary and secondary regions where the feature is enabled. The Geo-Replication feature depends on being able to replicate published messages from the primary to the secondary regions. If the secondary region is on another continent, this has a major impact on replication lag from the primary to the secondary region. If using Geo-Replication for availability reasons, you're best off with secondary regions being at least on the same continent where possible. To get a better understanding of the latency induced by geographic distance, you can learn more from Azure network round-trip latency statistics.
101101

102+
> [!NOTE]
103+
> Geo-replication requires that primary and secondary copies of the Event Hubs be on the same tier. The configuration cannot be done across tiers.
104+
>
105+
102106
## Geo-replication management
103107

104108
The Geo-Replication feature enables customers to configure a secondary region towards which to replicate metadata and data. As such, customers can perform the following management tasks:
@@ -108,6 +112,20 @@ The Geo-Replication feature enables customers to configure a secondary region to
108112
- **Trigger promotion/failover** - All promotions are customer initiated.
109113
- **Remove a secondary** - If at any time you want to remove a secondary region, you can do so after which the data in the secondary region is deleted.
110114

115+
### Criteria to trigger promotion
116+
117+
Here are some cases where a promotion of a secondary to primary may be triggered.
118+
119+
* Regional Outage: If there is a regional outage affecting the primary region, you should promote the secondary region to ensure business continuity and minimize downtime.
120+
121+
* Maintenance Activities: During planned maintenance activities in the primary region, promoting the secondary region can help maintain high availability for mission-critical applications.
122+
123+
* Disaster Recovery: In the event of a disaster affecting the primary region, promoting the secondary region ensures that your data remains accessible and your applications continue to function.
124+
125+
* Performance Issues: If the primary region is experiencing performance issues that impact the availability or reliability of your Event Hubs, promoting the secondary region can help mitigate these issues.
126+
127+
It is recommended to occasionally test failover mechanisms to ensure the business continuity plan is effective and your applications can seamlessly switch to the secondary region when needed.
128+
111129
## Monitoring data replication
112130
Users can monitor the progress of the replication job by monitoring the replication lag metric in Application Metrics logs.
113131

@@ -142,8 +160,8 @@ Consuming applications can consume data using the namespace hostname of a namesp
142160
Event consuming applications can continue to maintain offset management as they would do it with a non-geo replicated namespace. No special consideration is needed for offset management for geo-replication enabled namespaces.
143161

144162
> [!WARNING]
145-
> In the event of forced failover (i.e. non graceful failover), some of the data that hasn't been copied over may be lost. This may cause the offsets of that specific data to be different across the primary and secondary regions for the namespace, however it would still be within the bounds of the maximum replication lag configured for the namespace.
146-
> In such cases, it is preferred to start consuming from the last committed offset. Some data might have duplicate processing and must be handled on the client side.
163+
> In the event of forced failover (that is, non-graceful failover), some of the data that is yet to be copied over may be lost. This may cause the offsets of that specific data to be different across the primary and secondary regions for the namespace; however, the difference would still be within the bounds of the maximum replication lag configured for the namespace.
164+
> In such cases, it's preferred to start consuming from the last committed offset. Some data might have duplicate processing and must be handled on the client side.
147165
>
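One way to handle the duplicate processing mentioned in the warning is to make the consumer idempotent. The following is a minimal sketch, assuming the application attaches a unique ID to each event. The `Event` shape, `process_idempotently`, and the in-memory `seen_ids` set are hypothetical and not part of any Event Hubs SDK.

```python
from typing import Callable, Iterable, Set, Tuple

# (event_id, payload); the event_id is an application-level assumption,
# for example a GUID set by the producer.
Event = Tuple[str, str]


def process_idempotently(
    events: Iterable[Event],
    seen_ids: Set[str],
    handler: Callable[[str], None],
) -> int:
    """Run handler once per event_id, skipping redelivered events.

    Returns the number of events actually handled.
    """
    handled = 0
    for event_id, payload in events:
        if event_id in seen_ids:
            # Already processed before the failover; skip the duplicate.
            continue
        handler(payload)
        seen_ids.add(event_id)
        handled += 1
    return handled
```

In a real deployment the set of processed IDs would need to live in durable storage that is reachable from both regions; the in-memory set here is only for illustration.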
148166
149167
#### Kafka
@@ -157,7 +175,7 @@ Here are the list of Apache Kafka clients that are supported -
157175
| Apache Kafka | 2.1.0 or later |
158176
| Librdkafka and derived libraries | 2.1.0 or later |
159177

160-
In the case of other libraries, these are supported based on the versioning of the specific definitions -
178+
In the case of other libraries, the ones using the below API versions are supported -
161179

162180
| API name | Version supported |
163181
| -------------- | ----------------- |
@@ -170,7 +188,7 @@ In the case of other libraries, these are supported based on the versioning of t
170188

171189
#### Event Hubs SDK/AMQP
172190

173-
In the case of AMQP, the checkpoint is managed by users with a checkpoint store such as Azure Blob storage or a custom storage solution. If there's a failover, the checkpoint store must be available from the secondary region so that clients can retrieve checkpoint data and avoid loss of messages.
191+
For AMQP, the checkpoint is managed by users with a checkpoint store such as Azure Blob storage or a custom storage solution. If there's a failover, the checkpoint store must be available from the secondary region so that clients can retrieve checkpoint data and avoid loss of messages.
174192

175193
The latest version of the Event Hubs SDK has made some changes to checkpoint representation to support failovers. We recommend using the [latest versions of the SDKs](sdks.md), but prior versions of the below SDKs are supported as well.
176194

@@ -182,9 +200,9 @@ The latest version of the Event Hubs SDK has made some changes to checkpoint rep
182200
> [!WARNING]
183201
> As part of the implementation, the checkpoint format is adapted when geo-replication is enabled on a namespace. Subsequent checkpoints after the geo-replication is complete will be written with a new format. If you force promote a secondary region to primary right after the geo-replication pairing is done but before a new checkpoint is stored (this may happen in the case of forced promotion/failover), then new data published post promotion may be lost.
184202
>
185-
> In such cases, it is preferred to start consuming from the last committed offset. Some data might have duplicate processing and must be handled on the client side.
203+
> In such cases, it's preferred to start consuming from the last committed offset. Some data might have duplicate processing and must be handled on the client side.
186204
>
187-
> It is also recommended to upgrade to the [latest versions of the SDKs](sdks.md).
205+
> It's also recommended to upgrade to the [latest versions of the SDKs](sdks.md).
188206
>
189207
190208
## Considerations
@@ -202,7 +220,7 @@ The pricing varies based on the tier you pick, but generally has 2 parameters -
202220
* The bandwidth charge for the data being replicated between the primary and secondary regions.
203221

204222
> [!NOTE]
205-
> Please refer to the pricing details listed at [Azure Event Hubs](https://azure.microsoft.com/products/event-hubs/) to determine the charges. The geo-replication charge depends on location of the primary region.
223+
> Refer to the pricing details listed at [Azure Event Hubs](https://azure.microsoft.com/products/event-hubs/) to determine the charges. The geo-replication charge depends on location of the primary region.
206224
>
207225
208226
### Dedicated clusters
@@ -213,9 +231,9 @@ When geo-replication is enabled, the only additional charge is the bandwidth cha
213231

214232
### Premium namespaces
215233

216-
For Premium namespaces, enabling geo-replication provisions the same number of processing units (PUs) in the secondary region. Thus, you pay for the **number of PUs** you are using and the **bandwidth for the data transferred between the primary and secondary region**.
234+
For Premium namespaces, enabling geo-replication provisions the same number of processing units (PUs) in the secondary region. Thus, you pay for the **number of PUs** you're using and the **bandwidth for the data transferred between the primary and secondary region**.
217235

218-
For example, if you enable geo-replication on a Premium namespace which has been provisioned with **4 PU**, you will be billed for
236+
For example, if you enable geo-replication on a Premium namespace which has been provisioned with **4 PU**, you'll be billed for
219237

220238
* 4 PUs in the primary region,
221239
* 4 PUs in the secondary region,
@@ -227,12 +245,12 @@ Bandwidth is charged based on the data transferred between the primary and secon
227245

228246
This section provides additional considerations when using Geo-Replication with namespaces that utilize private endpoints. For general information on using private endpoints with Event Hubs, see [Integrate Azure Event Hubs with Azure Private Link](private-link-service.md).
229247

230-
When implementing Geo-Replication for a Event Hubs namespace that uses private endpoints, it is important to create private endpoints for both the primary and secondary regions. These endpoints should be configured against virtual networks hosting both primary and secondary instances of your application. For example, if you have two virtual networks, VNET-1 and VNET-2, you need to create two private endpoints on the Event Hubs namespace, using subnets from VNET-1 and VNET-2 respectively. Moreover, the VNETs should be set up with [cross-region peering](/azure/virtual-network/virtual-network-peering-overview), so that clients can communicate with either of the private endpoints. Finally, the [DNS](/azure/private-link/private-endpoint-dns) needs to be managed in such a way that all clients get the DNS information, which should point the namespace endpoint (namespacename.servicebus.windows.net) to the IP address of the private endpoint in the current primary region.
248+
When implementing Geo-Replication for an Event Hubs namespace that uses private endpoints, it is important to create private endpoints for both the primary and secondary regions. These endpoints should be configured against virtual networks hosting both primary and secondary instances of your application. For example, if you have two virtual networks, VNET-1 and VNET-2, you need to create two private endpoints on the Event Hubs namespace, using subnets from VNET-1 and VNET-2 respectively. Moreover, the VNETs should be set up with [cross-region peering](/azure/virtual-network/virtual-network-peering-overview), so that clients can communicate with either of the private endpoints. Finally, the [DNS](/azure/private-link/private-endpoint-dns) needs to be managed in such a way that all clients get the DNS information, which should point the namespace endpoint (namespacename.servicebus.windows.net) to the IP address of the private endpoint in the current primary region.
231249

232250
> [!IMPORTANT]
233251
> When promoting a secondary region for Event Hubs, the DNS entry also needs to be updated to point to the corresponding endpoint.
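After a promotion, it can be useful to verify from a client machine that the namespace endpoint resolves to the private endpoint of the new primary region. The sketch below uses only the Python standard library. `namespace_points_to` is a hypothetical helper, and the hostname and IP address in the usage note are placeholders.

```python
import socket
from typing import Callable


def namespace_points_to(
    namespace_fqdn: str,
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if the namespace FQDN currently resolves to expected_ip.

    The resolver is injectable so the check can be exercised without network access.
    """
    try:
        return resolve(namespace_fqdn) == expected_ip
    except OSError:
        # Resolution failed (socket.gaierror is a subclass of OSError).
        return False
```

For example, `namespace_points_to("namespacename.servicebus.windows.net", "10.2.0.4")` would report whether the machine running the check sees the placeholder address `10.2.0.4`. The result reflects only the DNS view of that machine, so it would need to be run from each virtual network that hosts clients.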
234252
235-
:::image type="content" source="./media/geo-replication/geo-replication-private-endpoints.png" alt-text="Screenshot showing two VNETs with their own private endpoints and VMs connected to an on-premises instance and a Event Hubs namespace.":::
253+
:::image type="content" source="./media/geo-replication/geo-replication-private-endpoints.png" alt-text="Screenshot showing two VNETs with their own private endpoints and VMs connected to an on-premises instance and an Event Hubs namespace.":::
236254

237255
The advantage of this approach is that failover can occur independently at the application layer or on the Event Hubs namespace:
238256
