Commit b36e035

committed
adding better structure. Some sections are still TBD
1 parent 81a4d26 commit b36e035

File tree

1 file changed

+77
-39
lines changed

articles/event-hubs/geo-replication.md

Lines changed: 77 additions & 39 deletions
@@ -57,40 +57,20 @@ Organizations operating in multiple countries often need to comply with data sov
5757
### Migration and Upgrades
5858
Geo-replication can also be used to facilitate data migration, maintenance, and system upgrades. Organizations can migrate their namespace proactively from a primary to a secondary region to allow for any maintenance and upgrades on the primary region.
5959

60-
## Geo-replication
61-
The public preview of the Geo-replication feature is supported for namespaces in Event Hubs self-serve scaling dedicated clusters. You can use the feature with new, or existing namespaces in dedicated self-serve clusters. The following features aren't supported with Geo-replication:
62-
63-
- Customer managed keys encryption (CMK).
64-
- Managed identity for capture.
65-
- Private endpoints.
66-
- Large messages support (currently in public preview).
67-
- Kafka Streams and Transactions (currently in public preview).
68-
69-
Some of the key aspects of Geo-data Replication public preview are:
60+
## Basic concepts
61+
62+
The Geo-Replication feature implements metadata and data replication in a primary-secondary replication model. At any given time there's a single primary region, which serves both producers and consumers. The secondaries act as hot standby regions, meaning that clients can't interact with them. However, they run in the same configuration as the primary region, which allows for fast promotion and means that your workloads can continue running immediately after promotion completes.
63+
64+
Some of the key aspects of the Geo-Replication feature are:
7065

7166
- Primary-secondary replication model – Geo-replication is built on a primary-secondary replication model, where at any given time there's only one primary namespace that serves event producers and event consumers.
7267
- Event Hubs performs fully managed byte-to-byte replication of metadata, event data, and consumer offset across secondaries with the configured consistency levels.
73-
- Stable namespace fully qualified domain name (FQDN) – The FQDN doesn't need to change when promotion is performed.
68+
- Single namespace hostname - After a Geo-Replication enabled namespace is configured, users can use the namespace hostname in their client applications. The hostname is agnostic of the configured primary and secondary regions and always points to the primary region.
69+
- When a customer initiates a promotion, the hostname points to the region selected to be the new primary region. The old primary becomes a secondary region.
70+
- It isn't possible to read or write on the secondary regions.
71+
- Customer-managed promotion from primary to secondary region, providing full ownership and visibility for outage resolution. Metrics are available that can help customers automate the promotion.
72+
- Secondary regions can be added or removed at the customer's discretion.
7473
- Replication consistency - There are two replication consistency settings, synchronous and asynchronous.
75-
- User-managed promotion of a secondary to being the new primary.
76-
77-
Changing a secondary to being a new primary is done two ways:
78-
79-
- **Planned**: a promotion of the secondary to primary where traffic isn't processed until the new primary catches up with all of the data held by the former primary instance.
80-
- **Forced**: as a failover where the secondary becomes primary as fast as possible. The Geo-replication feature replicates all data and metadata from the primary region to the selected secondary regions. The namespace FQDN always points to the primary region.
81-
82-
:::image type="content" source="./media/geo-replication/a-as-primary.png" alt-text="Diagram showing when region A is primary, B is secondary.":::
83-
84-
When you initiate a promotion of a secondary, the FQDN points to the region selected to be the new primary. The old primary then becomes a secondary. You can promote your secondary to be the new primary for reasons other than a failover. Those reasons can include application upgrades, failover testing, or any number of other things. In those situations, it's common to switch back when those activities are completed.
85-
86-
:::image type="content" source="./media/geo-replication/b-as-primary.png" alt-text="Diagram showing when B is made the primary, that A becomes the new secondary.":::
87-
88-
Secondary regions are added, or removed at the customer's discretion. There are some current limitations worth noting:
89-
90-
- There's no ability to support read-only views on secondary regions.
91-
- There's no automatic promotion/failover capability. All promotions are customer initiated.
92-
- Secondary regions must be different from the primary region. You can't select another dedicated cluster in the same region.
93-
- Only one secondary is supported for public preview.
9474

9575
## Replication modes
9676
There are two replication consistency configurations, synchronous and asynchronous. It's important to know the differences between the two configurations as they have an impact on your applications and your data consistency.
@@ -119,12 +99,12 @@ With **asynchronous** replication:
11999

120100
As such, it doesn’t have the absolute guarantee that all regions have the data before we commit it like synchronous replication does, and data loss or duplication may occur. However, as you're no longer immediately impacted when a single region lags or is unavailable, application availability improves, in addition to having a lower latency.
121101

122-
| Capability | Synchronous replication | Asynchronous replication |
123-
| --- | --- | --- |
124-
| Latency | Longer due to distributed commit operations | Minimally impacted |
125-
| Availability | Tied to availability of secondary regions | Loss of a secondary region doesn't immediately impact availability |
126-
| Data consistency | Data always committed in both regions before acknowledgment | Data committed in primary only before acknowledgment |
127-
| Recovery point objective (RPO) | RPO 0, no data loss on promotion | RPO > 0, possible data loss on promotion |
102+
| Capability | Synchronous replication | Asynchronous replication |
103+
|--------------------------------|--------------------------------------------------------------|--------------------------------------------------------------------|
104+
| Latency | Longer due to distributed commit operations | Minimally impacted |
105+
| Availability | Tied to availability of secondary regions | Loss of a secondary region doesn't immediately impact availability |
106+
| Data consistency | Data always committed in both regions before acknowledgment | Data committed in primary only before acknowledgment |
107+
| RPO (Recovery Point Objective) | RPO 0, no data loss on promotion | RPO > 0, possible data loss on promotion |
128108

129109
The replication mode can be changed after configuring Geo-Replication. You can go from synchronous to asynchronous or from asynchronous to synchronous. If you go from asynchronous to synchronous, your secondary will be configured as synchronous after lag reaches zero. If you're running with a continual lag for whatever reason, then you may need to pause your publishers in order for lag to reach zero and your mode to be able to switch to synchronous. The reasons to have synchronous replication enabled, instead of asynchronous replication, are tied to the importance of the data, specific business needs, or compliance reasons, rather than availability of your application.
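
The switch from asynchronous to synchronous described above can be pictured with a small model. This is only a sketch of the stated behavior (the requested mode takes effect once lag reaches zero); `ReplicationConfig` and its methods are hypothetical names for illustration and aren't part of any Event Hubs API. The assumption that a switch to asynchronous applies immediately is also ours, not stated in the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationConfig:
    """Hypothetical model of a secondary's replication mode (not an Azure API)."""

    mode: str = "asynchronous"           # mode currently in effect
    pending_mode: Optional[str] = None   # mode requested but not yet applied

    def request_synchronous(self) -> None:
        # Record the request; it is applied only once lag reaches zero.
        if self.mode != "synchronous":
            self.pending_mode = "synchronous"

    def request_asynchronous(self) -> None:
        # Assumption: relaxing to asynchronous doesn't need to wait for lag.
        self.mode = "asynchronous"
        self.pending_mode = None

    def observe_lag(self, lag_seconds: float) -> None:
        # Apply a pending switch once the secondary has fully caught up.
        if self.pending_mode == "synchronous" and lag_seconds <= 0:
            self.mode = "synchronous"
            self.pending_mode = None
```

In this model, a namespace with continual lag stays asynchronous until `observe_lag` sees a value of zero, which mirrors why publishers may need to be paused before the switch can complete.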
130110

@@ -143,6 +123,57 @@ The Geo-Replication feature enables customers to configure a secondary region to
143123
- **Configure the replication consistency** - Synchronous and asynchronous replication is set when Geo-Replication is configured but can also be switched afterwards.
144124
- **Trigger promotion/failover** - All promotions are customer initiated.
145125
- **Remove a secondary** - If at any time you want to remove a secondary region, you can do so after which the data in the secondary region is deleted.
126+
127+
## Setup
128+
129+
### Using Azure portal
130+
131+
This section gives an overview of how to set up the Geo-Replication feature on a new namespace through the Azure portal.
132+
133+
1. Create a new premium-tier namespace, or create a new namespace on a dedicated cluster.
134+
1. Select the **Enable Geo-replication** checkbox under the *Replication* section.
135+
1. Click on the **Add secondary region** button, and choose a region.
136+
1. Either select the **Synchronous replication** checkbox, or specify a value in seconds for **Async Replication - Max Replication lag**.
137+
TBD :::image type="content" source="./media/service-bus-geo-replication/create-namespace-with-geo-replication.png" alt-text="Screenshot showing the Create Namespace experience with Geo-Replication enabled.":::
138+
139+
### Using Bicep template
140+
141+
To create a namespace with the Geo-Replication feature enabled, add the *geoDataReplication* properties section.
142+
143+
```bicep
144+
TBD
145+
```
146+
147+
## Management
148+
149+
Once you create a namespace with the Geo-Replication feature enabled, you can manage the feature from the **Geo-Replication** blade.
150+
151+
### Switch replication mode
152+
153+
To switch between replication modes, or update the maximum replication lag, click on the link under **Replication consistency**, and click the checkbox to enable / disable synchronous replication, or update the value in the textbox to change the asynchronous maximum replication lag.
154+
TBD :::image type="content" source="./media/service-bus-geo-replication/update-namespace-geo-replication-configuration.png" alt-text="Screenshot showing how to update the configuration of the Geo-Replication feature.":::
155+
156+
### Delete secondary region
157+
158+
To remove a secondary region, click on the **...**-ellipsis next to the region, and click **Delete**. To delete the region, follow the instructions in the pop-up blade.
159+
TBD :::image type="content" source="./media/service-bus-geo-replication/delete-secondary-region-from-geo-replication.png" alt-text="Screenshot showing how to delete a secondary region.":::
160+
161+
### Promotion flow
162+
163+
A promotion is always triggered manually by the customer, either explicitly through a command or through client-owned business logic that triggers the command, and never by Azure. This gives the customer full ownership and visibility for outage resolution on Azure's backbone. With a **Planned** promotion, the service waits for the replication lag to catch up before initiating the promotion. With a **Forced** promotion, the service initiates the promotion immediately. The namespace is placed in read-only mode from the time a promotion is requested until the promotion completes. A forced promotion can be requested at any time after a planned promotion has been initiated, which lets the user expedite the promotion when a planned promotion takes longer than desired.
164+
165+
| State | Diagram |
166+
| --- | ---|
167+
| Before failover (promotion of secondary) | :::image type="content" source="./media/geo-replication/a-as-primary.png" alt-text="Diagram showing when region A is primary, B is secondary."::: |
168+
| After failover (promotion of secondary) | :::image type="content" source="./media/geo-replication/b-as-primary.png" alt-text="Diagram showing when B is made the primary, that A becomes the new secondary."::: |
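
The difference between planned and forced promotion can be sketched as a small state model. This is an illustration of the behavior described in this section, not service code; the `Promotion` class, its states, and its methods are hypothetical names, and a forced promotion is modeled as completing instantly for simplicity.

```python
from enum import Enum


class PromotionState(Enum):
    IDLE = "idle"
    PLANNED_IN_PROGRESS = "planned_in_progress"
    COMPLETED = "completed"


class Promotion:
    """Hypothetical model of the promotion flow (not an Event Hubs API)."""

    def __init__(self, replication_lag_seconds: float) -> None:
        self.lag = replication_lag_seconds
        self.state = PromotionState.IDLE

    @property
    def namespace_read_only(self) -> bool:
        # The namespace is read-only between the request and completion.
        return self.state is PromotionState.PLANNED_IN_PROGRESS

    def start_planned(self) -> None:
        # Planned: wait for the secondary to catch up before promoting.
        self.state = PromotionState.PLANNED_IN_PROGRESS
        self._complete_if_caught_up()

    def start_forced(self) -> None:
        # Forced: promote right away; also allowed while a planned
        # promotion is still waiting for lag to reach zero.
        self.state = PromotionState.COMPLETED

    def update_lag(self, lag_seconds: float) -> None:
        self.lag = lag_seconds
        self._complete_if_caught_up()

    def _complete_if_caught_up(self) -> None:
        if self.state is PromotionState.PLANNED_IN_PROGRESS and self.lag <= 0:
            self.state = PromotionState.COMPLETED
```

Calling `start_forced()` on a promotion that is still in `PLANNED_IN_PROGRESS` corresponds to expediting a planned promotion that is taking longer than desired.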
169+
170+
#### Using Azure portal
171+
172+
TBD
173+
174+
#### Using Azure CLI
175+
176+
TBD
146177

147178
## Monitoring data replication
148179
Users can monitor the progress of the replication job by tracking the replication lag metric in the Application Metrics logs.
@@ -159,7 +190,7 @@ Users can monitor the progress of the replication job by monitoring the replicat
159190
```
160191
- The column `count_d` indicates the replication lag in seconds between the primary and secondary region.
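
Once the lag values from `count_d` are exported, a client-side check can flag sustained lag. The following is a minimal sketch under assumptions: `lag_exceeds_limit` is a hypothetical helper, the samples are assumed to be recent lag values in seconds, and requiring every sample to exceed the limit is one possible policy to avoid reacting to a single spike.

```python
from typing import Sequence


def lag_exceeds_limit(lag_samples_seconds: Sequence[float],
                      max_lag_seconds: float) -> bool:
    """Return True when every recent sample is above the limit.

    Hypothetical helper: an empty sample window is treated as "no signal"
    and returns False.
    """
    if not lag_samples_seconds:
        return False
    return all(sample > max_lag_seconds for sample in lag_samples_seconds)
```

For example, `lag_exceeds_limit([70, 90, 120], max_lag_seconds=60)` reports sustained lag, while a window that contains a single low sample does not.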
161192

162-
## Publishing Data
193+
## Publishing Data
163194
Event publishing applications can publish data to geo-replicated namespaces through the stable FQDN of the geo-replicated namespace. The event publishing approach is the same as in the non-Geo DR case, and no changes to client applications are required.
164195

165196
Event publishing might not be available during the following circumstances:
@@ -174,14 +205,21 @@ Event consuming applications can consume data using the stable namespace FQDN of
174205
### Checkpointing/Offset Management
175206
Event consuming applications can continue to manage offsets as they would with a single namespace.
176207

177-
**Kafka**
208+
#### Kafka
178209

179210
Offsets are committed to Event Hubs directly and are replicated across regions. Therefore, consumers can resume consuming from where they left off in the primary region.
180211

181-
**Event Hubs SDK/AMQP**
212+
#### Event Hubs SDK/AMQP
182213

183214
Clients that use the Event Hubs SDK need to upgrade to the April 2024 version of the SDK. The latest version of the Event Hubs SDK supports failover with an update to the checkpoint. The checkpoint is managed by users with a checkpoint store such as Azure Blob storage, or a custom storage solution. If there's a failover, the checkpoint store must be available from the secondary region so that clients can retrieve checkpoint data and avoid loss of messages.
184215

216+
## Considerations
217+
218+
Keep the following considerations in mind when using this feature:
219+
220+
- In your promotion planning, you should also consider the time factor. For example, if you lose connectivity for longer than 15 to 20 minutes, you might decide to initiate the promotion.
221+
- Promoting a complex distributed infrastructure should be [rehearsed](/azure/architecture/reliability/disaster-recovery#disaster-recovery-plan) at least once.
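
The time factor mentioned above can be encoded in client-owned business logic. The sketch below is an assumption-laden illustration: `should_initiate_promotion` is a hypothetical helper, and the 15-minute default simply takes the lower bound of the example range given above. It only makes the decision and doesn't call any Azure API.

```python
from datetime import datetime, timedelta


def should_initiate_promotion(last_successful_contact: datetime,
                              now: datetime,
                              threshold: timedelta = timedelta(minutes=15)) -> bool:
    """Return True when connectivity has been lost for at least `threshold`.

    Hypothetical helper: the caller is responsible for tracking when the
    primary region was last reachable.
    """
    return now - last_successful_contact >= threshold
```

Keeping the decision separate from the action makes it easier to rehearse the promotion plan without triggering a real promotion.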
222+
185223
## Pricing
186224
Event Hubs dedicated clusters are priced independently of geo-replication. Using geo-replication with Event Hubs dedicated clusters requires at least two dedicated clusters in separate regions. The dedicated clusters used as secondary instances for geo-replication can also be used for other workloads. The geo-replication charge is based on the published bandwidth multiplied by the number of secondary regions. The geo-replication charge is waived in early public preview.
187225
