| 1 | +--- |
| 2 | +title: 'Insulate Azure Event Hubs applications against outages and disasters' |
| 3 | +description: 'This article provides techniques to protect applications during Azure Event Hubs planned maintenance or unplanned outage.' |
| 4 | +ms.topic: article |
| 5 | +author: axisc |
| 6 | +ms.author: aschhabria |
| 7 | +ms.date: 5/13/2025 |
| 8 | +ms.custom: references_regions |
| 9 | +--- |
| 10 | + |
| 11 | +# Best practices for insulating Azure Event Hubs applications against outages and disasters |
| 12 | + |
| 13 | +Mission-critical applications must operate continuously, even in the presence of planned maintenance or unplanned outages or disasters. Resilience against disastrous outages of data processing resources is a requirement for many enterprises and, in some cases, mandated by industry regulations. This article describes techniques you can use to protect Event Hubs applications during planned maintenance or against potential service outages or disasters. |
| 14 | + |
| 15 | +Azure Event Hubs already spreads the risk of catastrophic failures of individual machines, or even complete racks, across clusters that span multiple failure domains within a datacenter. It also implements transparent failure detection and failover mechanisms, so the service continues to operate within the assured service levels, typically without noticeable interruptions, when such failures occur. |
| 16 | + |
| 17 | +The outage risk is further spread across three physically separated facilities (availability zones), and the service has enough capacity reserves to instantly cope with the complete, catastrophic loss of a datacenter. The all-active Azure Event Hubs cluster model within a failure domain, along with availability zone support, is superior to any on-premises message broker product in terms of resiliency against grave hardware failures and even catastrophic loss of entire datacenter facilities. Still, there might be grave situations with widespread physical destruction that even those measures can't sufficiently defend against. |
| 18 | + |
| 19 | +The Event Hubs Geo-Disaster Recovery and Geo-Replication features are designed to make it easier to recover from a disaster of this magnitude and to abandon a failed Azure region for good, without having to change your application configurations. |
| 20 | + |
| 21 | +## Definitions |
| 22 | + |
| 23 | +It’s important to distinguish between the different scenarios where business continuity and disaster recovery features may be used: |
| 24 | + |
| 25 | +- **Planned maintenance:** A customer-planned event in which resources in a specific region are optimized to meet business goals. During these events, workflows may be adjusted to use a secondary region while the primary region is being optimized. Examples include blue-green deployments, database backups and recovery, and data integrity checks. |
| 26 | + |
| 27 | +- **Outage:** A temporary unavailability of Event Hubs, which could affect individual partitions, the messaging store, or even the entire datacenter. Outages are typically resolved without data loss, and the service resumes normal operation once the underlying issue is fixed. Examples include hardware failures, software bugs, or short-term network issues. |
| 28 | + |
| 29 | +- **Disaster:** The permanent or prolonged loss of an Event Hubs cluster, region, or datacenter. The region or datacenter might be down for hours or days, and may or may not become available again. Examples of such disasters are fire, flooding, or earthquake. Although it's unlikely, a disaster that becomes permanent might cause the loss of some messages, events, or other data. In most cases, however, there should be no data loss, and messages can be recovered once the datacenter comes back up. |
| 30 | + |
| 31 | + |
| 32 | +## Protection Against Outages and Disasters |
| 33 | + |
| 34 | +Azure Event Hubs offers several built-in mechanisms and recommended patterns for high availability and disaster recovery: |
| 35 | + |
| 36 | +### Availability Zones |
| 37 | + |
| 38 | +Event Hubs supports **availability zones** in select Azure regions. Data (metadata and event payloads) is replicated across physically separated datacenters within a region, providing fault isolation against datacenter-level failures. |
| 39 | + |
| 40 | +> [!NOTE] |
| 41 | +> Availability zones are enabled by default in supported regions. |
| 42 | + |
| 43 | +### Geo-Disaster Recovery (Geo-DR) |
| 44 | + |
| 45 | +Event Hubs supports [Geo-Disaster Recovery (Geo-DR)](event-hubs-geo-dr.md) at the namespace level, which implements metadata disaster recovery between a primary and a secondary namespace in different Azure regions. With Geo-DR, **only metadata** for entities is replicated between the primary and secondary namespaces. |
| 46 | + |
| 47 | +### Geo-replication |
| 48 | + |
| 49 | +Geo-replication ensures that the metadata and data of a namespace are continuously replicated from a primary region to a secondary region. The namespace can be thought of as being virtually extended to more than one region, with one region being the primary and the other being the secondary. |
| 50 | + |
| 51 | +At any time, the secondary region can be promoted to become a primary region. Promoting a secondary repoints the namespace FQDN to the selected secondary region, and the previous primary region is demoted to a secondary region. |
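The role swap described above can be sketched as a small toy model. This is an illustration only, not an Azure SDK type: the class `GeoReplicatedNamespace`, its methods, the FQDN `contoso.servicebus.windows.net`, and the region names are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class GeoReplicatedNamespace:
    """Toy model of a geo-replicated namespace (hypothetical, not an Azure SDK type)."""

    fqdn: str               # stays the same across promotions
    primary: str            # region currently acting as primary
    secondaries: list[str]  # regions currently acting as secondaries

    def promote(self, region: str) -> None:
        """Promote a secondary to primary; the previous primary is demoted to secondary."""
        if region not in self.secondaries:
            raise ValueError(f"{region} is not a secondary region")
        self.secondaries.remove(region)
        self.secondaries.append(self.primary)
        self.primary = region

    def resolve(self) -> str:
        """Region that the unchanged namespace FQDN currently points to."""
        return self.primary


# Hypothetical namespace name and regions, for illustration only.
ns = GeoReplicatedNamespace("contoso.servicebus.windows.net", "eastus", ["westus"])

ns.promote("westus")  # failover: westus becomes primary, eastus becomes secondary
after_failover = (ns.resolve(), list(ns.secondaries))

ns.promote("eastus")  # fail back to the original primary
after_failback = (ns.resolve(), list(ns.secondaries))
```

In this sketch the `fqdn` field never changes, which mirrors why applications that connect through the namespace FQDN don't need a configuration change after a promotion.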
| 52 | + |
| 53 | +#### How does Geo-replication differ from Availability Zones |
| 54 | + |
| 55 | +Event Hubs offers [Availability Zones support](#availability-zones), depending on the Azure regions where the Event Hubs namespace is provisioned. Availability zone support offers fault isolation and provides resiliency **within** the same datacenter region. |
| 56 | + |
| 57 | +Geo-replication provides fault isolation **across** Azure regions by pairing two regions and replicating data between them to meet a recovery point objective (RPO). |
| 58 | + |
| 59 | +Availability Zones are **fully supported** along with geo-replication. |
| 60 | + |
| 61 | +#### How does Geo-replication differ from Geo-disaster recovery (DR) |
| 62 | + |
| 63 | +The [Geo-disaster recovery feature](#geo-disaster-recovery-geo-dr) replicates configuration information (or metadata) for a namespace from a primary namespace to a secondary namespace. It supports a one-time-only failover to the secondary region. During a customer-initiated failover, the alias name for the namespace is repointed to the secondary namespace, and then the pairing is broken. Only configuration information is replicated; event data and permission assignments aren't. |
| 64 | + |
| 65 | +The Geo-replication feature replicates configuration information and all of the data from a primary namespace to the secondary region. Failover is performed by promoting the selected secondary to primary (and demoting the previous primary to a secondary). Users can fail back to the original primary when desired. |
| 66 | + |
| 67 | +Metadata disaster recovery (DR) is ***not supported*** along with geo-replication. You can migrate from *Metadata disaster recovery (DR)* to *Geo-replication* by breaking the metadata DR pairing and then enabling Geo-replication, as described in [Event Hubs Geo-replication](geo-replication.md). |
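The one-time-only failover of Geo-DR described above can also be sketched as a toy model, to contrast it with the repeatable promotion that Geo-replication allows. The class `GeoDrPairing`, the alias, and the namespace names below are hypothetical and aren't part of any Azure SDK.

```python
class GeoDrPairing:
    """Toy model of a Geo-DR (metadata DR) pairing; hypothetical, not an Azure SDK type."""

    def __init__(self, alias: str, primary_ns: str, secondary_ns: str) -> None:
        self.alias = alias                # name that clients connect through
        self.target = primary_ns          # namespace the alias currently points to
        self.secondary_ns = secondary_ns
        self.paired = True

    def fail_over(self) -> None:
        """Repoint the alias to the secondary namespace, then break the pairing."""
        if not self.paired:
            raise RuntimeError("pairing is broken; no further failover is possible")
        self.target = self.secondary_ns
        self.paired = False


# Hypothetical alias and namespace names, for illustration only.
pairing = GeoDrPairing("contoso-alias", "contoso-east", "contoso-west")
pairing.fail_over()  # the alias now points to the secondary namespace

# A second failover on the same pairing is rejected in this model,
# because the pairing was broken by the first one.
try:
    pairing.fail_over()
    second_failover_allowed = True
except RuntimeError:
    second_failover_allowed = False
```

The model tracks only where the alias points. It deliberately carries no event data, reflecting that Geo-DR replicates configuration information only.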
| 68 | + |
| 69 | + |
| 70 | +## Next Steps |
| 71 | + |
| 72 | +To learn more about disaster recovery, see these articles: |
| 73 | + |
| 74 | + * [Event Hubs Geo-disaster recovery documentation](event-hubs-geo-dr.md) |
| 75 | + * [Event Hubs availability and consistency](event-hubs-availability-and-consistency.md) |
| 76 | + * [Event Hubs Geo-replication](geo-replication.md) |