Commit 8ed9aef

freshness review
1 parent cbca77c commit 8ed9aef

1 file changed

+29
-31
lines changed

articles/expressroute/designing-for-high-availability-with-expressroute.md

Lines changed: 29 additions & 31 deletions
@@ -5,89 +5,87 @@ services: expressroute
55
author: duongau
66
ms.service: azure-expressroute
77
ms.topic: concept-article
8-
ms.date: 06/15/2023
8+
ms.date: 11/18/2024
99
ms.author: duau
1010
---
1111

12-
# Designing for high availability with ExpressRoute
12+
# Designing for high availability with Azure ExpressRoute
1313

14-
ExpressRoute is designed for high availability to provide carrier grade private network connectivity to Microsoft resources. In other words, there's no single point of failure in the ExpressRoute path within Microsoft network. To maximize the availability, the customer and the service provider segment of your ExpressRoute circuit should also be architected for high availability. In this article, first let's look into network architecture considerations for building robust network connectivity using an ExpressRoute, then let's look into the fine-tuning features that help you to improve the high availability of your ExpressRoute circuit.
14+
Azure ExpressRoute is designed for high availability, providing carrier-grade private network connectivity to Microsoft resources. This means there is no single point of failure within the Microsoft network. To maximize availability, both the customer and service provider segments of your Azure ExpressRoute circuit should also be architected for high availability. This article covers network architecture considerations for building robust connectivity using Azure ExpressRoute and fine-tuning features to improve the high availability of your Azure ExpressRoute circuit.
1515

1616
> [!NOTE]
17-
> The concepts described in this article equally applies when an ExpressRoute circuit is created under Virtual WAN or outside of it.
18-
>
17+
> The concepts described in this article apply equally whether an Azure ExpressRoute circuit is created under Virtual WAN or outside of it.
1918
2019
## Architecture considerations
2120

22-
The following figure illustrates the recommended way to connect using an ExpressRoute circuit for maximizing the availability of an ExpressRoute circuit.
21+
The following figure illustrates the recommended way to connect using an Azure ExpressRoute circuit to maximize availability.
2322

24-
[![1]][1]
23+
[![1]][1]
2524

26-
For high availability, it's essential to maintain the redundancy of the ExpressRoute circuit throughout the end-to-end network. In other words, you need to maintain redundancy within your on-premises network, and shouldn't compromise redundancy within your service provider network. Maintaining redundancy at the minimum implies avoiding single point of network failures. Having redundant power and cooling for the network devices further improves the high availability.
25+
For high availability, it's essential to maintain redundancy throughout the end-to-end network. This means maintaining redundancy within your on-premises network and not compromising redundancy within your service provider network. At a minimum, this involves avoiding single points of network failure. Redundant power and cooling for network devices further improve high availability.
2726

2827
### First mile physical layer design considerations
2928

30-
If you terminate both the primary and secondary connections of an ExpressRoute circuits on the same Customer Premises Equipment (CPE), you're compromising the high availability within your on-premises network. Additionally, if you configure both the primary and secondary connections using the same port of a CPE, you're forcing the partner to compromise high availability on their network segment as well. This event can happen by either terminating the two connections under different subinterfaces or by merging the two connections within the partner network. This compromise is illustrated in the following figure.
29+
If you terminate both the primary and secondary connections of an Azure ExpressRoute circuit on the same Customer Premises Equipment (CPE), you compromise high availability within your on-premises network. Additionally, configuring both connections using the same port of a CPE forces the partner to compromise high availability on their network segment. This can occur by terminating the two connections under different subinterfaces or merging the two connections within the partner network, as illustrated below.
3130

3231
[![2]][2]
3332

34-
On the other hand, if you terminate the primary and the secondary connections of an ExpressRoute circuits in different geographical locations, then you could be compromising the network performance of the connectivity. If traffic is actively load balanced across the primary and the secondary connections that are terminated on different geographical locations, potential substantial difference in network latency between the two paths would result in suboptimal network performance.
33+
Terminating the primary and secondary connections of an Azure ExpressRoute circuit in different geographical locations can compromise network performance. If traffic is actively load-balanced across connections terminated in different locations, substantial differences in network latency between the two paths can result in suboptimal performance.
3534

36-
For geo-redundant design considerations, see [Designing for disaster recovery with ExpressRoute][DR].
35+
For geo-redundant design considerations, see [Designing for disaster recovery with Azure ExpressRoute][DR].
3736

3837
### Active-active connections
3938

40-
Microsoft network is configured to operate the primary and secondary connections of ExpressRoute circuits in active-active mode. However, through your route advertisements, you can force the redundant connections of an ExpressRoute circuit to operate in active-passive mode. Advertising more specific routes and BGP AS path prepending are the common techniques used to make one path prefer over the other.
39+
The Microsoft network operates the primary and secondary connections of Azure ExpressRoute circuits in active-active mode. However, you can force the redundant connections to operate in active-passive mode through your route advertisements. Advertising more specific routes and BGP AS path prepending are common techniques to make one path preferred over the other.
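
The effect of those two techniques on path preference can be sketched in a few lines. This is a simplified model, not the actual BGP best-path algorithm: it only applies longest prefix match followed by shortest AS path, and the `Route` type, the `preferred_links` helper, the prefixes, and the private ASN 65010 are all hypothetical values chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # advertised prefix (hypothetical example values)
    as_path: tuple   # AS numbers in the advertisement
    link: str        # "primary" or "secondary" connection


def preferred_links(routes, destination):
    """Return the connections that would carry traffic to `destination`.

    Simplified model: longest prefix match first, then shortest AS path.
    Real BGP best-path selection has additional steps that are ignored here.
    """
    dest = ipaddress.ip_address(destination)
    matching = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not matching:
        return []
    longest = max(ipaddress.ip_network(r.prefix).prefixlen for r in matching)
    specific = [r for r in matching
                if ipaddress.ip_network(r.prefix).prefixlen == longest]
    shortest = min(len(r.as_path) for r in specific)
    return sorted(r.link for r in specific if len(r.as_path) == shortest)


# Identical advertisements on both connections: both stay in use (active-active).
active_active = [
    Route("10.1.0.0/16", (65010,), "primary"),
    Route("10.1.0.0/16", (65010,), "secondary"),
]

# AS path prepending on the secondary connection makes it the backup path.
prepended = [
    Route("10.1.0.0/16", (65010,), "primary"),
    Route("10.1.0.0/16", (65010, 65010, 65010), "secondary"),
]

# A more specific prefix on the primary connection wins for addresses it covers.
more_specific = [
    Route("10.1.0.0/17", (65010,), "primary"),
    Route("10.1.0.0/16", (65010,), "secondary"),
]
```

In this model, `preferred_links(active_active, "10.1.2.3")` returns both connections, while the `prepended` and `more_specific` advertisements leave only the primary connection carrying that traffic.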
4140

42-
To improve high availability, it's recommended to operate both the connections of an ExpressRoute circuit in active-active mode. If you let the connections operate in active-active mode, Microsoft network loads balance the traffic across the connections on per-flow basis.
41+
To improve high availability, operate both connections in active-active mode. This allows the Microsoft network to load balance traffic across the connections on a per-flow basis.
4342

44-
Running the primary and secondary connections of an ExpressRoute circuit in active-passive mode face the risk of both the connections failing following a failure in the active path. The common causes for failure on switching over are lack of active management of the passive connection, and passive connection advertising stale routes.
43+
Running connections in active-passive mode risks both connections failing after a failure in the active path. Common causes of failure during switchover include a lack of active management of the passive connection and the passive connection advertising stale routes.
4544

46-
Alternatively, running the primary and secondary connections of an ExpressRoute circuit in active-active mode, results in only about half the flows failing and getting rerouted. Therefore, an active-active connection significantly helps improve the Mean Time To Recover (MTTR).
45+
Alternatively, running connections in active-active mode results in only about half the flows failing and getting rerouted, significantly improving the Mean Time To Recover (MTTR).
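
The "about half the flows" figure follows from per-flow distribution across two connections. The sketch below illustrates this under stated assumptions: the hash function, the 5-tuple layout, and the sample addresses are hypothetical and do not represent how the Microsoft network actually assigns flows.

```python
import hashlib

LINKS = ("primary", "secondary")


def pick_link(flow, links=LINKS):
    """Map a flow (5-tuple) to one connection using a stable hash.

    Illustrative only: the real per-flow hashing scheme is not specified here.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]


def flows_hit_by_failure(flows, failed_link, active_active=True):
    """Return the flows that are disrupted when `failed_link` goes down."""
    if active_active:
        # Only flows hashed onto the failed connection need to be rerouted.
        return [f for f in flows if pick_link(f) == failed_link]
    # Active-passive: every flow rides the active (primary) connection.
    return list(flows) if failed_link == "primary" else []


# 1,000 hypothetical TCP flows that differ only in source port.
flows = [("10.0.0.5", "198.51.100.10", 6, 40000 + i, 443) for i in range(1000)]

hit_active_active = flows_hit_by_failure(flows, "primary", active_active=True)
hit_active_passive = flows_hit_by_failure(flows, "primary", active_active=False)
```

With a reasonably uniform hash, `hit_active_active` contains roughly half of the flows, whereas `hit_active_passive` contains all of them.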
4746

4847
> [!NOTE]
49-
> During a maintenance activity or in case of unplanned events impacting one of the connection, Microsoft will prefer to use AS path prepending to drain traffic over to the healthy connection. You will need to ensure the traffic is able to route over the healthy path when path prepend is configure from Microsoft and required route advertisements are configured appropriately to avoid any service disruption.
50-
>
48+
> During maintenance or unplanned events impacting one connection, Microsoft uses AS path prepending to drain traffic to the healthy connection. To avoid service disruption, ensure that traffic can route over the healthy path when Microsoft applies path prepending and that the required route advertisements are configured appropriately.
5149
52-
### NAT for Microsoft peering
50+
### NAT for Microsoft peering
5351

54-
Microsoft peering is designed for communication between public end-points. So commonly, on-premises private endpoints are Network Address Translated (NATed) with public IP on the customer or partner network before they communicate over Microsoft peering. Assuming you use both the primary and secondary connections in an active-active setup. Where and how your NAT has an effect on how quickly you recover following a failure in one of the ExpressRoute connections. Two different NAT options are illustrated in the following figure:
52+
Microsoft peering is designed for communication between public endpoints. Typically, on-premises private endpoints are Network Address Translated (NATed) with public IPs on the customer or partner network before communicating over Microsoft peering. Assuming you use both the primary and secondary connections in an active-active setup, where and how you apply NAT affects how quickly you recover from a failure in one of the connections. Two different NAT options are illustrated below:
5553

5654
[![3]][3]
5755

5856
#### Option 1:
5957

60-
NAT gets applied after splitting the traffic between the primary and secondary connections of the ExpressRoute circuit. To meet the stateful requirements of NAT, independent NAT pools are used for the primary and the secondary devices. The return traffic arrives on the same edge device through which the flow egressed.
58+
NAT is applied after splitting traffic between the primary and secondary connections. Independent NAT pools are used for the primary and secondary devices to meet stateful NAT requirements. Return traffic arrives on the same edge device through which the flow egressed.
6159

62-
If the ExpressRoute connection fails, the ability to reach the corresponding NAT pool is then broken. Therefore, all broken network flows have to get re-established either by TCP or by the application layer following the corresponding window timeout. During the failure, Azure can't reach the on-premises servers using the corresponding NAT until connectivity has been restored for either the primary or secondary connections of the ExpressRoute circuit.
60+
If an Azure ExpressRoute connection fails, the corresponding NAT pool becomes unreachable, which breaks every network flow that was using that pool. These flows must be re-established by TCP or the application layer following the corresponding window timeout. During the failure, Azure cannot reach on-premises servers using the corresponding NAT until connectivity is restored.
6361

6462
#### Option 2:
6563

66-
A common NAT pool is used before splitting the traffic between the primary and secondary connections of the ExpressRoute circuit. It's important to make the distinction that the common NAT pool before splitting the traffic doesn't mean it introduces a single-point of failure as such compromising high-availability.
64+
A common NAT pool is used before splitting traffic between the primary and secondary connections. This does not introduce a single point of failure, thus maintaining high availability.
6765

68-
The NAT pool is reachable even after the primary or secondary connection fail. So the network layer itself can reroute the packets and help recover faster following a failure.
66+
The NAT pool remains reachable even if the primary or secondary connection fails, allowing the network layer to reroute packets and recover faster.
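
The difference between Option 1 and Option 2 can be expressed as a small reachability model. This is a minimal sketch: the pool names and the `reachable_pools` helper are hypothetical, and the model only captures which NAT pools remain reachable over the connections that are still up.

```python
def reachable_pools(option, link_up):
    """Return the NAT pools still reachable for a given connection state.

    `link_up` maps "primary" and "secondary" to True (up) or False (down).
    Pool names are hypothetical placeholders.
    """
    if option == 1:
        # Option 1: independent pools, each reachable only over its own connection.
        advertised = {
            "pool-primary": ["primary"],
            "pool-secondary": ["secondary"],
        }
    elif option == 2:
        # Option 2: one common pool, reachable over either connection.
        advertised = {"pool-common": ["primary", "secondary"]}
    else:
        raise ValueError("option must be 1 or 2")
    return sorted(
        pool for pool, links in advertised.items()
        if any(link_up[link] for link in links)
    )


primary_down = {"primary": False, "secondary": True}
```

With `primary_down`, Option 1 loses `pool-primary` and every flow translated through it, while Option 2 keeps `pool-common` reachable so the network layer can reroute existing flows.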
6967

7068
> [!NOTE]
71-
> * If you use NAT option 1 (independent NAT pools for primary and secondary ExpressRoute connections) and map a port of an IP address from one of the NAT pool to an on-premises server, the server will not be reachable via the ExpressRoute circuit when the corresponding connection fails.
72-
> * Terminating ExpressRoute BGP connections on stateful devices can cause issues with failover during planned or unplanned maintenances by Microsoft or your ExpressRoute Provider. You should test your set up to ensure your traffic will failover properly, and when possible, terminate BGP sessions on stateless devices.
69+
> * If using NAT option 1 (independent NAT pools for primary and secondary connections) and mapping a port of an IP address from one NAT pool to an on-premises server, the server will not be reachable via the Azure ExpressRoute circuit if the corresponding connection fails.
70+
> * Terminating Azure ExpressRoute BGP connections on stateful devices can cause failover issues during planned or unplanned maintenance by Microsoft or your Azure ExpressRoute Provider. Test your setup to ensure proper failover, and when possible, terminate BGP sessions on stateless devices.
7371
7472
## Fine-tuning features for private peering
7573

76-
In this section, let us review optional (depending on your Azure deployment and how sensitive you're to MTTR) features that help improve high availability of your ExpressRoute circuit. Specifically, let's review zone-aware deployment of ExpressRoute virtual network gateways, and Bidirectional Forwarding Detection (BFD).
74+
This section reviews optional features that help improve the high availability of your Azure ExpressRoute circuit, depending on your Azure deployment and sensitivity to MTTR. Specifically, it covers zone-aware deployment of Azure ExpressRoute virtual network gateways and Bidirectional Forwarding Detection (BFD).
7775

78-
### Availability Zone aware ExpressRoute virtual network gateways
76+
### Availability Zone aware Azure ExpressRoute virtual network gateways
7977

80-
An Availability Zone in an Azure region is a combination of a fault domain and an update domain. To achieve the highest resiliency and availability, you should configure a zone-redundant ExpressRoute virtual network gateway. To learn more, see [About zone-redundant virtual network gateways in Azure Availability Zones][zone redundant vgw]. To configure a zone-redundant virtual network gateway, see [Create a zone-redundant virtual network gateway in Azure Availability Zones][conf zone redundant vgw].
78+
An Availability Zone in an Azure region combines a fault domain and an update domain. To achieve the highest resiliency and availability, configure a zone-redundant Azure ExpressRoute virtual network gateway. For more information, see [About zone-redundant virtual network gateways in Azure Availability Zones][zone redundant vgw]. To configure a zone-redundant virtual network gateway, see [Create a zone-redundant virtual network gateway in Azure Availability Zones][conf zone redundant vgw].
8179

8280
### Improving failure detection time
8381

84-
ExpressRoute supports BFD over private peering. BFD reduces detection time of failure over the Layer 2 network between Microsoft Enterprise Edge (MSEEs) and their BGP neighbors on the on-premises side from about 3 minutes (default) to less than a second. Quick failure detection time helps hastening failure recovery. To learn further, see [Configure BFD over ExpressRoute][BFD].
82+
Azure ExpressRoute supports BFD over private peering. BFD reduces failure detection time over the Layer 2 network between Microsoft Enterprise Edge devices (MSEEs) and their BGP neighbors on the on-premises side from about 3 minutes (default) to less than a second. Quick failure detection helps hasten recovery. For more information, see [Configure BFD over Azure ExpressRoute][BFD].
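
The two detection times can be compared with simple arithmetic. The values below are assumptions, not taken from this article: a BGP hold time of 180 seconds (which matches the "about 3 minutes" default) and a BFD interval of 300 ms with a detect multiplier of 3. Check the actual values negotiated on your devices.

```python
def bgp_detection_seconds(hold_time_s=180):
    """Worst-case detection time when relying on the BGP hold timer alone.

    The 180-second default is an assumed value for illustration.
    """
    return hold_time_s


def bfd_detection_seconds(interval_ms=300, multiplier=3):
    """BFD declares the peer down after `multiplier` missed intervals.

    The interval and multiplier defaults are assumed values for illustration.
    """
    return interval_ms * multiplier / 1000


bgp_s = bgp_detection_seconds()
bfd_s = bfd_detection_seconds()
```

Under these assumptions, BFD detects a failure in under one second, compared with up to three minutes for the BGP hold timer.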
8583

8684
## Next steps
8785

88-
In this article, we discussed how to design for high availability of an ExpressRoute circuit connectivity. An ExpressRoute circuit peering point is pinned to a geographical location and therefore get affected by catastrophic failure that affects the entire location.
86+
This article discussed designing for high availability of an Azure ExpressRoute circuit. An Azure ExpressRoute circuit peering point is pinned to a geographical location and can be affected by catastrophic failures impacting the entire location.
8987

90-
For design considerations to build geo-redundant network connectivity to Microsoft backbone that can withstand catastrophic failures, which affect an entire region, see [Designing for disaster recovery with ExpressRoute private peering][DR].
88+
For design considerations to build geo-redundant network connectivity to the Microsoft backbone that can withstand catastrophic failures affecting an entire region, see [Designing for disaster recovery with Azure ExpressRoute private peering][DR].
9189

9290
<!--Image References-->
9391
[1]: ./media/designing-for-high-availability-with-expressroute/exr-reco.png "Recommended way to connect using ExpressRoute"
