Commit a87b6c5

Merge pull request #296976 from duongau/erresiliency
ExpressRoute - Resiliency Insights and Validation (new article)
2 parents 235056c + a797773 commit a87b6c5

9 files changed

+268
-0
lines changed

articles/expressroute/TOC.yml

Lines changed: 4 additions & 0 deletions
@@ -221,6 +221,10 @@
221 221
href: use-s2s-vpn-as-backup-for-expressroute-privatepeering.md
222 222
- name: Evaluate ExpressRoute circuit resiliency
223 223
href: evaluate-circuit-resiliency.md
224+
- name: Resiliency Insights
225+
href: resiliency-insights.md
226+
- name: Resiliency Validation
227+
href: resiliency-validation.md
224 228
- name: Security
225 229
items:
226 230
- name: Security baseline
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
---
2+
title: Resiliency Insights for ExpressRoute virtual network gateway (preview)
3+
description: Learn about the resiliency features of ExpressRoute gateway and how they can help you maintain connectivity to your on-premises network.
4+
services: expressroute
5+
author: duongau
6+
ms.service: azure-expressroute
7+
ms.topic: conceptual
8+
ms.date: 03/31/2025
9+
ms.author: duau
10+
ms.custom: ai-usage
11+
---
12+
13+
# Resiliency Insights for ExpressRoute virtual network gateway (preview)
14+
15+
Resiliency Insights is an assessment capability designed to measure your network's reliability for ExpressRoute workloads. At the core of this capability is the resiliency index, a percentage score calculated based on factors such as route resilience, zone-redundant gateway usage, advisory recommendations, and resiliency validation tests. This index evaluates the control plane resiliency of the ExpressRoute connectivity between your ExpressRoute virtual network gateway and on-premises network. By analyzing and improving this index, you can enhance the robustness and reliability of your connectivity to Azure workloads through ExpressRoute.
16+
17+
> [!NOTE]
18+
> To participate in the preview, contact the [**Azure ExpressRoute team**](mailto:[email protected]).
19+
20+
:::image type="content" source="media/resiliency-insights/resiliency-insights.png" alt-text="Screenshot of the Resiliency Insights feature, accessible under the monitoring section in the left-hand menu of the ExpressRoute gateway resource.":::
21+
22+
> [!IMPORTANT]
23+
> **Azure ExpressRoute Resiliency Insights** is currently in PREVIEW.
24+
> Refer to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) for the legal terms applicable to Azure features in beta, preview, or other prerelease stages.
25+
26+
## Route set
27+
28+
The Resiliency Insights topology provides a detailed view of your route sets, helping you determine if they're site-resilient. It also highlights issues such as Private peering BGP (Border Gateway Protocol) session failures at any peering location or uneven route advertisement across ExpressRoute connections. By expanding the route sets, you can trace how routes propagate through each circuit and connection at the peering location.
29+
30+
:::image type="content" source="media/resiliency-insights/route-set.png" alt-text="Screenshot of the route set section in the Resiliency Insights feature, showing the routes associated with ExpressRoute circuit.":::
31+
32+
A route set represents a group of routes advertised from your on-premises network to the ExpressRoute virtual network gateway. These routes are shared across one or more connections within a common set of ExpressRoute circuits. By analyzing the route sets associated with your gateway, you can evaluate the resiliency of your ExpressRoute connections and identify potential areas for improvement.
33+
34+
## Resiliency index
35+
36+
The resiliency index score is a metric designed to assess the reliability of your ExpressRoute connection. It's calculated based on four key factors that contribute to the resiliency of your ExpressRoute virtual network gateway: route resiliency, resiliency validation tests, gateway zone redundancy, and advisory recommendations. Each factor is evaluated to produce an overall resiliency index score, which ranges from 0 to 100. A higher score reflects a more resilient ExpressRoute connection, while a lower score highlights potential areas for improvement that could affect the reliability of your connection.
37+
38+
| Scoring Criteria | Weight |
39+
|-------------------|--------|
40+
| [Route resiliency](#route) (Advertising routes through multi-site ExpressRoute circuits) | 20% |
41+
| [Zone redundant virtual network gateway](#redundancy) | 10% |
42+
| [Resiliency recommendation](#recommendation) | 10% |
43+
| [Resiliency validation readiness test score](#readiness) | Route score multiplier |
44+
45+
### <a name="route"></a> Route resiliency score
46+
47+
Route resiliency is a key factor in assessing the reliability of your ExpressRoute connection. Advertising routes through multiple ExpressRoute circuits at different peering locations creates redundant paths for your traffic. This redundancy minimizes the effect of circuit failures or maintenance events at a single site, ensuring uninterrupted access to your Azure resources.
48+
49+
- Advertising routes through two distinct peering locations: **20%**.
50+
- Advertising routes through ExpressRoute Metro: **10%**.
51+
- Advertising routes through a single peering location: **5%**.
52+
53+
The route resiliency score is **zero** in both high-resiliency (ExpressRoute Metro) and standard-resiliency configurations if there's a link failure between the Microsoft Enterprise Edge (MSEE) and the provider edge (PE) router.
54+
55+
### <a name="redundancy"></a> Zone redundant virtual network gateway score
56+
57+
The zone redundancy feature enhances the reliability of the virtual network gateway by deploying it across multiple availability zones. This configuration provides higher resiliency for your ExpressRoute connection and helps maintain connectivity between your on-premises network and Azure resources.
58+
59+
- **Standard** and **High-Performance** SKUs: **0%**.
60+
- **Ultra Performance** SKUs: **2%**.
61+
- **ErGW1Az, ErGW2Az, ErGW3Az** SKUs:
62+
- Zonal deployment: **8%**.
63+
- Zone-redundant deployment: **10%**.
64+
- **ErGWScale** SKU:
65+
- Up to four instances (two scale units): **8%**.
66+
- More than four instances: **10%**.
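
The SKU scoring above can be read as a simple lookup. The following is a minimal sketch, assuming the percentages listed in this section; the function name, parameter names, and the way SKU labels are normalized are hypothetical and aren't part of any Azure SDK.

```python
def zone_redundancy_score(sku: str, zone_redundant: bool = False, instances: int = 0) -> int:
    """Return the zone redundancy score (in percent) for a gateway SKU.

    Hypothetical helper: the values mirror the list in this article,
    and SKU labels are matched case-insensitively after removing
    spaces and hyphens.
    """
    key = sku.replace("-", "").replace(" ", "").lower()
    if key in {"standard", "highperformance"}:
        return 0
    if key == "ultraperformance":
        return 2
    if key in {"ergw1az", "ergw2az", "ergw3az"}:
        # Zonal deployments score 8, zone-redundant deployments score 10.
        return 10 if zone_redundant else 8
    if key == "ergwscale":
        # Up to four instances score 8, more than four score 10.
        return 10 if instances > 4 else 8
    raise ValueError(f"SKU not covered by the scoring list: {sku}")
```

For example, `zone_redundancy_score("ErGW2Az", zone_redundant=True)` evaluates to 10, while the same SKU in a zonal deployment evaluates to 8.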
67+
68+
### <a name="recommendation"></a> Resiliency recommendation score
69+
70+
Advisor recommendations provide actionable insights to improve the reliability of your ExpressRoute connection. Implementing these recommendations can enhance the resiliency of your connection and ensure uninterrupted access to Azure resources.
71+
72+
If no advisory recommendations are provided, the resiliency score for this category is **10%**.
73+
74+
> [!NOTE]
75+
> Recommendations to deploy a zone redundant gateway or a multi-site ExpressRoute circuit are already factored into the overall resiliency index score. As a result, they don't affect the advisory recommendations score directly.
76+
77+
### <a name="readiness"></a> Resiliency validation readiness test score
78+
79+
ExpressRoute maximum resiliency circuits are defined as a pair of two standard circuits configured in two different peering locations. Any extra circuits would further enhance the resiliency, but these circuits aren't scored. For the resiliency validation multiplier to take effect, you must run the [Resiliency Validation](resiliency-validation.md) test on both peering locations.
80+
81+
The following multipliers are applied to the route resiliency score based on the results of the Resiliency Validation test:
82+
83+
- Resiliency tests conducted within the last 30 days: multiplier of **4**.
84+
- Tests conducted 31–60 days ago: multiplier of **3**.
85+
- Tests conducted 61–90 days ago: multiplier of **2**.
86+
- Tests conducted over 90 days ago: multiplier of **1**.
87+
88+
> [!IMPORTANT]
89+
> If resiliency validation is completed for only one of the two peering locations, the multiplier applied to the route resiliency score is reduced by **half**.
90+
91+
The resiliency index score provides a comprehensive assessment of the reliability of your ExpressRoute connection. By understanding the key factors that influence this score, you can identify opportunities to enhance the resiliency of your connection. Implementing the recommendations and best practices outlined in this article helps you strengthen your ExpressRoute setup and maintain consistent, reliable connectivity between your on-premises network and Azure resources.
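
To see how the factors might combine, here's a minimal sketch in Python. It assumes the index is the route resiliency score multiplied by the validation multiplier, plus the zone redundancy and recommendation scores, capped at 100. The article states the weights and multipliers but not the exact formula, so the combination rule and all function names are assumptions. The sketch also assumes a test date is always available, because the article doesn't say which multiplier applies when no test has ever been run.

```python
def validation_multiplier(days_since_test: int, locations_validated: int = 2) -> float:
    """Multiplier for the route resiliency score, based on test age in days."""
    if days_since_test <= 30:
        multiplier = 4.0
    elif days_since_test <= 60:
        multiplier = 3.0
    elif days_since_test <= 90:
        multiplier = 2.0
    else:
        multiplier = 1.0
    # Validation on only one of the two peering locations halves the multiplier.
    if locations_validated == 1:
        multiplier /= 2
    return multiplier


def resiliency_index(route_score: float, zone_score: float, recommendation_score: float,
                     days_since_test: int, locations_validated: int = 2,
                     msee_pe_link_failure: bool = False) -> float:
    """Combine the factor scores into a 0-100 index (assumed additive formula)."""
    if msee_pe_link_failure:
        # A link failure between the MSEE and PE router zeroes the route score.
        route_score = 0.0
    multiplier = validation_multiplier(days_since_test, locations_validated)
    total = route_score * multiplier + zone_score + recommendation_score
    return min(100.0, total)


# Two peering locations (20), zone-redundant gateway (10), no open
# recommendations (10), both locations validated 12 days ago.
best_case = resiliency_index(20, 10, 10, days_since_test=12)
```

Under these assumptions, `best_case` evaluates to 100.0. The same setup with only one peering location validated evaluates to 60.0, and with tests older than 90 days it evaluates to 40.0.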
92+
93+
## Frequently asked questions
94+
95+
1. Why can't I see the Resiliency Insights feature in my ExpressRoute virtual network gateway?
96+
97+
- The Resiliency Insights feature is currently in preview. To gain access, contact the [Azure ExpressRoute team](mailto:[email protected]) for onboarding.
98+
- This feature isn't supported for Virtual WAN ExpressRoute gateways.
99+
- You must have Contributor-level authorization to access this feature.
100+
101+
1. Why doesn't the pane refresh immediately after I select **Refresh**?
102+
103+
The pane refreshes automatically every hour. If the last update occurred less than an hour ago, the pane won't refresh until the next polling interval is reached.
104+
105+
1. Does the feature support Microsoft Peering or VPN connectivity?
106+
107+
No, the Resiliency Insights feature supports only ExpressRoute Private Peering connectivity. It doesn't support Microsoft Peering or VPN connectivity.
108+
109+
## Next steps
110+
111+
- Learn more about [ExpressRoute virtual network gateway](expressroute-about-virtual-network-gateways.md).
112+
- Learn about [Zone redundancy for ExpressRoute virtual network gateway](../vpn-gateway/about-zone-redundant-vnet-gateways.md?toc=%2Fazure%2Fexpressroute%2Ftoc.json).
Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
1+
---
2+
title: Azure ExpressRoute Gateway Resiliency Validation (preview)
3+
description: This article helps you understand the Azure ExpressRoute Gateway Resiliency Validation feature and how to use it.
4+
services: expressroute
5+
author: duongau
6+
ms.service: azure-expressroute
7+
ms.topic: conceptual
8+
ms.date: 03/31/2025
9+
ms.author: duau
10+
ms.custom: ai-usage
11+
---
12+
13+
# Azure ExpressRoute Gateway Resiliency Validation (preview)
14+
15+
Resiliency validation is a capability designed to assess the resiliency of network connectivity for ExpressRoute-enabled workloads. This feature allows you to perform site failovers for your virtual network gateway so that you can evaluate network resiliency during site outages and validate your setup during migrations by testing the effectiveness of failover mechanisms. By proactively testing your network, you can confirm that connectivity to Azure workloads continues when a peering location becomes unavailable.
16+
17+
> [!IMPORTANT]
18+
> **Azure ExpressRoute Resiliency Validation** is currently in PREVIEW.
19+
> See the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
20+
21+
## Key features
22+
23+
- **Simulate circuit failover** - The connection between the gateway and the selected ExpressRoute circuit is temporarily disconnected to simulate a failover from one peering location to another.
24+
- **Route redundancy** - Insights into duplicate routes are provided for all prefixes received from the selected peering location.
25+
- **Traffic visualization** - Visualize traffic on the ExpressRoute gateway and all connections associated with it during testing.
26+
- **Test history** - Detailed information about previously conducted tests.
27+
28+
### Common use cases
29+
30+
- Helps you identify and resolve potential problems within your network to improve the overall reliability and resiliency of your network infrastructure.
31+
32+
- Essential for high availability and disaster recovery (HA/DR) procedures and migration validation. It ensures your systems are prepared for unplanned events and maintains seamless operations by validating maintenance behavior at the workload level.
33+
34+
- Serves as a prerequisite for migrating from one ExpressRoute peering location to another, ensuring network resiliency before implementing major changes.
35+
36+
### Limitations
37+
38+
- The Resiliency Validation feature is available only for ExpressRoute gateways connected to ExpressRoute circuits in at least two distinct peering locations.
39+
- The **Route List** tab can only be refreshed once per hour.
40+
- This feature isn't supported for Virtual WAN or ExpressRoute Metro.
41+
- You can't run the Resiliency Validation test if there are any ongoing tests or if any of the circuits are currently undergoing maintenance.
42+
43+
## Prerequisites
44+
45+
- To participate in the preview, contact the [**Azure ExpressRoute**](mailto:[email protected]) team.
46+
- Ensure that you have an ExpressRoute circuit in at least two distinct peering locations and an ExpressRoute virtual network gateway connected to those circuits.
47+
48+
## Using the gateway resiliency validation
49+
50+
The gateway resiliency validation can be accessed from any ExpressRoute gateway resource by navigating to the **Monitoring** section in the left-hand menu.
51+
52+
:::image type="content" source="media/resiliency-validation/resiliency-validation.png" alt-text="Screenshot of the Resiliency Validation feature, accessible under the monitoring section in the left-hand menu of the ExpressRoute gateway resource.":::
53+
54+
The dashboard provides a detailed overview of all ExpressRoute circuits connected to the ExpressRoute virtual network gateway, categorized by peering location. It displays the most recent test status, the timestamp of the last test conducted, the results of the latest test, and an action button to initiate a new test.
55+
56+
> [!IMPORTANT]
57+
> - During the test, the ExpressRoute virtual network gateway disconnects from the target ExpressRoute circuit, causing a temporary loss of connectivity for nonredundant routes. Ensure your routing policies are configured to support traffic failover.
58+
> - The targeted ExpressRoute circuit maintains connectivity to other ExpressRoute virtual network gateways, and the gateway doing the test maintains connectivity to other ExpressRoute circuits.
59+
60+
### Starting the test
61+
62+
1. Navigate to the desired peering location and select the **Start new test** button.
63+
64+
1. Review the autopopulated configuration, which includes:
65+
66+
- Gateway name
67+
- Peering location
68+
- Route redundancy information
69+
- Traffic details
70+
- Status of all connections to the ExpressRoute gateway
71+
72+
1. Ensure that all critical routes are marked as redundant by reviewing the **Route List** tab.
73+
74+
:::image type="content" source="media/resiliency-validation/route-list.png" alt-text="Screenshot showing the Route List tab with details of redundant and nonredundant routes.":::
75+
76+
1. Confirm that the circuits listed on this page aren't undergoing maintenance by selecting the first checkbox.
77+
78+
1. Acknowledge that you reviewed the **Route List** tab and that all critical routes are marked as redundant by selecting the second checkbox.
79+
80+
1. Enter the name of the gateway to confirm that you're aware of the potential effect of the test on your network.
81+
82+
1. Select **Start Simulation** to initiate the test.
83+
84+
:::image type="content" source="media/resiliency-validation/start-test.png" alt-text="Screenshot showing the Resiliency Validation testing page.":::
85+
86+
1. The resiliency validation status shows as **In progress**.
87+
88+
### During the test
89+
90+
1. Navigate to the **Test Status** tab to validate connectivity to your Azure workloads through each redundant connection. Review the traffic flow graph for the ExpressRoute gateway, which displays the average bits per second traffic flow. The tab also provides ingress and egress traffic information for connected and disconnected peering locations.
91+
92+
:::image type="content" source="media/resiliency-validation/test-status.png" alt-text="Screenshot showing the traffic flow graph for an ExpressRoute gateway and traffic data for connections to the gateway.":::
93+
94+
> [!NOTE]
95+
> Traffic metrics are updated every minute and displayed in the **Test Status** tab. Allow up to 5 minutes for the metrics to appear after initiating the test.
96+
97+
1. Validate connectivity from your on-premises network to your Azure workloads through the redundant connection by sending data packets. Tools like [iPerf](https://iperf.fr/) can be used for this purpose.
98+
99+
1. Select the **Stop Simulation** button to end the test. When prompted, confirm whether the test completed successfully and select the failover peering location.
100+
101+
1. Once confirmed, connectivity for all connections to the ExpressRoute gateway gets restored.
102+
103+
1. You can view the test report by selecting **View** under the *Test History* column on the dashboard for the selected peering location.
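
The on-premises connectivity check described in the steps above can also be scripted. The following is a minimal Python sketch, assuming a TCP service is reachable on the Azure workload; it isn't a replacement for iPerf throughput testing, and the function names, host, and port are hypothetical.

```python
import socket
import time
from typing import List


def probe_tcp(host: str, port: int, attempts: int = 30,
              interval: float = 1.0, timeout: float = 2.0) -> List[bool]:
    """Open a TCP connection once per interval and record success or failure."""
    results = []
    for attempt in range(attempts):
        started = time.monotonic()
        try:
            # A completed TCP handshake counts as a successful probe.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(True)
        except OSError:
            # Covers timeouts, refused connections, and unreachable hosts.
            results.append(False)
        if attempt < attempts - 1:
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval - elapsed))
    return results


def longest_failure_run(results: List[bool]) -> int:
    """Return the largest number of consecutive failed probes."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest
```

For example, you might start `probe_tcp("10.0.0.4", 22, attempts=120)` from an on-premises host (the address and port are placeholders) shortly before selecting **Start Simulation**, and then pass the result to `longest_failure_run` to estimate how many one-second probes failed while traffic moved to the redundant connection.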
104+
105+
## Frequently asked questions
106+
107+
1. Why can't I see the Resiliency Validation feature in my ExpressRoute virtual network gateway?
108+
109+
- The Resiliency Validation feature is currently in preview. To gain access, contact the [Azure ExpressRoute team](mailto:[email protected]) for onboarding.
110+
- This feature is only available for ExpressRoute virtual network gateways configured in a Max Resiliency model. It isn't supported for Virtual WAN ExpressRoute gateways.
111+
- You must have Contributor-level authorization to access this feature.
112+
113+
1. Why doesn't the Route List show the latest routes?
114+
115+
The Route List tab has a polling interval of 1 hour. This means the pane won't refresh for 1 hour from the last updated time.
116+
117+
1. Does the feature support Microsoft Peering or VPN connectivity?
118+
119+
No, the Resiliency Validation feature supports only ExpressRoute Private Peering connectivity. It doesn't support Microsoft Peering or VPN connectivity.
120+
121+
1. Can I control the gateway resiliency validation tests from anywhere other than the Azure portal?
122+
123+
Yes, you can use the REST API to start and stop the gateway resiliency validation tests.
124+
125+
1. What happens if I don't terminate a test?
126+
127+
The test continues to run indefinitely.
128+
129+
1. What metrics or alerts can I monitor during the resiliency validation test?
130+
131+
To ensure network resilience during outages, redundant connections should be configured. During a failover, if the backup circuit exceeds 100% of its bandwidth, packet drops might occur. Use [Circuit QoS](monitor-expressroute-reference.md#category-circuit-qos) metrics to monitor packet drops caused by rate limiting. Additionally, the **Test Status** tab in the Resiliency Validation feature provides traffic monitoring for the connections. Ensure alerts are configured to validate their effectiveness during the test.
132+
133+
1. Can I control traffic on demand using the gateway resiliency validation tool?
134+
135+
Yes, if the routes are advertised redundantly through circuits in different peering locations, the gateway resiliency validation tool allows you to control traffic on demand by failing traffic over to connections in an alternative site.
136+
137+
1. Does this feature support FastPath and Private Link?
138+
139+
For FastPath, while the data path bypasses the gateway, the gateway still handles control plane activities such as route management. During a disconnect between the ExpressRoute circuit and the gateway, routes are withdrawn from the affected circuit. However, if redundant circuits are properly configured, connectivity for failover connections to FastPath and Private Link is maintained during the failover.
140+
141+
1. Is packet loss expected during a failover simulation?
142+
143+
A brief connectivity disruption occurs during the failover simulation as BGP (Border Gateway Protocol) reconverges. Performance tests using iPerf on TCP (up to 500 Mbps) show no packet loss during the simulation. However, in an actual outage scenario, some packet loss can occur until traffic successfully fails over.
144+
145+
1. How long does a failover take?
146+
147+
Once the simulation begins, traffic failover typically completes within 15 seconds.
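
The bandwidth concern raised in the metrics question above comes down to simple arithmetic: the surviving circuit must carry its own traffic plus the traffic shifted from the disconnected circuit. The following is a minimal Python sketch of that check; the function names and the sample figures are hypothetical, and real traffic values would come from your own circuit metrics.

```python
def failover_utilization(circuit_bandwidth_mbps: float,
                         baseline_traffic_mbps: float,
                         shifted_traffic_mbps: float) -> float:
    """Return the expected utilization of the surviving circuit as a fraction."""
    if circuit_bandwidth_mbps <= 0:
        raise ValueError("circuit bandwidth must be positive")
    return (baseline_traffic_mbps + shifted_traffic_mbps) / circuit_bandwidth_mbps


def may_drop_packets(circuit_bandwidth_mbps: float,
                     baseline_traffic_mbps: float,
                     shifted_traffic_mbps: float) -> bool:
    """True when the combined load exceeds 100% of the circuit bandwidth."""
    utilization = failover_utilization(
        circuit_bandwidth_mbps, baseline_traffic_mbps, shifted_traffic_mbps)
    return utilization > 1.0
```

With illustrative numbers, a 1,000-Mbps circuit that already carries 400 Mbps and takes over another 700 Mbps would run at about 110% of its bandwidth, so `may_drop_packets(1000, 400, 700)` is true and packet drops from rate limiting are possible.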
148+
149+
## Next steps
150+
151+
- Learn more about the [ExpressRoute gateway](expressroute-about-virtual-network-gateways.md) and how to [monitor ExpressRoute circuits](monitor-expressroute.md).
152+
- Learn about [ExpressRoute Resiliency Insights](resiliency-insights.md).