
Commit d168b36

create resiliency insights and validation
1 parent 10a39a9 commit d168b36

8 files changed

+245
-0
lines changed

articles/expressroute/TOC.yml

Lines changed: 4 additions & 0 deletions
@@ -221,6 +221,10 @@
221221
href: use-s2s-vpn-as-backup-for-expressroute-privatepeering.md
222222
- name: Evaluate ExpressRoute circuit resiliency
223223
href: evaluate-circuit-resiliency.md
224+
- name: Resiliency Insights
225+
href: resiliency-insights.md
226+
- name: Resiliency Validation
227+
href: resiliency-validation.md
224228
- name: Security
225229
items:
226230
- name: Security baseline
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
1+
---
2+
title: Resiliency Insight for ExpressRoute virtual network gateway (preview)
3+
description: Learn about the resiliency features of ExpressRoute gateway and how they can help you maintain connectivity to your on-premises network.
4+
services: expressroute
5+
author: duongau
6+
ms.service: azure-expressroute
7+
ms.topic: conceptual
8+
ms.date: 03/24/2025
9+
ms.author: duau
10+
ms.custom: ai-usage
11+
---
12+
13+
# Resiliency Insight for ExpressRoute virtual network gateway (preview)
14+
15+
ExpressRoute enables private connections between your on-premises networks and Azure workloads. These connections are established through a virtual network gateway, which acts as the entry point to Microsoft's network. The virtual network gateway plays a critical role in the ExpressRoute architecture by providing routing and forwarding capabilities that ensure secure and reliable connectivity between your on-premises network and Azure.
16+
17+
In this article, we explore the resiliency insight feature of the ExpressRoute virtual network gateway and explain how the resiliency index score can help you evaluate the reliability of your ExpressRoute connectivity.
18+
19+
:::image type="content" source="media/resiliency-insights/resiliency-insights.png" alt-text="Screenshot of the Resiliency Insights feature, accessible under the monitoring section in the left-hand menu of the ExpressRoute gateway resource.":::
20+
21+
> [!IMPORTANT]
22+
> **Azure ExpressRoute Resiliency Insight** is currently in PREVIEW.
23+
> Refer to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) for the legal terms applicable to Azure features in beta, preview, or other prerelease stages.
24+
25+
## Resiliency index
26+
27+
The resiliency index score is a metric designed to assess the reliability of your ExpressRoute connection. It's calculated based on four key factors that contribute to the resiliency of your ExpressRoute virtual network gateway: route resiliency, resiliency validation tests, gateway zone redundancy, and advisory recommendations. Each factor is evaluated to produce an overall resiliency index score, which ranges from 0 to 100. A higher score reflects a more resilient ExpressRoute connection, while a lower score highlights potential areas for improvement that could affect the reliability of your connection.
28+
29+
| Scoring Criteria | Weight |
30+
|-------------------|--------|
31+
| [Route resiliency](#route) (Advertising routes through multi-site ExpressRoute circuits) | 20% |
32+
| [Zone redundant virtual network gateway](#redundancy) | 10% |
33+
| [Resiliency recommendation](#recommendation) | 10% |
34+
| [Resiliency validation readiness test score](#readiness) | Route score multiplier |
35+
36+
## Route set
37+
38+
A route set represents a group of routes advertised from your on-premises network to the ExpressRoute virtual network gateway. These routes are shared across one or more connections within a common set of ExpressRoute circuits. By analyzing the route sets associated with your gateway, you can evaluate the resiliency of your ExpressRoute connections and identify potential areas for improvement.
39+
40+
The Resiliency Insights topology provides a detailed view of your route sets, helping you determine if they're site-resilient. It also highlights issues such as Private peering BGP (Border Gateway Protocol) session failures at any peering location or uneven route advertisement across ExpressRoute connections. By expanding the route sets, you can trace how routes propagate through each circuit and connection at the peering location.
41+
42+
## Understanding the resiliency index scores
43+
44+
### <a name="route"></a> Route resiliency score
45+
46+
Route resiliency is a key factor in assessing the reliability of your ExpressRoute connection. Advertising routes through multiple ExpressRoute circuits at different peering locations creates redundant paths for your traffic. This redundancy minimizes the effect of circuit failures or maintenance events at a single site, ensuring uninterrupted access to your Azure resources.
47+
48+
- Advertising routes through two distinct peering locations: **20%**.
49+
- Advertising routes through ExpressRoute Metro: **10%**.
50+
- Advertising routes through a single peering location: **5%**.
51+
52+
The route resiliency score is **zero** in both high-resiliency (ExpressRoute Metro) and standard-resiliency configurations if there's a link failure between the Microsoft Enterprise Edge (MSEE) and the provider edge (PE) router.
53+
54+
### <a name="redundancy"></a> Zone redundant virtual network gateway score
55+
56+
The zone redundancy feature enhances the reliability of the virtual network gateway by deploying it across multiple failure zones. This configuration ensures higher resiliency for your ExpressRoute connection, maintaining connectivity between your on-premises network and Azure resources.
57+
58+
- **Standard** and **High-Performance** SKUs: **0%**.
59+
- **Ultra Performance** SKUs: **2%**.
60+
- **ErGW1Az, ErGW2Az, ErGW3Az** SKUs:
61+
- Zonal deployment: **8%**.
62+
- Zone-redundant deployment: **10%**.
63+
- **ErGWScale** SKU:
64+
- Up to four instances (two scale units): **8%**.
65+
- More than four instances: **10%**.
66+
67+
### <a name="recommendation"></a> Resiliency recommendation score
68+
69+
Advisor recommendations provide actionable insights to improve the reliability of your ExpressRoute connection. Implementing these recommendations can enhance the resiliency of your connection and ensure uninterrupted access to Azure resources.
70+
71+
If no advisory recommendations are provided, the resiliency score for this category is **10%**.
72+
73+
> [!NOTE]
74+
> Recommendations to deploy a zone redundant gateway or a multi-site ExpressRoute circuit are already factored into the overall resiliency index score. As a result, they don't affect the advisory recommendations score directly.
75+
76+
### <a name="readiness"></a> Resiliency validation readiness test score
77+
78+
ExpressRoute maximum resiliency circuits are defined as a pair of standard circuits configured in two different peering locations. Extra circuits further enhance resiliency, but they aren't scored. For the resiliency validation multiplier to take effect, you must run the [Resiliency Validation](resiliency-validation.md) test at both peering locations.
79+
80+
The following multipliers are applied to the route resiliency score based on the results of the Resiliency Validation test:
81+
82+
- Resiliency tests conducted within the last 30 days: multiplier of **4**.
83+
- Tests conducted 31–60 days ago: multiplier of **3**.
84+
- Tests conducted 61–90 days ago: multiplier of **2**.
85+
- Tests conducted over 90 days ago: multiplier of **1**.
86+
87+
> [!IMPORTANT]
88+
> If resiliency validation is completed for only one of the two peering locations, the multiplier applied to the route resiliency score is reduced by **half**.
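
The scoring rules above can be illustrated with a short Python sketch. It assumes that the index is the route resiliency score multiplied by the validation multiplier, plus the zone redundancy and recommendation scores. That reading is consistent with the 0 to 100 range (20 × 4 + 10 + 10 = 100), but the exact formula isn't stated in this article. The function and parameter names are hypothetical and aren't part of any Azure SDK.

```python
# Route resiliency points taken from the list in this article.
ROUTE_POINTS = {
    "two_peering_locations": 20,
    "metro": 10,
    "single_peering_location": 5,
}


def validation_multiplier(days_since_test: int, both_locations_tested: bool) -> float:
    """Multiplier applied to the route resiliency score, based on test age."""
    if days_since_test <= 30:
        base = 4.0
    elif days_since_test <= 60:
        base = 3.0
    elif days_since_test <= 90:
        base = 2.0
    else:
        base = 1.0
    # The multiplier is halved when only one of the two peering locations was validated.
    return base if both_locations_tested else base / 2


def resiliency_index(
    route_config: str,
    msee_pe_link_failure: bool,
    zone_points: float,
    recommendation_points: float,
    days_since_test: int,
    both_locations_tested: bool,
) -> float:
    """Estimate the index. Assumption: the components are combined additively."""
    # A link failure between the MSEE and the PE router zeroes the route score.
    route = 0 if msee_pe_link_failure else ROUTE_POINTS[route_config]
    multiplier = validation_multiplier(days_since_test, both_locations_tested)
    return min(100.0, route * multiplier + zone_points + recommendation_points)


# Two peering locations, zone-redundant gateway, no open recommendations,
# both locations validated 15 days ago.
print(resiliency_index("two_peering_locations", False, 10, 10, 15, True))
```

Under these assumptions, the same gateway validated at only one peering location 45 days ago would score 20 × 1.5 + 10 + 10 = 50.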
89+
90+
The resiliency index score provides a comprehensive assessment of the reliability of your ExpressRoute connection. By understanding the key factors that influence this score, you can identify opportunities to enhance the resiliency of your connection. Implementing the recommendations and best practices outlined in this article helps you strengthen your ExpressRoute setup, ensuring consistent and reliable connectivity between your on-premises network and Azure resources.
91+
92+
## Next steps
93+
94+
- Learn more about [ExpressRoute virtual network gateway](expressroute-about-virtual-network-gateways.md).
95+
- Learn about [Zone redundancy for ExpressRoute virtual network gateway](../vpn-gateway/about-zone-redundant-vnet-gateways.md?toc=%2Fazure%2Fexpressroute%2Ftoc.json).
Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
1+
---
2+
title: Azure ExpressRoute Gateway Resiliency Validation (preview)
3+
description: This article helps you understand the Azure ExpressRoute Gateway Resiliency Validation feature and how to use it.
4+
services: expressroute
5+
author: duongau
6+
ms.service: azure-expressroute
7+
ms.topic: conceptual
8+
ms.date: 03/24/2025
9+
ms.author: duau
10+
ms.custom: ai-usage
11+
---
12+
13+
# Azure ExpressRoute Gateway Resiliency Validation (preview)
14+
15+
Ensuring uninterrupted connectivity to Azure workloads through ExpressRoute is essential for maintaining business continuity. We're committed to providing you with new capabilities to help maintain a resilient network. The *gateway resiliency validation* feature assesses how resilient your network is by testing a failure scenario and validating the failover mechanisms. By proactively testing your network resiliency, you can ensure that your workloads remain available and can recover quickly from disruptions.
16+
17+
Another key aspect of this feature is the ability to identify misconfigurations and provide insights about your ExpressRoute connections from the ExpressRoute gateway perspective. This proactive approach allows you to validate the network behavior before major changes are implemented while also ensuring that your network is prepared for unexpected events.
18+
19+
> [!IMPORTANT]
20+
> **Azure ExpressRoute Resiliency Validation** is currently in PREVIEW.
21+
> See the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
22+
23+
## Key features
24+
25+
- **Simulate circuit failover** - Connections between the gateway and the selected ExpressRoute circuit are temporarily disconnected to simulate a failover from one peering location to another.
26+
- **Route redundancy** - Insights into duplicate routes are provided for all prefixes received from the selected peering location.
27+
- **Traffic visualization** - Visualize traffic going through the ExpressRoute gateway and all connections to an ExpressRoute circuit during testing.
28+
- **Test history** - Detailed information about previously conducted tests.
29+
30+
### Common use cases
31+
32+
- Helps identify and resolve potential problems within your network, improving the overall reliability and resiliency of your network infrastructure.
33+
34+
- Essential for high availability and disaster recovery (HA/DR) procedures and migration validation. It ensures your systems are prepared for unplanned events and maintains seamless operations by validating maintenance behavior at the workload level.
35+
36+
- Serves as a prerequisite for migrating from one ExpressRoute peering location to another, ensuring network resiliency before implementing major changes.
37+
38+
### Limitations
39+
40+
- The Resiliency Validation feature is available only for ExpressRoute gateways connected to ExpressRoute circuits in at least two distinct peering locations.
41+
- The **Route List** tab can only be refreshed once per hour.
42+
- This feature isn't supported for Virtual WAN or ExpressRoute Metro.
43+
- You can't run the Resiliency Validation test if there are any ongoing tests or if any of the circuits are currently undergoing maintenance.
44+
45+
## Prerequisites
46+
47+
- To participate in the preview, contact the [**ExpressRoute PM**](mailto:[email protected]) team.
48+
- Ensure that you have an ExpressRoute circuit in at least two distinct peering locations and an ExpressRoute gateway connected to those circuits.
49+
50+
## Using the gateway resiliency validation
51+
52+
The gateway resiliency validation can be accessed from any ExpressRoute gateway resource by navigating to the **Monitoring** section in the left-hand menu.
53+
54+
:::image type="content" source="media/resiliency-validation/resiliency-validation.png" alt-text="Screenshot of the Resiliency Validation feature, accessible under the monitoring section in the left-hand menu of the ExpressRoute gateway resource.":::
55+
56+
The dashboard provides a detailed overview of all ExpressRoute circuits connected to the ExpressRoute virtual network gateway, categorized by peering location. It displays the most recent test status, the timestamp of the last test conducted, the results of the latest test, and an action button to initiate a new test.
57+
58+
> [!WARNING]
59+
> During the test, the ExpressRoute circuit disconnects from the ExpressRoute gateway, causing a temporary loss of connectivity for nonredundant routes. Ensure your routing policies are configured to support traffic failover.
60+
61+
### Starting the test
62+
63+
1. Navigate to the desired peering location and select the **Start new test** button.
64+
65+
1. Review the autopopulated configuration, which includes:
66+
67+
- Gateway name
68+
- Peering location
69+
- Route redundancy information
70+
- Traffic details
71+
- Status of all connections to the ExpressRoute gateway
72+
73+
1. Ensure that all critical routes are marked as redundant by reviewing the **Route List** tab.
74+
75+
:::image type="content" source="media/resiliency-validation/route-list.png" alt-text="Screenshot showing the Route List tab with details of redundant and nonredundant routes.":::
76+
77+
1. Confirm that the circuits listed on this page aren't undergoing maintenance by selecting the first checkbox.
78+
79+
1. Acknowledge that you reviewed the **Route List** tab and that all critical routes are marked as redundant by selecting the second checkbox.
80+
81+
1. Enter the name of the gateway to confirm that you're aware of the potential effect of the test on your network.
82+
83+
1. Select **Start Simulation** to initiate the test.
84+
85+
:::image type="content" source="media/resiliency-validation/start-test.png" alt-text="Screenshot showing the Resiliency Validation testing page.":::
86+
87+
1. The resiliency validation status shows as **In progress**.
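
Before you start a test, it can help to cross-check the **Route List** tab against your own route inventory. The following Python sketch assumes you have exported the prefixes received at each peering location into a dictionary. The data shape, function name, and sample values are hypothetical. It treats a prefix as redundant only if the exact same prefix is also received from another peering location, so covering (less specific) routes aren't taken into account.

```python
from typing import Dict, Set


def nonredundant_prefixes(
    routes_by_location: Dict[str, Set[str]], location_under_test: str
) -> Set[str]:
    """Return prefixes received only through the location that will be disconnected."""
    under_test = routes_by_location[location_under_test]
    elsewhere: Set[str] = set()
    for location, prefixes in routes_by_location.items():
        if location != location_under_test:
            elsewhere |= prefixes
    return under_test - elsewhere


# Hypothetical sample data for two peering locations.
routes = {
    "Seattle": {"10.1.0.0/16", "10.2.0.0/16"},
    "Silicon Valley": {"10.1.0.0/16"},
}
at_risk = nonredundant_prefixes(routes, "Seattle")
print(sorted(at_risk))
```

Any prefix returned by this check would lose connectivity for the duration of the simulation.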
88+
89+
### During the test
90+
91+
1. Navigate to the **Test Status** tab to validate connectivity to your Azure workloads through each redundant connection. Review the traffic flow graph for the ExpressRoute gateway, which displays the average bits per second traffic flow. The tab also provides ingress and egress traffic information for connected and disconnected peering locations.
92+
93+
:::image type="content" source="media/resiliency-validation/test-status.png" alt-text="Screenshot of the traffic flow graph for an ExpressRoute gateway and the traffic data on the connections to the gateway.":::
94+
95+
1. Validate connectivity from your on-premises network to your Azure workloads through the redundant connection by sending data packets. Tools like [iPerf](https://iperf.fr/) can be used for this purpose.
96+
97+
1. Select the **Stop Simulation** button to end the test. When prompted, confirm whether the test completed successfully.
98+
99+
1. Once confirmed, connectivity for all connections to the ExpressRoute gateway is restored.
100+
101+
1. You can view the test result by selecting **View** under the *Test History* column on the dashboard for the selected peering location.
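
As a lightweight complement to iPerf, you can probe a workload from on-premises while the simulation runs and measure how long connectivity is interrupted. The following Python sketch is one way to do that with the standard library. The host, port, and function names are placeholders, and a TCP connection probe only approximates what your application traffic experiences.

```python
import socket
import time
from typing import List, Optional, Tuple


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_probe(host: str, port: int, duration_s: float = 120.0,
              interval_s: float = 0.5) -> List[Tuple[float, bool]]:
    """Probe repeatedly and record (timestamp, success) samples."""
    samples: List[Tuple[float, bool]] = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), tcp_probe(host, port)))
        time.sleep(interval_s)
    return samples


def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Longest time in seconds from the first failed probe of a run to the next success."""
    longest = 0.0
    outage_start: Optional[float] = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    # An outage still open at the end is measured up to the last sample.
    if outage_start is not None and samples:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest
```

For example, `longest_outage(run_probe("10.1.0.4", 443))` would report the longest interruption seen against a hypothetical workload address during a two-minute window that spans the start of the simulation.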
102+
103+
## Frequently asked questions
104+
105+
1. Can I control the gateway validation tests from outside the Azure portal?
106+
107+
Yes, you can use the REST API to start and stop the gateway resiliency validation tests.
108+
109+
2. What happens if I don't terminate a test?
110+
111+
The tests continue to run indefinitely.
112+
113+
3. What metrics or alerts are available to monitor during the test?
114+
115+
The purpose of configuring redundant connections is to ensure network resilience during outages. If a circuit is utilized at more than 50% of its bandwidth, the remaining circuit might not be able to absorb the combined load after a failover, and packet drops might occur. During validation tests, the **Test Status** tab helps monitor traffic through the connections. You should expect [alerts](monitor-expressroute.md#alerts) if they're configured, providing an opportunity to validate their effectiveness.
116+
117+
For more information, see [Circuit utilization](monitor-expressroute-reference.md#category-circuit-traffic) or [Connection traffic](monitor-expressroute-reference.md#category-traffic) for metrics you can set up alerts on.
118+
119+
4. Can I control traffic on demand using the gateway resiliency validation tool?
120+
121+
Yes, the gateway resiliency validation tool allows you to control traffic on demand. This is useful for testing different traffic scenarios and ensuring your network can handle various failovers. It can also be used to validate connectivity after successful site migrations before disconnecting the redundant circuit.
122+
123+
5. Are there specific Role-Based Access Controls (RBAC) policies for this feature?
124+
125+
Yes, there are specific RBAC policies to ensure that only authorized users with contributor access to the gateway can initiate downtime.
126+
127+
6. When can I run this feature in a Virtual WAN setup or other resiliency models?
128+
129+
For feedback or other requests, contact the [**ExpressRoute PM**](mailto:[email protected]).
130+
131+
7. Does this feature work with FastPath and Private Link?
132+
133+
For FastPath, although the data path bypasses the gateway, the gateway still manages control plane activities like route management. During a disconnect between the ExpressRoute circuit and the ExpressRoute gateway, routes are withdrawn from the gateway. However, connectivity for the failover connection to FastPath and Private Link is maintained during the failover.
134+
135+
8. Is packet loss expected during this activity?
136+
137+
During the failover simulation, a brief connectivity disruption occurs as BGP (Border Gateway Protocol) reestablishes. Performance tests using iPerf on TCP (Transmission Control Protocol) up to 500 Mbps show no packet loss. However, in a real outage scenario, some packet loss occurs until the traffic successfully fails over.
138+
139+
9. How long does it take to fail over?
140+
141+
Once the simulation starts, it can take up to 15 seconds for the traffic to fail over.
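
The 50% utilization guidance in question 3 can be checked with simple arithmetic before you run a test. The following Python sketch estimates the utilization of each surviving circuit after one circuit is disconnected. It assumes that traffic from the disconnected circuit is spread evenly across the survivors, which is a simplification. The function name and sample figures are hypothetical.

```python
from typing import Dict


def utilization_after_failover(
    traffic_mbps: Dict[str, float],
    bandwidth_mbps: Dict[str, float],
    failed_circuit: str,
) -> Dict[str, float]:
    """Estimated utilization (1.0 = 100%) of each surviving circuit."""
    survivors = [name for name in traffic_mbps if name != failed_circuit]
    if not survivors:
        raise ValueError("no surviving circuit to carry the traffic")
    # Assumption: the shifted traffic is split evenly across the survivors.
    shifted = traffic_mbps[failed_circuit] / len(survivors)
    return {
        name: (traffic_mbps[name] + shifted) / bandwidth_mbps[name]
        for name in survivors
    }


# Two 1-Gbps circuits, each carrying more than half of its bandwidth.
estimate = utilization_after_failover(
    {"circuit-a": 600.0, "circuit-b": 550.0},
    {"circuit-a": 1000.0, "circuit-b": 1000.0},
    "circuit-a",
)
overloaded = [name for name, value in estimate.items() if value > 1.0]
print(estimate, overloaded)
```

In this example, the surviving circuit would need to carry 1,150 Mbps on a 1,000-Mbps link, so packet drops would be expected during the test.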
142+
143+
## Next steps
144+
145+
- Learn more about the [ExpressRoute gateway](expressroute-about-virtual-network-gateways.md) and how to [monitor ExpressRoute circuits](monitor-expressroute.md).
146+
- Learn about [ExpressRoute Resiliency Insights](resiliency-insights.md).
