Commit 766dc62

Move content.
1 parent 7f8284e commit 766dc62

2 files changed

+312
-258
lines changed

articles/expressroute/expressroute-monitoring-metrics-alerts.md

Lines changed: 0 additions & 258 deletions
@@ -39,264 +39,6 @@ Metrics explorer supports sum, maximum, minimum, average and count as [aggregati
3939
* Min: The smallest value captured during the aggregation interval.
4040
* Max: The largest value captured during the aggregation interval.
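As a rough illustration of how the aggregation types named above differ, the following Python sketch applies each one to the samples captured in a single aggregation interval. The sample values and the `aggregate` helper are hypothetical, and this is not how Azure Monitor itself is implemented.

```python
from statistics import mean

# Map each aggregation type to a plain Python function.
# These names mirror the aggregation types listed in this article;
# the mapping itself is an assumption made for illustration.
AGGREGATIONS = {
    "sum": sum,
    "max": max,
    "min": min,
    "avg": mean,
    "count": len,  # number of samples captured in the interval
}


def aggregate(samples, kind):
    """Aggregate the raw samples captured during one aggregation interval."""
    return AGGREGATIONS[kind](samples)


# Made-up samples for one interval.
samples = [120.0, 80.0, 100.0]
summary = {kind: aggregate(samples, kind) for kind in AGGREGATIONS}
```

The same three samples produce a different value for each aggregation type, which is why every metric in this article states the aggregation it's normally viewed with.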
4141

42-
## Circuits metrics
43-
44-
### <a name = "circuitbandwidth"></a>Bits In and Out - Metrics across all peerings
45-
46-
Aggregation type: *Avg*
47-
48-
You can view metrics across all peerings on a given ExpressRoute circuit.
49-
50-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/ermetricspeering.jpg" alt-text="circuit metrics":::
51-
52-
### Bits In and Out - Metrics per peering
53-
54-
Aggregation type: *Avg*
55-
56-
You can view metrics for private, public, and Microsoft peering in bits/second.
57-
58-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/erpeeringmetrics.jpg" alt-text="metrics per peering":::
59-
60-
### <a name = "bgp"></a>BGP Availability - Split by Peer
61-
62-
Aggregation type: *Avg*
63-
64-
You can view near real-time availability of BGP (Layer-3 connectivity) across peerings and peers (Primary and Secondary ExpressRoute routers). This dashboard shows that the Primary BGP session is up for private peering and the Secondary BGP session is down for private peering.
65-
66-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/erBgpAvailabilityMetrics.jpg" alt-text="BGP availability per peer":::
67-
68-
>[!NOTE]
69-
>During maintenance between the Microsoft edge and core network, BGP availability will appear down even if the BGP session between the customer edge and Microsoft edge remains up. For information about maintenance between the Microsoft edge and core network, make sure to have your [maintenance alerts turned on and configured](./maintenance-alerts.md).
70-
>
71-
72-
### FastPath routes count (at circuit level)
73-
74-
Aggregation type: *Max*
75-
76-
This metric shows the number of FastPath routes configured on a circuit. Set an alert for when the number of FastPath routes on a circuit goes beyond the threshold limit. For more information, see [ExpressRoute FastPath limits](about-fastpath.md#ip-address-limits).
77-
78-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/fastpath-routes-count-circuit.png" alt-text="Screenshot of FastPath routes count at circuit level metric.":::
79-
80-
### <a name = "arp"></a>ARP Availability - Split by Peering
81-
82-
Aggregation type: *Avg*
83-
84-
You can view near real-time availability of [ARP](./expressroute-troubleshooting-arp-resource-manager.md) (Layer-2 connectivity) across peerings and peers (Primary and Secondary ExpressRoute routers). This dashboard shows that the private peering ARP session is up across both peers, but the Microsoft peering ARP session is down for both peers. The default aggregation (Average) was used across both peers.
85-
86-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/erArpAvailabilityMetrics.jpg" alt-text="ARP availability per peer":::
87-
88-
## ExpressRoute Direct Metrics
89-
90-
### <a name = "admin"></a>Admin State - Split by link
91-
92-
Aggregation type: *Avg*
93-
94-
You can view the Admin state for each link of the ExpressRoute Direct port pair. The Admin state indicates whether the physical port is on or off. The port must be on to pass traffic across the ExpressRoute Direct connection.
95-
96-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/adminstate-per-link.jpg" alt-text="ER Direct admin state":::
97-
98-
### <a name = "directin"></a>Bits In Per Second - Split by link
99-
100-
Aggregation type: *Avg*
101-
102-
You can view the bits in per second across both links of the ExpressRoute Direct port pair. Monitor this dashboard to compare inbound bandwidth for both links.
103-
104-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/bits-in-per-second-per-link.jpg" alt-text="ER Direct bits in per second":::
105-
106-
### <a name = "directout"></a>Bits Out Per Second - Split by link
107-
108-
Aggregation type: *Avg*
109-
110-
You can also view the bits out per second across both links of the ExpressRoute Direct port pair. Monitor this dashboard to compare outbound bandwidth for both links.
111-
112-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/bits-out-per-second-per-link.jpg" alt-text="ER Direct bits out per second":::
113-
114-
### <a name = "line"></a>Line Protocol - Split by link
115-
116-
Aggregation type: *Avg*
117-
118-
You can view the line protocol for each link of the ExpressRoute Direct port pair. The line protocol indicates whether the physical link is up and running over ExpressRoute Direct. Monitor this dashboard and set alerts to know when the physical connection goes down.
119-
120-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/line-protocol-per-link.jpg" alt-text="ER Direct line protocol":::
121-
122-
### <a name = "rxlight"></a>Rx Light Level - Split by link
123-
124-
Aggregation type: *Avg*
125-
126-
You can view the Rx light level (the light level that the ExpressRoute Direct port is **receiving**) for each port. Healthy Rx light levels generally fall within a range of -10 dBm to 0 dBm. Set alerts to be notified if the Rx light level falls outside of the healthy range.
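The range check described above can be sketched in a few lines of Python. This is a minimal illustration, not an Azure API: the function names and sample readings are hypothetical, and only the -10 dBm to 0 dBm range comes from this article.

```python
from typing import Dict, List

# Healthy range stated in this article; adjust if your optics differ.
HEALTHY_MIN_DBM = -10.0
HEALTHY_MAX_DBM = 0.0


def is_healthy_light_level(dbm: float) -> bool:
    """Return True when a light level reading is inside the healthy range."""
    return HEALTHY_MIN_DBM <= dbm <= HEALTHY_MAX_DBM


def links_out_of_range(readings: Dict[str, float]) -> List[str]:
    """Return the names of links whose reading is outside the healthy range."""
    return sorted(
        link for link, dbm in readings.items() if not is_healthy_light_level(dbm)
    )


# Hypothetical per-link readings in dBm.
flagged = links_out_of_range({"link1": -3.2, "link2": -14.5})
```

An alert rule built on this idea would fire for `link2` only, because -14.5 dBm is below the lower bound.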
127-
128-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/rxlight-level-per-link.jpg" alt-text="ER Direct line Rx Light Level":::
129-
130-
>[!NOTE]
131-
> ExpressRoute Direct connectivity is hosted across different device platforms. Some ExpressRoute Direct connections will support a split view for Rx light levels by lane. However, this is not supported on all deployments.
132-
>
133-
134-
### <a name = "txlight"></a>Tx Light Level - Split by link
135-
136-
Aggregation type: *Avg*
137-
138-
You can view the Tx light level (the light level that the ExpressRoute Direct port is **transmitting**) for each port. Healthy Tx light levels generally fall within a range of -10 dBm to 0 dBm. Set alerts to be notified if the Tx light level falls outside of the healthy range.
139-
140-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/txlight-level-per-link.jpg" alt-text="ER Direct line Tx Light Level":::
141-
142-
>[!NOTE]
143-
> ExpressRoute Direct connectivity is hosted across different device platforms. Some ExpressRoute Direct connections will support a split view for Tx light levels by lane. However, this is not supported on all deployments.
144-
>
145-
146-
### FastPath routes count (at port level)
147-
148-
Aggregation type: *Max*
149-
150-
This metric shows the number of FastPath routes configured on an ExpressRoute Direct port.
151-
152-
*Guidance:* Set an alert for when the number of FastPath routes on the port goes beyond the threshold limit. For more information, see [ExpressRoute FastPath limits](about-fastpath.md#ip-address-limits).
153-
154-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/fastpath-routes-count-port.png" alt-text="Screenshot of FastPath routes count at port level metric.":::
155-
156-
## ExpressRoute Virtual Network Gateway Metrics
157-
158-
Aggregation type: *Avg*
159-
160-
When you deploy an ExpressRoute gateway, Azure manages the compute and functions of your gateway. The following gateway metrics are available to help you understand the performance of your gateway:
161-
162-
* Bits received per second
163-
* CPU Utilization
164-
* Packets per second
165-
* Count of routes advertised to peers
166-
* Count of routes learned from peers
167-
* Frequency of routes changed
168-
* Number of VMs in the virtual network
169-
* Active flows
170-
* Max flows created per second
171-
172-
We highly recommend that you set alerts for each of these metrics so that you're aware of when your gateway could be seeing performance issues.
173-
174-
### <a name = "gwbits"></a>Bits received per second - Split by instance
175-
176-
Aggregation type: *Avg*
177-
178-
This metric captures inbound bandwidth utilization on the ExpressRoute virtual network gateway instances. Set an alert for how frequently the bandwidth utilization exceeds a certain threshold. If you need more bandwidth, increase the size of the ExpressRoute virtual network gateway.
179-
180-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/inbound-gateway.png" alt-text="Screenshot of inbound bit per second - split metrics.":::
181-
182-
### <a name = "cpu"></a>CPU Utilization - Split by instance
183-
184-
Aggregation type: *Avg*
185-
186-
You can view the CPU utilization of each gateway instance. The CPU utilization might spike briefly during routine host maintenance, but prolonged high CPU utilization could indicate that your gateway is reaching a performance bottleneck. Increasing the size of the ExpressRoute gateway might resolve this issue. Set an alert for how frequently the CPU utilization exceeds a certain threshold.
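To show the difference between a brief spike and prolonged high utilization, here's a minimal Python sketch that reports a breach only when several consecutive samples exceed a threshold. The `sustained_breach` helper, the 80 percent threshold, and the sample values are assumptions chosen for illustration; they aren't Azure defaults.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if at least `min_consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run on a breach; reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical average CPU percentages, one per aggregation interval.
brief_spike = [35, 92, 40, 38]
prolonged = [35, 85, 91, 88, 40]

spike_alerts = sustained_breach(brief_spike, threshold=80, min_consecutive=3)
prolonged_alerts = sustained_breach(prolonged, threshold=80, min_consecutive=3)
```

Requiring consecutive breaches keeps a single spike during host maintenance from triggering the alert, while a sustained run still does.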
187-
188-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/cpu-split.jpg" alt-text="Screenshot of CPU utilization - split metrics.":::
189-
190-
### <a name = "packets"></a>Packets Per Second - Split by instance
191-
192-
Aggregation type: *Avg*
193-
194-
This metric captures the number of inbound packets traversing the ExpressRoute gateway. You should expect to see a consistent stream of data here if your gateway is receiving traffic from your on-premises network. Set an alert for when the number of packets per second drops below a threshold indicating that your gateway is no longer receiving traffic.
195-
196-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/pps-split.jpg" alt-text="Screenshot of packets per second - split metrics.":::
197-
198-
### <a name = "advertisedroutes"></a>Count of Routes Advertised to Peer - Split by instance
199-
200-
Aggregation type: *Max*
201-
202-
This metric shows the number of routes the ExpressRoute gateway is advertising to the circuit. The address spaces might include virtual networks that are connected using virtual network peering and that use a remote ExpressRoute gateway. You should expect the number of routes to remain consistent unless there are frequent changes to the virtual network address spaces. Set an alert for when the number of advertised routes drops below the threshold for the number of virtual network address spaces you're aware of.
203-
204-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/count-of-routes-advertised-to-peer.png" alt-text="Screenshot of count of routes advertised to peer.":::
205-
206-
### <a name = "learnedroutes"></a>Count of routes learned from peer - Split by instance
207-
208-
Aggregation type: *Max*
209-
210-
This metric shows the number of routes the ExpressRoute gateway is learning from peers connected to the ExpressRoute circuit. These routes can be either from another virtual network connected to the same circuit or learned from on-premises. Set an alert for when the number of learned routes drops below a certain threshold. A drop can indicate that either the gateway is seeing a performance problem or remote peers are no longer advertising routes to the ExpressRoute circuit.
211-
212-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/count-of-routes-learned-from-peer.png" alt-text="Screenshot of count of routes learned from peer.":::
213-
214-
### <a name = "frequency"></a>Frequency of routes change - Split by instance
215-
216-
Aggregation type: *Sum*
217-
218-
This metric shows the frequency of routes being learned from or advertised to remote peers. If routes change frequently, first investigate your on-premises devices to understand why the network is changing so often. A high frequency of route changes could also indicate a performance problem on the ExpressRoute gateway, in which case scaling up the gateway SKU might resolve the problem. Set an alert for a frequency threshold to be aware of when your ExpressRoute gateway is seeing abnormal route changes.
219-
220-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/frequency-of-routes-changed.png" alt-text="Screenshot of frequency of routes changed metric.":::
221-
222-
### <a name = "vm"></a>Number of VMs in the virtual network
223-
224-
Aggregation type: *Max*
225-
226-
This metric shows the number of virtual machines that are using the ExpressRoute gateway. The number of virtual machines might include VMs from peered virtual networks that use the same ExpressRoute gateway. Set an alert for this metric if the number of VMs goes above a certain threshold that could affect the gateway performance.
227-
228-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/number-of-virtual-machines-virtual-network.png" alt-text="Screenshot of number of virtual machines in the virtual network metric.":::
229-
230-
>[!NOTE]
231-
> To maintain reliability of the service, Microsoft often performs platform or OS maintenance on the gateway service. During this time, this metric may fluctuate and report inaccurately.
232-
>
233-
234-
## <a name = "activeflows"></a>Active flows
235-
236-
Aggregation type: *Avg*
237-
238-
Split by: Gateway Instance
239-
240-
241-
This metric displays the total number of active flows on the ExpressRoute gateway. Only inbound traffic from on-premises is captured for active flows. By splitting at the instance level, you can see the active flow count per gateway instance. For more information, see [understand network flow limits](../virtual-network/virtual-machine-network-throughput.md#network-flow-limits).
242-
243-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/active-flows.png" alt-text="Screenshot of number of active flows per second metrics dashboard.":::
244-
245-
## <a name = "maxflows"></a>Max flows created per second
246-
247-
Aggregation type: *Max*
248-
249-
Split by: Gateway Instance and Direction (Inbound/Outbound)
250-
251-
This metric displays the maximum number of flows created per second on the ExpressRoute gateway. By splitting at the instance level and by direction, you can see the maximum flow creation rate per gateway instance and per inbound or outbound direction. For more information, see [understand network flow limits](../virtual-network/virtual-machine-network-throughput.md#network-flow-limits).
252-
253-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/max-flows-per-second.png" alt-text="Screenshot of the maximum number of flows created per second metrics dashboard.":::
254-
255-
## <a name = "connectionbandwidth"></a>ExpressRoute gateway connections in bits/second
256-
257-
Aggregation type: *Avg*
258-
259-
This metric shows the bits per second for ingress and egress to Azure through the ExpressRoute gateway. You can split this metric further to see specific connections to the ExpressRoute circuit.
260-
261-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/erconnections.jpg" alt-text="Screenshot of gateway connection bandwidth usage metric.":::
262-
263-
## ExpressRoute Traffic Collector metrics
264-
265-
### CPU Utilization - Split by instance
266-
267-
Aggregation type: *Avg* (of percentage of total utilized CPU)
268-
269-
*Granularity: 5 min*
270-
271-
You can view the CPU utilization of each ExpressRoute Traffic Collector instance. The CPU utilization might spike briefly during routine host maintenance, but prolonged high CPU utilization could indicate your ExpressRoute Traffic Collector is reaching a performance bottleneck.
272-
273-
**Guidance:** Set an alert for when avg CPU utilization exceeds a certain threshold.
274-
275-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/cpu-usage.png" alt-text="Screenshot of CPU usage for ExpressRoute Traffic Collector." lightbox="./media/expressroute-monitoring-metrics-alerts/cpu-usage.png":::
276-
277-
### Memory Utilization - Split by instance
278-
279-
Aggregation type: *Avg* (of percentage of total utilized Memory)
280-
281-
*Granularity: 5 min*
282-
283-
You can view the memory utilization of each ExpressRoute Traffic Collector instance. Memory utilization might spike briefly during routine host maintenance, but prolonged high memory utilization could indicate your ExpressRoute Traffic Collector is reaching a performance bottleneck.
284-
285-
**Guidance:** Set an alert for when avg memory utilization exceeds a certain threshold.
286-
287-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/memory-usage.png" alt-text="Screenshot of memory usage for ExpressRoute Traffic Collector." lightbox="./media/expressroute-monitoring-metrics-alerts/memory-usage.png":::
288-
289-
### Count of flow records processed - Split by instances or ExpressRoute circuit
290-
291-
Aggregation type: *Count*
292-
293-
*Granularity: 5 min*
294-
295-
You can view the number of flow records processed by ExpressRoute Traffic Collector, aggregated across ExpressRoute circuits. You can split the metric by ExpressRoute Traffic Collector instance, or by ExpressRoute circuit when multiple circuits are associated with the ExpressRoute Traffic Collector. Monitoring this metric helps you understand whether you need to deploy more ExpressRoute Traffic Collector instances or migrate an ExpressRoute circuit association from one ExpressRoute Traffic Collector deployment to another.
296-
297-
**Guidance:** Splitting by circuits is recommended when multiple ExpressRoute circuits are associated with an ExpressRoute Traffic Collector deployment. This metric helps determine the flow count of each ExpressRoute circuit and ExpressRoute Traffic Collector utilization by each ExpressRoute circuit.
298-
299-
:::image type="content" source="./media/expressroute-monitoring-metrics-alerts/flow-records.png" alt-text="Screenshot of average flow records for an ExpressRoute circuit." lightbox="./media/expressroute-monitoring-metrics-alerts/flow-records.png":::
30042

30143
## Alerts for ExpressRoute gateway connections
30244
