Commit 4b45ea2

Merge pull request #95134 from vinynigam/patch-48
Updating schema and adding queries for alerts
2 parents 2c4aa55 + 49e4951 commit 4b45ea2

File tree

1 file changed

+55
-12
lines changed

articles/azure-monitor/insights/network-performance-monitor-faq.md

Lines changed: 55 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -27,14 +27,14 @@ More information on the various capabilities supported by [Network Performance M
2727
### What are the platform requirements for the nodes to be used for monitoring by NPM?
2828
Listed below are the platform requirements for NPM's various capabilities:
2929

30-
- NPM's Performance Monitor and Service Connectivity Monitor capabilities support both Windows server and Windows desktops/client operating systems. Windows server OS versions supported are 2008 SP1 or later. Windows desktops/client versions supported are Windows 10, Windows 8.1, Windows 8 and Windows 7.
30+
- NPM's Performance Monitor and Service Connectivity Monitor capabilities support both Windows server and Windows desktops/client operating systems. Windows server OS versions supported are 2008 SP1 or later. Windows desktops/client versions supported are Windows 10, Windows 8.1, Windows 8, and Windows 7.
3131
- NPM's ExpressRoute Monitor capability supports only Windows server (2008 SP1 or later) operating system.
3232

3333
### Can I use Linux machines as monitoring nodes in NPM?
3434
The capability to monitor networks using Linux-based nodes is currently in preview. Reach out to your Account Manager to know more. Linux agents provide monitoring capability only for NPM's Performance Monitor capability, and are not available for the Service Connectivity Monitor and ExpressRoute Monitor capabilities.
3535

3636
### What are the size requirements of the nodes to be used for monitoring by NPM?
37-
For running the NPM solution on node VMs to monitor networks, the nodes should have at least 500-MB memory and one core. You do'nt need to use separate nodes for running NPM. The solution can run on nodes that have other workloads running on it. The solution has the capability to stop the monitoring process if it uses more than 5% CPU.
37+
For running the NPM solution on node VMs to monitor networks, the nodes should have at least 500 MB of memory and one core. You don't need to use separate nodes for running NPM. The solution can run on nodes that have other workloads running on them. The solution can stop the monitoring process if it uses more than 5% CPU.
3838

3939
### To use NPM, should I connect my nodes as Direct agent or through System Center Operations Manager?
4040
Both the Performance Monitor and the Service Connectivity Monitor capabilities support nodes [connected as Direct Agents](../../azure-monitor/platform/agent-windows.md) and [connected through Operations Manager](../../azure-monitor/platform/om-agents.md).
@@ -44,7 +44,7 @@ For ExpressRoute Monitor capability, the Azure nodes should be connected as Dire
4444
### Which protocol among TCP and ICMP should be chosen for monitoring?
4545
If you're monitoring your network using Windows server-based nodes, we recommend you use TCP as the monitoring protocol since it provides better accuracy.
4646

47-
ICMP is recommended for Windows desktops/client operating system-based nodes. This platform does'nt allow TCP data to be sent over raw sockets, which NPM uses to discover network topology.
47+
ICMP is recommended for Windows desktops/client operating system-based nodes. This platform doesn't allow TCP data to be sent over raw sockets, which NPM uses to discover network topology.
4848

4949
You can get more details on the relative advantages of each protocol [here](../../azure-monitor/insights/network-performance-monitor-performance-monitor.md#choose-the-protocol).
5050

@@ -94,6 +94,43 @@ NPM uses a probabilistic mechanism to assign fault-probabilities to each network
9494
### How can I create alerts in NPM?
9595
Refer to [alerts section in the documentation](https://docs.microsoft.com/azure/log-analytics/log-analytics-network-performance-monitor#alerts) for step-by-step instructions.
9696

97+
### What are the default Log Analytics queries for alerts?
98+
Performance monitor query
99+
100+
NetworkMonitoring
101+
| where (SubType == "SubNetwork" or SubType == "NetworkPath")
102+
| where (LossHealthState == "Unhealthy" or LatencyHealthState == "Unhealthy") and RuleName == "<<your rule name>>"
103+
104+
Service connectivity monitor query
105+
106+
NetworkMonitoring
107+
| where (SubType == "EndpointHealth" or SubType == "EndpointPath")
108+
| where (LossHealthState == "Unhealthy" or LatencyHealthState == "Unhealthy" or ServiceResponseHealthState == "Unhealthy") and TestName == "<<your test name>>"
109+
110+
ExpressRoute monitor queries:
111+
Circuits query
112+
113+
NetworkMonitoring
114+
| where (SubType == "ERCircuitTotalUtilization") and (UtilizationHealthState == "Unhealthy") and CircuitResourceId == "<<your circuit resource ID>>"
115+
116+
Private peering
117+
118+
NetworkMonitoring
119+
| where (SubType == "ExpressRoutePeering" or SubType == "ERVNetConnectionUtilization" or SubType == "ExpressRoutePath")
120+
| where (LossHealthState == "Unhealthy" or LatencyHealthState == "Unhealthy" or UtilizationHealthState == "Unhealthy") and CircuitName == "<<your circuit name>>" and VirtualNetwork == "<<vnet name>>"
121+
122+
Microsoft peering
123+
124+
NetworkMonitoring
125+
| where (SubType == "ExpressRoutePeering" or SubType == "ERMSPeeringUtilization" or SubType == "ExpressRoutePath")
126+
| where (LossHealthState == "Unhealthy" or LatencyHealthState == "Unhealthy" or UtilizationHealthState == "Unhealthy") and CircuitName == "<<your circuit name>>" and PeeringType == "MicrosoftPeering"
127+
128+
Common query
129+
130+
NetworkMonitoring
131+
| where (SubType == "ExpressRoutePeering" or SubType == "ERVNetConnectionUtilization" or SubType == "ERMSPeeringUtilization" or SubType == "ExpressRoutePath")
132+
| where (LossHealthState == "Unhealthy" or LatencyHealthState == "Unhealthy" or UtilizationHealthState == "Unhealthy")
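Each of the queries above carries a `<<...>>` placeholder that has to be replaced before the query is used in an alert. As a minimal sketch, the Python helper below fills in the rule name for the Performance Monitor query. The function name and the sample rule name are hypothetical, and the escaping assumes Kusto's backslash escaping inside double-quoted string literals.

```python
def performance_monitor_alert_query(rule_name: str) -> str:
    """Return the Performance Monitor alert query for one rule (illustrative helper)."""
    # Escape backslashes and double quotes so the name stays inside the string literal.
    escaped = rule_name.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "NetworkMonitoring",
        '| where (SubType == "SubNetwork" or SubType == "NetworkPath")',
        '| where (LossHealthState == "Unhealthy" or LatencyHealthState == "Unhealthy") '
        f'and RuleName == "{escaped}"',
    ])


# "contoso-branch-to-hq" is a made-up rule name used only for illustration.
query = performance_monitor_alert_query("contoso-branch-to-hq")
print(query)
```

The same pattern would apply to the other queries by swapping the filter column (`TestName`, `CircuitResourceId`, `CircuitName`) and the `SubType` values.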
133+
97134
### Can NPM monitor routers and servers as individual devices?
98135
NPM only identifies the IP and host name of underlying network hops (switches, routers, servers, etc.) between the source and destination IPs. It also identifies the latency between these identified hops. It does not individually monitor these underlying hops.
99136

@@ -106,17 +143,23 @@ Bandwidth usage is the total of incoming and outgoing bandwidth. It is expressed
106143
### Can we get incoming and outgoing bandwidth information for the ExpressRoute?
107144
Incoming and outgoing values for both Primary and Secondary bandwidth can be captured.
108145

109-
For peering level information, use the below mentioned query in Log Search
146+
For Microsoft peering level information, use the following query in Log Search:
147+
148+
NetworkMonitoring
149+
| where SubType == "ERMSPeeringUtilization"
150+
| project CircuitName,PeeringName,PrimaryBytesInPerSecond,PrimaryBytesOutPerSecond,SecondaryBytesInPerSecond,SecondaryBytesOutPerSecond
151+
152+
For private peering level information, use the following query in Log Search:
110153

111154
NetworkMonitoring
112-
| where SubType == "ExpressRoutePeeringUtilization"
113-
| project CircuitName,PeeringName,PrimaryBytesInPerSecond,PrimaryBytesOutPerSecond,SecondaryBytesInPerSecond,SecondaryBytesOutPerSecond
155+
| where SubType == "ERVNetConnectionUtilization"
156+
| project CircuitName,PeeringName,PrimaryBytesInPerSecond,PrimaryBytesOutPerSecond,SecondaryBytesInPerSecond,SecondaryBytesOutPerSecond
114157

115-
For circuit level information, use the below mentioned query
158+
For circuit level information, use the following query in Log Search:
116159

117160
NetworkMonitoring
118-
| where SubType == "ExpressRouteCircuitUtilization"
119-
| project CircuitName,PrimaryBytesInPerSecond, PrimaryBytesOutPerSecond,SecondaryBytesInPerSecond,SecondaryBytesOutPerSecond
161+
| where SubType == "ERCircuitTotalUtilization"
162+
| project CircuitName, PrimaryBytesInPerSecond, PrimaryBytesOutPerSecond, SecondaryBytesInPerSecond, SecondaryBytesOutPerSecond
120163

121164
### Which regions are supported for NPM's Performance Monitor?
122165
NPM can monitor connectivity between networks in any part of the world, from a workspace that is hosted in one of the [supported regions](../../azure-monitor/insights/network-performance-monitor.md#supported-regions).
@@ -140,8 +183,8 @@ A hop may not respond to a traceroute in one or more of the below scenarios:
140183
* The network devices are not allowing ICMP_TTL_EXCEEDED traffic.
141184
* A firewall is blocking the ICMP_TTL_EXCEEDED response from the network device.
142185

143-
### I get alerts for unhealthy tests but I do not see the high values in NPM's loss and latency graph. How do I check what is unhealthy ?
144-
NPM raises an alert if end to end latency between source and destination crosses the threshhold for any path between them. Some networks have more than one paths connecting the same source and destination. NPM raises an alert is any path is unhealthy. The loss and latency seen in the graphs is the average value for all the paths, hence it may not show the exact value of a single path. To understand where the threshold has been breached, look for the "SubType" column in the alert. If the issue is caused by a path the SubType value will be NetworkPath ( for Performance Monitor tests), EndpointPath (for Service Connectivity Monitor tests) and ExpressRoutePath (for ExpressRotue Monitor tests).
186+
### I get alerts for unhealthy tests but I do not see the high values in NPM's loss and latency graph. How do I check what is unhealthy?
187+
NPM raises an alert if the end-to-end latency between source and destination crosses the threshold for any path between them. Some networks have multiple paths connecting the same source and destination. NPM raises an alert if any path is unhealthy. The loss and latency shown in the graphs are average values across all the paths, so they may not show the exact value of a single path. To understand where the threshold has been breached, look for the "SubType" column in the alert. If the issue is caused by a path, the SubType value will be NetworkPath (for Performance Monitor tests), EndpointPath (for Service Connectivity Monitor tests), or ExpressRoutePath (for ExpressRoute Monitor tests).
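To see why an averaged graph can stay below the threshold while an alert still fires, consider the small Python illustration below. The path names, latency values, and threshold are all made up for the example and do not come from NPM.

```python
from statistics import mean

# Hypothetical per-path latencies (ms) between one source and one destination.
path_latency_ms = {"path-1": 20.0, "path-2": 25.0, "path-3": 130.0}
threshold_ms = 100.0  # hypothetical alert threshold

# The graph shows an average across paths; the alert looks at each path.
average_ms = mean(path_latency_ms.values())
unhealthy_paths = [p for p, ms in path_latency_ms.items() if ms > threshold_ms]

print(f"average latency: {average_ms:.1f} ms")
print(f"unhealthy paths: {unhealthy_paths}")
```

Here the average stays under the threshold even though one path breaches it, which is the situation in which the graph looks healthy but an alert is raised.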
145188

146189
Sample query to find if a path is unhealthy:
147190

@@ -151,7 +194,7 @@ Sample Query to find is path is unhealthy:
151194
| project SubType, LossHealthState, LatencyHealthState, MedianLatency
152195

153196
### Why does my test show unhealthy but the topology does not?
154-
NPM monitors end-to-end loss, latency, and topology at different intervals. Loss and latency are measured once every 5 seconds and aggregated every three minutes (for Performance Monitor and Express Route Monitor) while topology is calculated using traceroute once every 10 minutes. For example, between 3:44 and 4:04, topology may be updated three times (3:44, 3:54, 4:04) , but loss and latency are updated about seven times (3:44, 3:47, 3:50, 3:53, 3:56, 3:59, 4:02). The topology generated at 3:54 will be rendered for the loss and latency that gets calculated at 3:56, 3:59 and 4:02. Suppose you get an alert that your ER circuit was unhealthy at 3:59. You log on to NPM and try to set the topology time to 3:59. NPM will render the topology generated at 3:54. To understand the last known topology of your network, compare the fields TimeProcessed (time at which loss and latency was calculated) and TracerouteCompletedTime(time at which topology was calculated).
197+
NPM monitors end-to-end loss, latency, and topology at different intervals. Loss and latency are measured once every 5 seconds and aggregated every three minutes (for Performance Monitor and ExpressRoute Monitor), while topology is calculated using traceroute once every 10 minutes. For example, between 3:44 and 4:04, topology may be updated three times (3:44, 3:54, 4:04), but loss and latency are updated about seven times (3:44, 3:47, 3:50, 3:53, 3:56, 3:59, 4:02). The topology generated at 3:54 will be rendered for the loss and latency values calculated at 3:56, 3:59, and 4:02. Suppose you get an alert that your ER circuit was unhealthy at 3:59. You log on to NPM and try to set the topology time to 3:59. NPM will render the topology generated at 3:54. To understand the last known topology of your network, compare the fields TimeProcessed (time at which loss and latency were calculated) and TracerouteCompletedTime (time at which topology was calculated).
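The timing in this example can be worked through with a short Python sketch. It assumes topology snapshots fall on an exact 10-minute cadence starting at 3:44; the function name and the calendar date are arbitrary choices for illustration, not part of NPM.

```python
from datetime import datetime, timedelta


def latest_topology_before(alert_time, first_topology, interval=timedelta(minutes=10)):
    """Return the most recent topology timestamp at or before alert_time."""
    if alert_time < first_topology:
        raise ValueError("alert_time precedes the first topology snapshot")
    # Whole topology intervals completed since the first snapshot.
    completed_intervals = (alert_time - first_topology) // interval
    return first_topology + completed_intervals * interval


day = datetime(2020, 1, 1)              # arbitrary date, only the clock time matters
first = day.replace(hour=3, minute=44)  # first topology snapshot in the example
alert = day.replace(hour=3, minute=59)  # alert time in the example

rendered = latest_topology_before(alert, first)
print(rendered.strftime("%H:%M"))  # 03:54
```

The loss and latency values calculated at 3:56 and 4:02 resolve to the same 3:54 snapshot, while a value calculated at 4:04 or later would pick up the next one.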
155198

156199
### What is the difference between the fields E2EMedianLatency and AvgHopLatencyList in the NetworkMonitoring table?
157200
E2EMedianLatency is the latency updated every three minutes after aggregating the results of TCP ping tests, whereas AvgHopLatencyList is updated every 10 minutes based on traceroute. To understand the exact time at which E2EMedianLatency was calculated, use the field TimeProcessed. To understand the exact time at which traceroute completed and updated AvgHopLatencyList, use the field TracerouteCompletedTime.
