Commit d7a001c

committed
merge troubleshooting content to dynamic threshold
1 parent 124a9eb commit d7a001c

2 files changed: +118 -117 lines changed

articles/azure-monitor/alerts/alerts-dynamic-thresholds.md

Lines changed: 118 additions & 2 deletions
@@ -97,13 +97,129 @@ If you have a new resource or missing metric data, Dynamic Thresholds won't trig

 The system automatically recognizes prolonged outages and removes them from the threshold learning algorithm. As a result, despite prolonged outages, dynamic thresholds understand the data. Service issues are detected with the same sensitivity as before an outage occurred.

+## The Dynamic Thresholds borders don't seem to fit the data
+
+If the behavior of a metric changed recently, the changes won't necessarily be reflected in the Dynamic Threshold borders (upper and lower bounds) immediately. The borders are calculated based on metric data from the last 10 days. When you view the Dynamic Threshold borders for a given metric, look at the metric trend in the last week and not only for recent hours or days.
+
+## Why is weekly seasonality not detected by Dynamic Thresholds?
+
+To identify weekly seasonality, the Dynamic Thresholds model requires at least three weeks of historical data. When enough historical data is available, any weekly seasonality that exists in the metric data is identified and the model is adjusted accordingly.
+
+## Dynamic Thresholds shows a negative lower bound for a metric even though the metric always has positive values
+
+When a metric exhibits large fluctuation, Dynamic Thresholds builds a wider model around the metric values. This action can result in the lower border being below zero. Specifically, this scenario can happen when:
+
+- The sensitivity is set to low.
+- The median values are close to zero.
+- The metric exhibits an irregular behavior with high variance, which appears as spikes or dips in the data.
+
+When the lower bound has a negative value, it's plausible for the metric to reach a zero value given the metric's irregular behavior. Consider choosing a higher sensitivity or a larger **Aggregation granularity (Period)** to make the model less sensitive. Or, use the **Ignore data before** option to exclude a recent irregularity from the historical data used to build the model.
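The negative lower bound described above can be reproduced with a deliberately simple stand-in model. The sketch below is an assumption for illustration only: it uses a mean plus or minus `k` standard deviations band, which is not the Dynamic Thresholds algorithm, and the function `band`, the multiplier `k`, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def band(values, k):
    """Return (lower, upper) as mean -/+ k population standard deviations.

    Illustrative stand-in only; not the Dynamic Thresholds model.
    """
    center = mean(values)
    spread = pstdev(values)
    return center - k * spread, center + k * spread


# Hypothetical metric: never negative, median close to zero, occasional spikes.
samples = [0, 1, 0, 2, 0, 1, 40, 0, 1, 0, 35, 1]

# A large k gives a wide band, loosely analogous to a low sensitivity.
wide_lower, wide_upper = band(samples, k=3.0)

# A small k gives a tight band, loosely analogous to a higher sensitivity.
narrow_lower, narrow_upper = band(samples, k=0.25)

print(f"wide band:   {wide_lower:.2f} .. {wide_upper:.2f}")
print(f"narrow band: {narrow_lower:.2f} .. {narrow_upper:.2f}")
```

With these made-up samples the wide band dips below zero even though no sample is negative, while the tighter band keeps its lower edge above zero. That mirrors the three conditions listed in the section, but only qualitatively.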
+
+## The Dynamic Thresholds alert rule is too noisy or fires too much
+
+To reduce the sensitivity of your Dynamic Thresholds alert rule, use one of the following options:
+
+- **Threshold sensitivity:** Set the sensitivity to **Low** to be more tolerant for deviations.
+- **Number of violations (under Advanced settings):** Configure the alert rule to trigger only if several deviations occur within a certain period of time. This setting makes the rule less susceptible to transient deviations.
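The idea behind the **Number of violations** setting can be sketched as an "at least m of the last n evaluations" gate. This is a minimal sketch under stated assumptions: the diff does not describe how Azure Monitor evaluates the setting internally, and `make_violation_gate`, `min_violations`, and `window` are hypothetical names chosen for this example.

```python
from collections import deque


def make_violation_gate(min_violations, window):
    """Build an evaluator that returns True only when at least
    `min_violations` of the last `window` evaluations were deviations."""
    recent = deque(maxlen=window)  # oldest results drop off automatically

    def evaluate(is_deviation):
        recent.append(bool(is_deviation))
        return sum(recent) >= min_violations

    return evaluate


# Hypothetical setup: fire only if 3 of the last 4 evaluations deviate.
gate = make_violation_gate(min_violations=3, window=4)

# One transient deviation, then quiet, then a sustained deviation.
deviations = [True, False, False, False, True, True, True]
results = [gate(d) for d in deviations]
print(results)
```

In this run the isolated first deviation never fires the gate; only the third consecutive deviation at the end does, which is the "less susceptible to transient deviations" behavior the bullet describes.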
+
+## The Dynamic Thresholds alert rule doesn't fire or is not sensitive enough
+
+Sometimes an alert rule won't trigger, even when a high sensitivity is configured. This scenario usually happens when the metric's distribution is highly irregular.
+Consider one of the following options:
+
+* Move to monitoring a complementary metric that's suitable for your scenario, if applicable. For example, check for changes in success rate rather than failure rate.
+* Try selecting a different value for **Aggregation granularity (Period)**.
+* Check if there was a drastic change in the metric behavior in the last 10 days, for example, an outage. An abrupt change can affect the upper and lower thresholds calculated for the metric and make them broader. Wait for a few days until the outage is no longer taken into the thresholds calculation. Or use the **Ignore data before** option under **Advanced settings**.
+* If your data has weekly seasonality, but not enough history is available for the metric, the calculated thresholds can result in having broad upper and lower bounds. For example, the calculation can treat weekdays and weekends in the same way and build wide borders that don't always fit the data. This issue should resolve itself after enough metric history is available. Then, the correct seasonality will be detected and the calculated thresholds will update accordingly.
+
+## Metrics not supported by Dynamic Thresholds
+
+Dynamic thresholds are supported for most metrics, but some metrics can't use dynamic thresholds.
+
+The following table lists the metrics that aren't supported by Dynamic Thresholds.
+
+| Resource type | Metric name |
+| --- | --- |
+| Microsoft.ClassicStorage/storageAccounts | UsedCapacity |
+| Microsoft.ClassicStorage/storageAccounts/blobServices | BlobCapacity |
+| Microsoft.ClassicStorage/storageAccounts/blobServices | BlobCount |
+| Microsoft.ClassicStorage/storageAccounts/blobServices | IndexCapacity |
+| Microsoft.ClassicStorage/storageAccounts/fileServices | FileCapacity |
+| Microsoft.ClassicStorage/storageAccounts/fileServices | FileCount |
+| Microsoft.ClassicStorage/storageAccounts/fileServices | FileShareCount |
+| Microsoft.ClassicStorage/storageAccounts/fileServices | FileShareSnapshotCount |
+| Microsoft.ClassicStorage/storageAccounts/fileServices | FileShareSnapshotSize |
+| Microsoft.ClassicStorage/storageAccounts/fileServices | FileShareQuota |
+| Microsoft.Compute/disks | Composite Disk Read Bytes/sec |
+| Microsoft.Compute/disks | Composite Disk Read Operations/sec |
+| Microsoft.Compute/disks | Composite Disk Write Bytes/sec |
+| Microsoft.Compute/disks | Composite Disk Write Operations/sec |
+| Microsoft.ContainerService/managedClusters | NodesCount |
+| Microsoft.ContainerService/managedClusters | PodCount |
+| Microsoft.ContainerService/managedClusters | CompletedJobsCount |
+| Microsoft.ContainerService/managedClusters | RestartingContainerCount |
+| Microsoft.ContainerService/managedClusters | OomKilledContainerCount |
+| Microsoft.Devices/IotHubs | TotalDeviceCount |
+| Microsoft.Devices/IotHubs | ConnectedDeviceCount |
+| Microsoft.Devices/IotHubs | TotalDeviceCount |
+| Microsoft.Devices/IotHubs | ConnectedDeviceCount |
+| Microsoft.DocumentDB/databaseAccounts | CassandraConnectionClosures |
+| Microsoft.EventHub/clusters | Size |
+| Microsoft.EventHub/namespaces | Size |
+| Microsoft.IoTCentral/IoTApps | connectedDeviceCount |
+| Microsoft.IoTCentral/IoTApps | provisionedDeviceCount |
+| Microsoft.Kubernetes/connectedClusters | NodesCount |
+| Microsoft.Kubernetes/connectedClusters | PodCount |
+| Microsoft.Kubernetes/connectedClusters | CompletedJobsCount |
+| Microsoft.Kubernetes/connectedClusters | RestartingContainerCount |
+| Microsoft.Kubernetes/connectedClusters | OomKilledContainerCount |
+| Microsoft.MachineLearningServices/workspaces/onlineEndpoints | RequestsPerMinute |
+| Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments | DeploymentCapacity |
+| Microsoft.Maps/accounts | CreatorUsage |
+| Microsoft.Media/mediaservices/streamingEndpoints | EgressBandwidth |
+| Microsoft.Network/applicationGateways | Throughput |
+| Microsoft.Network/azureFirewalls | Throughput |
+| Microsoft.Network/expressRouteGateways | ExpressRouteGatewayPacketsPerSecond |
+| Microsoft.Network/expressRouteGateways | ExpressRouteGatewayNumberOfVmInVnet |
+| Microsoft.Network/expressRouteGateways | ExpressRouteGatewayFrequencyOfRoutesChanged |
+| Microsoft.Network/virtualNetworkGateways | ExpressRouteGatewayBitsPerSecond |
+| Microsoft.Network/virtualNetworkGateways | ExpressRouteGatewayPacketsPerSecond |
+| Microsoft.Network/virtualNetworkGateways | ExpressRouteGatewayNumberOfVmInVnet |
+| Microsoft.Network/virtualNetworkGateways | ExpressRouteGatewayFrequencyOfRoutesChanged |
+| Microsoft.ServiceBus/namespaces | Size |
+| Microsoft.ServiceBus/namespaces | Messages |
+| Microsoft.ServiceBus/namespaces | ActiveMessages |
+| Microsoft.ServiceBus/namespaces | DeadletteredMessages |
+| Microsoft.ServiceBus/namespaces | ScheduledMessages |
+| Microsoft.ServiceFabricMesh/applications | AllocatedCpu |
+| Microsoft.ServiceFabricMesh/applications | AllocatedMemory |
+| Microsoft.ServiceFabricMesh/applications | ActualCpu |
+| Microsoft.ServiceFabricMesh/applications | ActualMemory |
+| Microsoft.ServiceFabricMesh/applications | ApplicationStatus |
+| Microsoft.ServiceFabricMesh/applications | ServiceStatus |
+| Microsoft.ServiceFabricMesh/applications | ServiceReplicaStatus |
+| Microsoft.ServiceFabricMesh/applications | ContainerStatus |
+| Microsoft.ServiceFabricMesh/applications | RestartCount |
+| Microsoft.Storage/storageAccounts | UsedCapacity |
+| Microsoft.Storage/storageAccounts/blobServices | BlobCapacity |
+| Microsoft.Storage/storageAccounts/blobServices | BlobCount |
+| Microsoft.Storage/storageAccounts/blobServices | BlobProvisionedSize |
+| Microsoft.Storage/storageAccounts/blobServices | IndexCapacity |
+| Microsoft.Storage/storageAccounts/fileServices | FileCapacity |
+| Microsoft.Storage/storageAccounts/fileServices | FileCount |
+| Microsoft.Storage/storageAccounts/fileServices | FileShareCount |
+| Microsoft.Storage/storageAccounts/fileServices | FileShareSnapshotCount |
+| Microsoft.Storage/storageAccounts/fileServices | FileShareSnapshotSize |
+| Microsoft.Storage/storageAccounts/fileServices | FileShareCapacityQuota |
+| Microsoft.Storage/storageAccounts/fileServices | FileShareProvisionedIOPS |
+
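A table like this lends itself to a quick pre-check before a rule is authored in a script. The sketch below is hypothetical: `UNSUPPORTED` holds only a four-row excerpt of the table, `supports_dynamic_thresholds` is not an Azure API, and the case-insensitive comparison is an assumption made here rather than documented behavior.

```python
# Excerpt of the unsupported-metrics table; extend with the remaining rows.
UNSUPPORTED = {
    ("Microsoft.Compute/disks", "Composite Disk Read Bytes/sec"),
    ("Microsoft.EventHub/namespaces", "Size"),
    ("Microsoft.ServiceBus/namespaces", "ActiveMessages"),
    ("Microsoft.Storage/storageAccounts", "UsedCapacity"),
}

# Normalize once so lookups ignore case (an assumption, see lead-in).
_UNSUPPORTED_KEYS = {(r.lower(), m.lower()) for r, m in UNSUPPORTED}


def supports_dynamic_thresholds(resource_type, metric_name):
    """Return False if the pair appears in the excerpt above.

    True only means "not listed in this excerpt", not a guarantee.
    """
    return (resource_type.lower(), metric_name.lower()) not in _UNSUPPORTED_KEYS


print(supports_dynamic_thresholds("Microsoft.Storage/storageAccounts", "UsedCapacity"))
print(supports_dynamic_thresholds("Microsoft.Compute/virtualMachines", "Percentage CPU"))
```

Keeping the pairs in a set makes each lookup a constant-time membership test, and a local copy like this needs to be refreshed whenever the table changes.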
 ## Dynamic Thresholds best practices

 Dynamic Thresholds can be applied to most platform and custom metrics in Azure Monitor, and it was also tuned for the common application and infrastructure metrics.

 The following items are best practices on how to configure alerts on some of these metrics by using Dynamic Thresholds.

-### Configure dynamic thresholds on virtual machine CPU percentage metrics
+## Configure dynamic thresholds on virtual machine CPU percentage metrics

 1. In the [Azure portal](https://portal.azure.com), select **Monitor**. The **Monitor** view consolidates all your monitoring settings and data in one view.

@@ -140,7 +256,7 @@ The following items are best practices on how to configure alerts on some of the
 > [!NOTE]
 > Metric alert rules created through the portal are created in the same resource group as the target resource.

-### Configure dynamic thresholds on Application Insights HTTP request execution time
+## Configure dynamic thresholds on Application Insights HTTP request execution time

 1. In the [Azure portal](https://portal.azure.com), select **Monitor**. The **Monitor** view consolidates all your monitoring settings and data in one view.

articles/azure-monitor/alerts/alerts-troubleshoot-metric.md

Lines changed: 0 additions & 115 deletions
@@ -299,121 +299,6 @@ Choose an **Aggregation granularity (Period)** that's larger than the **Frequenc
 - **Metric alert rule that monitors multiple resources:** When a new resource is added to the scope.
 - **Metric alert rule that monitors a metric that isn't emitted continuously (sparse metric):** When the metric is emitted after a period longer than 24 hours in which it wasn't emitted.

-## The Dynamic Thresholds borders don't seem to fit the data
-
-If the behavior of a metric changed recently, the changes won't necessarily be reflected in the Dynamic Threshold borders (upper and lower bounds) immediately. The borders are calculated based on metric data from the last 10 days. When you view the Dynamic Threshold borders for a given metric, look at the metric trend in the last week and not only for recent hours or days.
-
-## Why is weekly seasonality not detected by Dynamic Thresholds?
-
-To identify weekly seasonality, the Dynamic Thresholds model requires at least three weeks of historical data. When enough historical data is available, any weekly seasonality that exists in the metric data is identified and the model is adjusted accordingly.
-
-## Dynamic Thresholds shows a negative lower bound for a metric even though the metric always has positive values
-
-When a metric exhibits large fluctuation, Dynamic Thresholds builds a wider model around the metric values. This action can result in the lower border being below zero. Specifically, this scenario can happen when:
-
-- The sensitivity is set to low.
-- The median values are close to zero.
-- The metric exhibits an irregular behavior with high variance, which appears as spikes or dips in the data.
-
-When the lower bound has a negative value, it's plausible for the metric to reach a zero value given the metric's irregular behavior. Consider choosing a higher sensitivity or a larger **Aggregation granularity (Period)** to make the model less sensitive. Or, use the **Ignore data before** option to exclude a recent irregularity from the historical data used to build the model.
-
-## The Dynamic Thresholds alert rule is too noisy (fires too much)
-
-To reduce the sensitivity of your Dynamic Thresholds alert rule, use one of the following options:
-
-- **Threshold sensitivity:** Set the sensitivity to **Low** to be more tolerant for deviations.
-- **Number of violations (under Advanced settings):** Configure the alert rule to trigger only if several deviations occur within a certain period of time. This setting makes the rule less susceptible to transient deviations.
-
-## The Dynamic Thresholds alert rule is too insensitive (doesn't fire)
-
-Sometimes an alert rule won't trigger, even when a high sensitivity is configured. This scenario usually happens when the metric's distribution is highly irregular.
-Consider one of the following options:
-
-* Move to monitoring a complementary metric that's suitable for your scenario, if applicable. For example, check for changes in success rate rather than failure rate.
-* Try selecting a different value for **Aggregation granularity (Period)**.
-* Check if there was a drastic change in the metric behavior in the last 10 days, for example, an outage. An abrupt change can affect the upper and lower thresholds calculated for the metric and make them broader. Wait for a few days until the outage is no longer taken into the thresholds calculation. Or use the **Ignore data before** option under **Advanced settings**.
-* If your data has weekly seasonality, but not enough history is available for the metric, the calculated thresholds can result in having broad upper and lower bounds. For example, the calculation can treat weekdays and weekends in the same way and build wide borders that don't always fit the data. This issue should resolve itself after enough metric history is available. Then, the correct seasonality will be detected and the calculated thresholds will update accordingly.
-
-## When I configure an alert rule's condition, why is Dynamic Thresholds disabled?
-
-Dynamic thresholds are supported for most metrics, but some metrics can't use dynamic thresholds.
-
-The following table lists the metrics that aren't supported by Dynamic Thresholds.
-
-| Resource type | Metric name |
-| --- | --- |
-| Microsoft.ClassicStorage/storageAccounts | UsedCapacity |
-| Microsoft.ClassicStorage/storageAccounts/blobServices | BlobCapacity |
-| Microsoft.ClassicStorage/storageAccounts/blobServices | BlobCount |
-| Microsoft.ClassicStorage/storageAccounts/blobServices | IndexCapacity |
-| Microsoft.ClassicStorage/storageAccounts/fileServices | FileCapacity |
-| Microsoft.ClassicStorage/storageAccounts/fileServices | FileCount |
-| Microsoft.ClassicStorage/storageAccounts/fileServices | FileShareCount |
-| Microsoft.ClassicStorage/storageAccounts/fileServices | FileShareSnapshotCount |
-| Microsoft.ClassicStorage/storageAccounts/fileServices | FileShareSnapshotSize |
-| Microsoft.ClassicStorage/storageAccounts/fileServices | FileShareQuota |
-| Microsoft.Compute/disks | Composite Disk Read Bytes/sec |
-| Microsoft.Compute/disks | Composite Disk Read Operations/sec |
-| Microsoft.Compute/disks | Composite Disk Write Bytes/sec |
-| Microsoft.Compute/disks | Composite Disk Write Operations/sec |
-| Microsoft.ContainerService/managedClusters | NodesCount |
-| Microsoft.ContainerService/managedClusters | PodCount |
-| Microsoft.ContainerService/managedClusters | CompletedJobsCount |
-| Microsoft.ContainerService/managedClusters | RestartingContainerCount |
-| Microsoft.ContainerService/managedClusters | OomKilledContainerCount |
-| Microsoft.Devices/IotHubs | TotalDeviceCount |
-| Microsoft.Devices/IotHubs | ConnectedDeviceCount |
-| Microsoft.Devices/IotHubs | TotalDeviceCount |
-| Microsoft.Devices/IotHubs | ConnectedDeviceCount |
-| Microsoft.DocumentDB/databaseAccounts | CassandraConnectionClosures |
-| Microsoft.EventHub/clusters | Size |
-| Microsoft.EventHub/namespaces | Size |
-| Microsoft.IoTCentral/IoTApps | connectedDeviceCount |
-| Microsoft.IoTCentral/IoTApps | provisionedDeviceCount |
-| Microsoft.Kubernetes/connectedClusters | NodesCount |
-| Microsoft.Kubernetes/connectedClusters | PodCount |
-| Microsoft.Kubernetes/connectedClusters | CompletedJobsCount |
-| Microsoft.Kubernetes/connectedClusters | RestartingContainerCount |
-| Microsoft.Kubernetes/connectedClusters | OomKilledContainerCount |
-| Microsoft.MachineLearningServices/workspaces/onlineEndpoints | RequestsPerMinute |
-| Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments | DeploymentCapacity |
-| Microsoft.Maps/accounts | CreatorUsage |
-| Microsoft.Media/mediaservices/streamingEndpoints | EgressBandwidth |
-| Microsoft.Network/applicationGateways | Throughput |
-| Microsoft.Network/azureFirewalls | Throughput |
-| Microsoft.Network/expressRouteGateways | ExpressRouteGatewayPacketsPerSecond |
-| Microsoft.Network/expressRouteGateways | ExpressRouteGatewayNumberOfVmInVnet |
-| Microsoft.Network/expressRouteGateways | ExpressRouteGatewayFrequencyOfRoutesChanged |
-| Microsoft.Network/virtualNetworkGateways | ExpressRouteGatewayBitsPerSecond |
-| Microsoft.Network/virtualNetworkGateways | ExpressRouteGatewayPacketsPerSecond |
-| Microsoft.Network/virtualNetworkGateways | ExpressRouteGatewayNumberOfVmInVnet |
-| Microsoft.Network/virtualNetworkGateways | ExpressRouteGatewayFrequencyOfRoutesChanged |
-| Microsoft.ServiceBus/namespaces | Size |
-| Microsoft.ServiceBus/namespaces | Messages |
-| Microsoft.ServiceBus/namespaces | ActiveMessages |
-| Microsoft.ServiceBus/namespaces | DeadletteredMessages |
-| Microsoft.ServiceBus/namespaces | ScheduledMessages |
-| Microsoft.ServiceFabricMesh/applications | AllocatedCpu |
-| Microsoft.ServiceFabricMesh/applications | AllocatedMemory |
-| Microsoft.ServiceFabricMesh/applications | ActualCpu |
-| Microsoft.ServiceFabricMesh/applications | ActualMemory |
-| Microsoft.ServiceFabricMesh/applications | ApplicationStatus |
-| Microsoft.ServiceFabricMesh/applications | ServiceStatus |
-| Microsoft.ServiceFabricMesh/applications | ServiceReplicaStatus |
-| Microsoft.ServiceFabricMesh/applications | ContainerStatus |
-| Microsoft.ServiceFabricMesh/applications | RestartCount |
-| Microsoft.Storage/storageAccounts | UsedCapacity |
-| Microsoft.Storage/storageAccounts/blobServices | BlobCapacity |
-| Microsoft.Storage/storageAccounts/blobServices | BlobCount |
-| Microsoft.Storage/storageAccounts/blobServices | BlobProvisionedSize |
-| Microsoft.Storage/storageAccounts/blobServices | IndexCapacity |
-| Microsoft.Storage/storageAccounts/fileServices | FileCapacity |
-| Microsoft.Storage/storageAccounts/fileServices | FileCount |
-| Microsoft.Storage/storageAccounts/fileServices | FileShareCount |
-| Microsoft.Storage/storageAccounts/fileServices | FileShareSnapshotCount |
-| Microsoft.Storage/storageAccounts/fileServices | FileShareSnapshotSize |
-| Microsoft.Storage/storageAccounts/fileServices | FileShareCapacityQuota |
-| Microsoft.Storage/storageAccounts/fileServices | FileShareProvisionedIOPS |

 ## Next steps
