articles/azure-monitor/alerts/proactive-failure-diagnostics.md

# Smart Detection - Failure Anomalies
[Application Insights](../app/app-insights-overview.md) automatically alerts you in near real time if your web app experiences an abnormal rise in the rate of failed requests. It detects an unusual rise in the rate of HTTP requests or dependency calls that are reported as failed; for requests, failed usually means a response code of 400 or higher. To help you triage and diagnose the problem, the alert details include an analysis of the characteristics of the failures and related application data, along with links to the Application Insights portal for further diagnosis. The feature requires no setup or configuration, because it uses machine learning algorithms to predict the normal failure rate.
This feature works for any web app, hosted in the cloud or on your own servers, that generates application request or dependency data. For example, it works if you have a worker role that calls [TrackRequest()](../app/api-custom-events-metrics.md#trackrequest) or [TrackDependency()](../app/api-custom-events-metrics.md#trackdependency).

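The following is a minimal sketch of what that instrumentation can look like, assuming the .NET Application Insights SDK; the worker class, operation name, and dependency name are purely illustrative.

```csharp
using System;
using System.Diagnostics;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

// Hypothetical worker role that reports request and dependency telemetry.
// The names used here are examples only; the TrackRequest and TrackDependency
// calls are what produce the data that Failure Anomalies analyzes.
public class MessageProcessor
{
    private readonly TelemetryClient _telemetry =
        new TelemetryClient(TelemetryConfiguration.CreateDefault());

    public void ProcessMessage(string messageBody)
    {
        var start = DateTimeOffset.UtcNow;
        var stopwatch = Stopwatch.StartNew();
        bool success = false;
        try
        {
            // ... do the actual work, for example call a downstream service ...
            success = true;
        }
        finally
        {
            stopwatch.Stop();

            // Counts toward the request failure rate that Failure Anomalies monitors.
            _telemetry.TrackRequest("ProcessMessage", start, stopwatch.Elapsed,
                success ? "200" : "500", success);

            // Dependency calls are monitored in the same way.
            _telemetry.TrackDependency("HTTP", "inventory-api", "GET /items",
                start, stopwatch.Elapsed, success);
        }
    }
}
```

Requests and dependencies reported this way feed the failure-rate baseline that Smart Detection learns.
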
After setting up [Application Insights for your project](../app/app-insights-overview.md), and if your app generates a certain minimum amount of data, Smart Detection of Failure Anomalies takes 24 hours to learn the normal behavior of your app, before it's switched on and can send alerts.
Here's a sample alert:
In the example shown before, the analysis discovered that most failures are about a specific result code, request name, Server URL host, and role instance.
When you instrument your service with these calls, the analyzer looks for an exception and a dependency failure that are associated with requests in the identified cluster. It also looks for an example of any trace logs associated with those requests. The alert you receive includes this additional information, which can provide context for the detection and hint at the root cause of the detected problem.
### Alert logic details
Failure Anomalies detection relies on a proprietary machine learning algorithm, so the reasons for an alert firing or not firing aren't always deterministic. With that said, the primary factors that the algorithm uses are:
* Analysis of the failure percentage of requests/dependencies in a rolling time window of 20 minutes.
* A comparison of the failure percentage in the last 20 minutes to the rate in the last 40 minutes and the past seven days. The algorithm looks for significant deviations that exceed X times the standard deviation.
* The algorithm uses an adaptive limit for the minimum failure percentage, which varies based on the app's volume of requests/dependencies.
* The algorithm includes logic that can automatically resolve the fired alert if the issue is no longer detected for 8-24 hours.
Note: In the current design, a notification or action isn't sent when a Smart Detection alert is resolved. You can check whether a Smart Detection alert was resolved in the Azure portal.

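The detection model itself isn't published. Purely as an illustration of the shape of the checks in the preceding list, and not as the actual algorithm, a deviation test of this kind could be sketched as follows; the multiplier and the way the factors are combined are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Purely illustrative sketch of a deviation test with the general shape described
// above. This is NOT the proprietary Failure Anomalies detector; the multiplier
// and the way the baseline is combined here are hypothetical.
public static class FailureRateCheck
{
    public static bool LooksAnomalous(
        double last20MinFailurePct,               // failure % in the rolling 20-minute window
        double last40MinFailurePct,               // failure % over the preceding 40 minutes
        IReadOnlyList<double> sevenDaySamplesPct, // historical window samples from the past 7 days
        double adaptiveMinFailurePct)             // floor that scales with request/dependency volume
    {
        double mean = sevenDaySamplesPct.Average();
        double stdDev = Math.Sqrt(sevenDaySamplesPct.Average(p => (p - mean) * (p - mean)));

        const double X = 3.0; // hypothetical value for "X times the standard deviation"
        bool deviatesFromBaseline =
            last20MinFailurePct > mean + X * stdDev &&
            last20MinFailurePct > last40MinFailurePct + X * stdDev;

        // The adaptive minimum keeps low-volume noise from firing alerts.
        return deviatesFromBaseline && last20MinFailurePct >= adaptiveMinFailurePct;
    }
}
```
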
## Managing Failure Anomalies alert rules
### Alert rule creation
A Failure Anomalies alert rule is created automatically when your Application Insights resource is created. The rule is automatically configured to analyze the telemetry on that resource.
You can create the rule again using the Azure [REST API](https://learn.microsoft.com/rest/api/monitor/smart-detector-alert-rules?view=rest-monitor-2019-06-01) or a [Resource Manager template](./proactive-arm-config.md#failure-anomalies-alert-rule). Re-creating the rule is useful if its automatic creation failed for some reason, or if you deleted the rule by accident or on purpose.

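As a rough sketch of what that Resource Manager template can look like (based on the linked template article; the subscription ID, resource group, Application Insights resource, and action group below are placeholders, and the linked REST API and template docs remain the authoritative schema):

```json
{
  "apiVersion": "2019-06-01",
  "type": "microsoft.alertsmanagement/smartdetectoralertrules",
  "name": "Failure Anomalies - my-app",
  "location": "global",
  "properties": {
    "description": "Failure Anomalies notifies you of an unusual rise in the rate of failed HTTP requests or dependency calls.",
    "state": "Enabled",
    "severity": "Sev3",
    "frequency": "PT1M",
    "detector": {
      "id": "FailureAnomaliesDetector"
    },
    "scope": [
      "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-resource-group/providers/microsoft.insights/components/my-app"
    ],
    "actionGroups": {
      "groupIds": [
        "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-resource-group/providers/microsoft.insights/actiongroups/application-insights-smart-detection"
      ]
    }
  }
}
```

Deploying the same template with `state` set to `Disabled` is one way to disable the rule, as described in the configuration section that follows.
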
### Alert rule configuration
To configure a Failure Anomalies alert rule in the portal, open the Alerts page and select Alert Rules. Failure Anomalies alert rules are included along with any alerts that you set manually.
You can disable the Smart Detection alert rule from the portal or by using an Azure Resource Manager template ([see template example](./proactive-arm-config.md#failure-anomalies-alert-rule)).
This alert rule is created with an associated [Action Group](./action-groups.md) named "Application Insights Smart Detection." By default, this action group contains Email Azure Resource Manager Role actions and sends notifications to users who have the Monitoring Contributor or Monitoring Reader Azure Resource Manager roles in your subscription. You can remove, change, or add the action groups that the rule triggers, as for any other Azure alert rule. Notifications sent from this alert rule follow the [common alert schema](./alerts-common-schema.md).
## Delete alerts
## Triage and diagnose an alert
An alert indicates that an abnormal rise in the failed request rate was detected. It's likely that there's some problem with your app or its environment.
To investigate further, click 'View full details in Application Insights'. The links on this page take you straight to a [search page](../app/diagnostic-search.md) filtered to the relevant requests, exceptions, dependencies, or traces.
From the percentage of requests and the number of users affected, you can decide how urgent the issue is. In the example shown before, the failure rate of 78.5%, compared with a normal rate of 2.2%, indicates that something bad is going on. On the other hand, only 46 users were affected. This information can help you assess how serious the problem is.
In many cases, you can diagnose the problem quickly from the request name, exception, dependency failure, and trace data provided.
## If you receive a Smart Detection alert
*Why did I receive this alert?*
* We detected an abnormal rise in the failed request rate compared to the normal baseline of the preceding period. After analysis of the failures and associated application data, we think that there's a problem that you should look into.
*Does the notification mean I definitely have a problem?*
*I lost the email. Where can I find the notifications in the portal?*
* You can find Failure Anomalies alerts in the Azure portal, on your Application Insights alerts page.
*Some of the alerts are about known issues and I don't want to receive them.*
* You can use the suppression feature of [alert processing rules](./alerts-processing-rules.md).