Commit a528741

Merge pull request #260119 from AbbyMSFT/yair
Yair's merge conflict
2 parents 57f42b9 + 6d8abe2 commit a528741

1 file changed

+31
-40
lines changed

articles/azure-monitor/alerts/proactive-failure-diagnostics.md

Lines changed: 31 additions & 40 deletions
Original file line number | Diff line number | Diff line change
@@ -9,7 +9,7 @@ ms.reviewer: yalavi
99
# Smart Detection - Failure Anomalies
1010
[Application Insights](../app/app-insights-overview.md) automatically alerts you in near real time if your web app experiences an abnormal rise in the rate of failed requests. It detects an unusual rise in the rate of HTTP requests or dependency calls that are reported as failed. For requests, failed requests usually have response codes of 400 or higher. To help you triage and diagnose the problem, an analysis of the characteristics of the failures and related application data is provided in the alert details. There are also links to the Application Insights portal for further diagnosis. The feature needs no set-up nor configuration, as it uses machine learning algorithms to predict the normal failure rate.
1111

12-
This feature works for any web app, hosted in the cloud or on your own servers, that generate application request or dependency data. For example, if you have a worker role that calls [TrackRequest()](../app/api-custom-events-metrics.md#trackrequest) or [TrackDependency()](../app/api-custom-events-metrics.md#trackdependency).
12+
This feature works for any web app, hosted in the cloud or on your own servers, that generates application request or dependency data. For example, it works if you have a worker role that calls [TrackRequest()](../app/api-custom-events-metrics.md#trackrequest) or [TrackDependency()](../app/api-custom-events-metrics.md#trackdependency).
1313

1414
After setting up [Application Insights for your project](../app/app-insights-overview.md), and if your app generates a certain minimum amount of data, Smart Detection of Failure Anomalies takes 24 hours to learn the normal behavior of your app, before it's switched on and can send alerts.
1515

@@ -29,54 +29,50 @@ The alert details tell you:
2929
Ordinary [metric alerts](./alerts-log.md) tell you there might be a problem. But Smart Detection starts the diagnostic work for you, performing much of the analysis you would otherwise have to do yourself. You get the results neatly packaged, helping you to get quickly to the root of the problem.
3030

3131
## How it works
32-
Smart Detection monitors the data received from your app, and in particular the failure rates. This rule counts the number of requests for which the `Successful request` property is false, and the number of dependency calls for which the `Successful call` property is false. For requests, by default, `Successful request == (resultCode < 400)` (unless you have written custom code to [filter](../app/api-filtering-sampling.md#filtering) or generate your own [TrackRequest](../app/api-custom-events-metrics.md#trackrequest) calls).
32+
Smart Detection monitors the data received from your app, and in particular the failure rates. This rule counts the number of requests for which the `Successful request` property is false, and the number of dependency calls for which the `Successful call` property is false. For requests, by default, `Successful request == (resultCode < 400)` (unless you write custom code to [filter](../app/api-filtering-sampling.md#filtering) or generate your own [TrackRequest](../app/api-custom-events-metrics.md#trackrequest) calls).
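
The default success rule quoted above can be illustrated with a minimal Python sketch. It assumes a plain list of HTTP result codes as input; the function names `is_successful_request` and `failure_percentage` are hypothetical and aren't part of any Application Insights SDK.

```python
from typing import Iterable


def is_successful_request(result_code: int) -> bool:
    """Default rule quoted in the text: a request succeeds when resultCode < 400."""
    return result_code < 400


def failure_percentage(result_codes: Iterable[int]) -> float:
    """Percentage of requests whose success flag is false (0.0 when there is no data)."""
    codes = list(result_codes)
    if not codes:
        return 0.0
    failed = sum(1 for code in codes if not is_successful_request(code))
    return 100.0 * failed / len(codes)


# Illustrative data only: three of the eight codes are 400 or higher.
sample_codes = [200, 200, 404, 500, 302, 200, 503, 200]
sample_failure_pct = failure_percentage(sample_codes)
```

Custom filtering or custom `TrackRequest` calls can override the success flag, in which case the counted failures would no longer follow this default rule.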
3333

3434
Your app's performance has a typical pattern of behavior. Some requests or dependency calls are more prone to failure than others; and the overall failure rate may go up as load increases. Smart Detection uses machine learning to find these anomalies.
3535

36-
As data comes into Application Insights from your web app, Smart Detection compares the current behavior with the patterns seen over the past few days. If an abnormal rise in failure rate is observed by comparison with previous performance, an analysis is triggered.
36+
As data comes into Application Insights from your web app, Smart Detection compares the current behavior with the patterns seen over the past few days. If the detector discovers an abnormal rise in failure rate in comparison with previous performance, the detector triggers a more in-depth analysis.
3737

3838
When an analysis is triggered, the service performs a cluster analysis on the failed requests, to try to identify a pattern of values that characterize the failures.
3939

40-
In the previous example, the analysis has discovered that most failures are about a specific result code, request name, Server URL host, and role instance.
40+
In the example shown before, the analysis discovered that most failures are about a specific result code, request name, Server URL host, and role instance.
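
To make the idea of "a pattern of values that characterize the failures" concrete, here is a deliberately simplified Python sketch. It uses plain frequency counting over four properties, which is a stand-in for the service's cluster analysis and not a description of it. The field names in `FIELDS` and the function `dominant_failure_pattern` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical property names, mirroring the four dimensions named in the text.
FIELDS = ("resultCode", "requestName", "host", "roleInstance")


def dominant_failure_pattern(
    failed_requests: List[Dict[str, str]],
) -> Tuple[Tuple[str, ...], float]:
    """Return the most common combination of property values and its share of failures."""
    if not failed_requests:
        raise ValueError("no failed requests to analyze")
    counts = Counter(
        tuple(request.get(field, "") for field in FIELDS)
        for request in failed_requests
    )
    pattern, hits = counts.most_common(1)[0]
    return pattern, hits / len(failed_requests)


# Illustrative data only: three of the four failures share the same values.
failures = [
    {"resultCode": "500", "requestName": "GET /cart", "host": "shop.example", "roleInstance": "web_0"},
    {"resultCode": "500", "requestName": "GET /cart", "host": "shop.example", "roleInstance": "web_0"},
    {"resultCode": "500", "requestName": "GET /cart", "host": "shop.example", "roleInstance": "web_0"},
    {"resultCode": "404", "requestName": "GET /img", "host": "shop.example", "roleInstance": "web_1"},
]
pattern, share = dominant_failure_pattern(failures)
```

A high share for one combination suggests that the failures are concentrated in a single operation or instance, which is the kind of hint the alert details aim to give.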
4141

42-
When your service is instrumented with these calls, the analyzer looks for an exception and a dependency failure that are associated with requests in the cluster it has identified, together with an example of any trace logs associated with those requests.
43-
44-
The resulting analysis is sent to you as alert, unless you have configured it not to.
45-
46-
Like the [alerts you set manually](./alerts-log.md), you can inspect the state of the fired alert, which can be resolved if the issue is fixed. Configure the alert rules in the Alerts page of your Application Insights resource. But unlike other alerts, you don't need to set up or configure Smart Detection. If you want, you can disable it or change its target email addresses.
42+
When you instrument your service with these calls, the analyzer looks for an exception and a dependency failure that are associated with requests in the identified cluster. It also looks for an example of any trace logs associated with those requests. The alert you receive includes this additional information, which can provide context for the detection and hint at the root cause of the detected problem.
4743

4844
### Alert logic details
49-
50-
Our proprietary machine learning algorithm triggers the alerts, so we can't share the exact implementation details. With that said, we understand that you sometimes need to know more about how the underlying logic works. The primary factors that are evaluated to determine if an alert should be triggered are:
45+
Failure Anomalies detection relies on a proprietary machine learning algorithm, so the reasons for an alert firing or not firing aren't always deterministic. With that said, the primary factors that the algorithm uses are:
5146

5247
* Analysis of the failure percentage of requests/dependencies in a rolling time window of 20 minutes.
53-
* A comparison of the failure percentage of the last 20 minutes to the rate in the last 40 minutes and the past seven days, and looking for significant deviations that exceed X-times that standard deviation.
54-
* Using an adaptive limit for the minimum failure percentage, which varies based on the app’s volume of requests/dependencies.
55-
* There's logic that can automatically resolve the fired alert monitor condition, if the issue is no longer detected for 8-24 hours.
56-
Note: in the current design. a notification or action will not be sent when a Smart Detection alert is resolved. You can check if a Smart Detection alert was resolved in the Azure portal.
48+
* A comparison of the failure percentage in the last 20 minutes to the rate in the last 40 minutes and the past seven days. The algorithm looks for significant deviations that exceed X times the standard deviation.
49+
* The algorithm uses an adaptive limit for the minimum failure percentage, which varies based on the app’s volume of requests/dependencies.
50+
* The algorithm includes logic that can automatically resolve the fired alert, if the issue is no longer detected for 8-24 hours.
51+
Note: In the current design, a notification or action isn't sent when a Smart Detection alert is resolved. You can check if a Smart Detection alert was resolved in the Azure portal.
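
The factors listed above can be sketched in Python as follows. This is only an illustration of the general idea (compare the current window with a baseline mean plus a multiple of the standard deviation, gated by a volume-dependent minimum). The actual algorithm is proprietary, the value of X isn't documented, and every name and number below (`should_flag`, `adaptive_min_failure_pct`, the volume tiers, the multiplier) is a hypothetical assumption.

```python
from statistics import mean, pstdev
from typing import Sequence


def adaptive_min_failure_pct(request_count: int) -> float:
    """Hypothetical tiers: quieter apps need a higher failure percentage to be flagged."""
    if request_count < 100:
        return 20.0
    if request_count < 10_000:
        return 5.0
    return 1.0


def should_flag(
    current_pct: float,
    baseline_pcts: Sequence[float],
    deviation_multiplier: float,
    min_failure_pct: float,
) -> bool:
    """Flag when the current window exceeds both the minimum and mean + X * std of the baseline."""
    if not baseline_pcts or current_pct < min_failure_pct:
        return False
    threshold = mean(baseline_pcts) + deviation_multiplier * pstdev(baseline_pcts)
    return current_pct > threshold


# Illustrative baseline of failure percentages from earlier 20-minute windows.
baseline = [2.0, 2.5, 1.8, 2.2, 2.4]
flag_spike = should_flag(78.5, baseline, 3.0, adaptive_min_failure_pct(5_000))
flag_normal = should_flag(2.3, baseline, 3.0, adaptive_min_failure_pct(5_000))
```

The minimum-percentage gate keeps a low-traffic app from being flagged because of one or two failed calls, which is one plausible reading of why the limit adapts to request volume.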
5752

58-
## Configure alerts
53+
## Managing Failure Anomalies alert rules
5954

60-
You can disable Smart Detection alert rule from the portal or using Azure Resource Manager ([see template example](./proactive-arm-config.md)).
55+
### Alert rule creation
56+
A Failure Anomalies alert rule is created automatically when your Application Insights resource is created. The rule is automatically configured to analyze the telemetry on that resource.
57+
You can create the rule again using Azure [REST API](https://learn.microsoft.com/rest/api/monitor/smart-detector-alert-rules?view=rest-monitor-2019-06-01&preserve-view=true) or using a [Resource Manager template](proactive-arm-config.md#failure-anomalies-alert-rule). Creating the rule can be useful if the automatic creation of the rule failed for some reason, or if you deleted the rule.
6158

62-
This alert rule is created with an associated [Action Group](./action-groups.md) named "Application Insights Smart Detection" that contains email and webhook actions, and can be extended to trigger more actions when the alert fires.
63-
64-
> [!NOTE]
65-
> Email notifications sent from this alert rule are now sent by default to users associated with the subscription's Monitoring Reader and Monitoring Contributor roles. More information on this is available [here](./proactive-email-notification.md).
66-
> Notifications sent from this alert rule follow the [common alert schema](./alerts-common-schema.md).
67-
>
68-
69-
Open the Alerts page. Failure Anomalies alert rules are included along with any alerts that you have set manually, and you can see whether it's currently in the alert state.
59+
### Alert rule configuration
60+
To configure a Failure Anomalies alert rule in the portal, open the Alerts page and select Alert Rules. Failure Anomalies alert rules are included along with any alerts that you set manually.
7061

7162
:::image type="content" source="./media/proactive-failure-diagnostics/021.png" alt-text="On the Application Insights resource page, click Alerts tile, then Manage alert rules." lightbox="./media/proactive-failure-diagnostics/021.png":::
7263

73-
Click the alert to configure it.
64+
Click the alert rule to configure it.
7465

7566
:::image type="content" source="./media/proactive-failure-diagnostics/032.png" alt-text="Rule configuration screen." lightbox="./media/proactive-failure-diagnostics/032.png":::
7667

68+
You can disable the Smart Detection alert rule from the portal or by using an [Azure Resource Manager template](proactive-arm-config.md#failure-anomalies-alert-rule).
69+
70+
This alert rule is created with an associated [Action Group](./action-groups.md) named "Application Insights Smart Detection." By default, this action group contains Email Azure Resource Manager Role actions and sends notifications to users who have the Monitoring Contributor or Monitoring Reader Azure Resource Manager role in your subscription. You can remove, change, or add the action groups that the rule triggers, as for any other Azure alert rule. Notifications sent from this alert rule follow the [common alert schema](./alerts-common-schema.md).
71+
72+
7773
## Delete alerts
7874

79-
You can disable or delete a Failure Anomalies alert rule.
75+
You can delete a Failure Anomalies alert rule.
8076

8177
You can do so manually on the Alert rules page or with the following Azure CLI command:
8278

@@ -387,19 +383,21 @@ Notice that if you delete an Application Insights resource, the associated Failu
387383

388384
## Triage and diagnose an alert
389385

386+
An alert indicates that an abnormal rise in the failed request rate was detected. It's likely that there's some problem with your app or its environment.
390387
An alert indicates that an abnormal rise in the failed request rate was detected. It's likely that there's some problem with your app or its environment.
391388

389+
To investigate further, click on 'View full details in Application Insights.' The links in this page take you straight to a [search page](../app/diagnostic-search.md) filtered to the relevant requests, exception, dependency, or traces.
392390
To investigate further, click on 'View full details in Application Insights' the links in this page take you straight to a [search page](../app/transaction-search-and-diagnostics.md?tabs=transaction-search) filtered to the relevant requests, exception, dependency, or traces.
393391

394392
You can also open the [Azure portal](https://portal.azure.com), navigate to the Application Insights resource for your app, and open the Failures page.
395393

396-
Clicking on 'Diagnose failures' helps you get more details and resolve the issue.
394+
Clicking on 'Diagnose failures' can help you get more details and resolve the issue.
397395

398396
:::image type="content" source="./media/proactive-failure-diagnostics/051.png" alt-text="Diagnostic search." lightbox="./media/proactive-failure-diagnostics/051.png#lightbox":::
399397

400-
From the percentage of requests and number of users affected, you can decide how urgent the issue is. In the previous example, the failure rate of 78.5% compares with a normal rate of 2.2%, indicates that something bad is going on. On the other hand, only 46 users were affected. If it was your app, you'd be able to assess how serious that is.
398+
From the percentage of requests and number of users affected, you can decide how urgent the issue is. In the example shown before, the failure rate of 78.5%, compared with a normal rate of 2.2%, indicates that something bad is going on. On the other hand, only 46 users were affected. This information can help you to assess how serious the problem is.
401399

402-
In many cases, you will be able to diagnose the problem quickly from the request name, exception, dependency failure, and trace data provided.
400+
In many cases, you can diagnose the problem quickly from the request name, exception, dependency failure, and trace data provided.
403401

404402
In this example, there was an exception from SQL Database due to request limit being reached.
405403

@@ -411,15 +409,8 @@ Click **Alerts** in the Application Insights resource page to get to the most re
411409

412410
:::image type="content" source="./media/proactive-failure-diagnostics/070.png" alt-text="Alerts summary." lightbox="./media/proactive-failure-diagnostics/070.png":::
413411

414-
## What's the difference ...
415-
Smart Detection of Failure Anomalies complements other similar but distinct features of Application Insights.
416-
417-
* [metric alerts](./alerts-log.md) are set by you and can monitor a wide range of metrics such as CPU occupancy, request rates, page load times, and so on. You can use them to warn you, for example, if you need to add more resources. By contrast, Smart Detection of Failure Anomalies covers a small range of critical metrics (currently only failed request rate), designed to notify you in near real-time manner once your web app's failed request rate increases compared to web app's normal behavior. Unlike metric alerts, Smart Detection automatically sets and updates thresholds in response changes in the behavior. Smart Detection also starts the diagnostic work for you, saving you time in resolving issues.
418-
419-
* [Smart Detection of performance anomalies](smart-detection-performance.md) also uses machine intelligence to discover unusual patterns in your metrics, and no configuration by you is required. But unlike Smart Detection of Failure Anomalies, the purpose of Smart Detection of performance anomalies is to find segments of your usage manifold that might be badly served - for example, by specific pages on a specific type of browser. The analysis is performed daily, and if any result is found, it's likely to be much less urgent than an alert. By contrast, the analysis for Failure Anomalies is performed continuously on incoming application data, and you will be notified within minutes if server failure rates are greater than expected.
420-
421412
## If you receive a Smart Detection alert
422-
*Why have I received this alert?*
413+
*Why have I received this alert?*
423414

424415
* We detected an abnormal rise in failed requests rate compared to the normal baseline of the preceding period. After analysis of the failures and associated application data, we think that there's a problem that you should look into.
425416

@@ -441,9 +432,9 @@ Smart Detection of Failure Anomalies complements other similar but distinct feat
441432

442433
*I lost the email. Where can I find the notifications in the portal?*
443434

444-
* In the Activity logs. In Azure, open the Application Insights resource for your app, then select Activity logs.
435+
* You can find Failure Anomalies alerts in the Azure portal, in your Application Insights alerts page.
445436

446-
*Some of the alerts are about known issues and I do not want to receive them.*
437+
*Some of the alerts are about known issues and I don't want to receive them.*
447438

448439
* You can use the [alert action rules](./alerts-processing-rules.md) suppression feature.
449440
