| 1 | +--- |
| 2 | +title: 'Azure Monitor best practices: Alerts and automated actions' |
| 3 | +description: Recommendations for deployment of Azure Monitor alerts and automated actions. |
| 4 | +ms.topic: conceptual |
| 5 | +author: bwren |
| 6 | +ms.author: bwren |
| 7 | +ms.date: 05/31/2023 |
| 8 | +ms.reviewer: bwren |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +# Deploy Azure Monitor: Alerts and automated actions |
| 13 | + |
| 14 | +This article provides guidance on alerts in Azure Monitor. Alerts proactively notify you of important data or patterns identified in your monitoring data. You can view alerts in the Azure portal. You can create alerts that: |
| 15 | + |
| 16 | +- Send a proactive notification. |
| 17 | +- Initiate an automated action to attempt to remediate an issue. |
| 18 | + |
| 19 | +## Alerting strategy |
| 20 | + |
| 21 | +An alerting strategy defines your organization's standards for: |
| 22 | + |
| 23 | +- The types of alert rules that you'll create for different scenarios. |
| 24 | +- How you'll categorize and manage alerts after they're created. |
| 25 | +- Automated actions and notifications that you'll take in response to alerts. |
| 26 | + |
| 27 | +An alerting strategy helps you define the configuration of alert rules, including alert severity and action groups. |
| 28 | + |
| 29 | +For factors to consider as you develop an alerting strategy, see [Successful alerting strategy](/azure/cloud-adoption-framework/manage/monitor/alerting#successful-alerting-strategy). |
| 30 | + |
| 31 | +## Alert rule types |
| 32 | + |
| 33 | +Alerts in Azure Monitor are generated by alert rules. Azure Monitor doesn't have any alert rules by default, so you must create them yourself. For guidance on recommended alert rules, see the monitoring documentation for each Azure service. |
| 34 | + |
| 35 | +Alert rules come in multiple types, distinguished by the type of data they use. Each has different capabilities and a different cost. The basic strategy is to use the alert rule type with the lowest cost that provides the logic you require. |
| 36 | + |
| 37 | +- Activity log rules. Create an alert in response to a new activity log event that matches specified conditions. There's no cost to these alerts, so they should be your first choice, although the conditions they can detect are limited. See [Create or edit an alert rule](alerts-create-new-alert-rule.md) for information on creating an activity log alert. |
| 38 | +- Metric alert rules. Create an alert in response to one or more metric values exceeding a threshold. Metric alerts are stateful, which means that the alert automatically closes when the value drops back below the threshold, and notifications are sent only when the state changes. There's a cost to metric alerts, but it's often much less than log alerts. See [Create or edit an alert rule](alerts-create-new-alert-rule.md) for information on creating a metric alert. |
| 39 | +- Log alert rules. Create an alert when the results of a scheduled query match specified criteria. They're the most expensive of the alert rules, but they allow the most complex criteria. See [Create or edit an alert rule](alerts-create-new-alert-rule.md) for information on creating a log query alert. |
| 40 | +- [Application alerts](/previous-versions/azure/azure-monitor/app/monitor-web-app-availability). Perform proactive performance and availability testing of your web application. You can perform a ping test at no cost, but there's a cost to more complex testing. See [Monitor the availability of any website](/previous-versions/azure/azure-monitor/app/monitor-web-app-availability) for a description of the different tests and information on creating them. |
| 41 | + |
| 42 | +## Alert severity |
| 43 | + |
| 44 | +Each alert rule defines the severity of the alerts that it creates based on the following table. Alerts in the Azure portal are grouped by level so that you can manage similar alerts together and quickly identify alerts that require the greatest urgency. |
| 45 | + |
| 46 | +| Level | Name | Description | |
| 47 | +|:---|:---|:---| |
| 48 | +| Sev 0 | Critical | Loss of service or application availability or severe degradation of performance. Requires immediate attention. | |
| 49 | +| Sev 1 | Error | Degradation of performance or loss of availability of some aspect of an application or service. Requires attention but not immediate. | |
| 50 | +| Sev 2 | Warning | A problem that doesn't include any current loss in availability or performance, although it has the potential to lead to more severe problems if unaddressed. | |
| 51 | +| Sev 3 | Informational | Doesn't indicate a problem but provides interesting information to an operator, such as successful completion of a regular process. | |
| 52 | +| Sev 4 | Verbose | Doesn't indicate a problem but provides detailed, verbose information. | |
| 53 | + |
| 54 | +Assess the severity of the condition each rule is identifying to assign an appropriate level. In your alerting strategy, define the types of issues you assign to each severity level and your standard response to each. |
| 55 | + |
| 56 | +## Action groups |
| 57 | + |
| 58 | +Automated responses to alerts in Azure Monitor are defined in [action groups](action-groups.md). An action group is a collection of one or more notifications and actions that are fired when an alert is triggered. A single action group can be used with multiple alert rules and contain one or more of the following items: |
| 59 | + |
| 60 | +- **Notifications**: Messages that notify operators and administrators that an alert was created. |
| 61 | +- **Actions**: Automated processes that attempt to correct the detected issue. |
| 62 | + |
| 63 | +## Notifications |
| 64 | + |
| 65 | +Notifications are messages sent to one or more users to notify them that an alert has been created. Because a single action group can be used with multiple alert rules, you should design a set of action groups for different sets of administrators and users who will receive the same sets of alerts. Use any of the following types of notifications depending on the preferences of your operators and your organizational standards: |
| 66 | + |
| 67 | +- Email |
| 68 | +- SMS |
| 69 | +- Push to Azure app |
| 70 | +- Voice |
| 71 | +- Email Azure Resource Manager role |
| 72 | + |
| 73 | +## Actions |
| 74 | + |
| 75 | +Actions are automated responses to an alert. You can use the available actions for any scenario that they support, but the following sections describe how each action is typically used. |
| 76 | + |
| 77 | +### Automated remediation |
| 78 | + |
| 79 | +Use the following actions to attempt automated remediation of the issue identified by the alert: |
| 80 | + |
| 81 | +- **Automation runbook**: Start a built-in runbook or a custom runbook in Azure Automation. For example, built-in runbooks are available to perform such functions as restarting or scaling up a virtual machine. |
| 82 | +- **Azure Functions**: Start an Azure function. |
| 83 | + |
| 84 | +### ITSM and on-call management |
| 85 | + |
| 86 | +- **IT service management (ITSM)**: Use the ITSM Connector to create work items in your ITSM tool based on alerts from Azure Monitor. You first configure the connector and then use the **ITSM** action in alert rules. |
| 87 | +- **Webhooks**: Send the alert to an incident management system that supports webhooks such as PagerDuty and Splunk On-Call. |
| 88 | +- **Secure webhook**: Integrate ITSM with Azure Active Directory Authentication. |
| 89 | + |
| 90 | +## Minimize alert activity |
| 91 | + |
| 92 | +Create alerts for any important information in your environment, but avoid excessive alerts and notifications for issues that don't warrant them. To surface critical issues without generating excess information and notifications for administrators, follow these guidelines: |
| 93 | + |
| 94 | +- See [Successful alerting strategy](/azure/cloud-adoption-framework/manage/monitor/alerting#successful-alerting-strategy) to determine whether a symptom is an appropriate candidate for alerting. |
| 95 | +- Use the **Automatically resolve alerts** option in metric alert rules to resolve alerts when the condition has been corrected. |
| 96 | +- Use the **Suppress alerts** option in log query alert rules to avoid creating multiple alerts for the same issue. |
| 97 | +- Ensure that you use appropriate severity levels for alert rules so that high-priority issues can be analyzed together. |
| 98 | +- Limit notifications for alerts with a severity of Warning or less because they don't require immediate attention. |
| 99 | + |
| 100 | +## Create alert rules at scale |
| 101 | + |
| 102 | +Typically, you'll want to alert on issues for all your critical Azure applications and resources. Use the following methods for creating alert rules at scale: |
| 103 | + |
| 104 | +- Azure Monitor supports monitoring multiple resources of the same type with one metric alert rule for resources that exist in the same Azure region. For a list of Azure services that are currently supported for this feature, see [Monitoring at scale using metric alerts in Azure Monitor](alerts-metric-overview.md#monitoring-at-scale-using-metric-alerts-in-azure-monitor). |
| 105 | +- For metric alert rules for Azure services that don't support multiple resources, use automation tools such as the Azure CLI and PowerShell with Resource Manager templates to create the same alert rule for multiple resources. For samples, see [Resource Manager template samples for metric alert rules in Azure Monitor](resource-manager-alerts-metric.md). |
| 106 | +- In log query alert rules, write queries that return data for multiple resources. Use the **Split by dimensions** setting in the rule to create separate alerts for each resource. |
| 107 | + |
| 108 | +> [!NOTE] |
| 109 | +> Resource-centric log query alert rules, currently in public preview, allow you to use all resources in a subscription or resource group as a target for a log query alert. |
| 110 | + |
| 111 | +## Next steps |
| 112 | + |
| 113 | +[Optimize cost in Azure Monitor](../best-practices-cost.md). |
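
The severity table and the cost guidance in the article above can be expressed as a small decision sketch. This is only an illustration, not part of the change and not an Azure API: the names `Severity`, `RULE_TYPES_BY_COST`, `cheapest_rule_type`, and `should_notify` are hypothetical. The cost ordering is the one the article describes (activity log alerts are free, metric alerts are often much cheaper than log alerts), and the notification rule is one possible reading of "limit notifications for alerts with a severity of Warning or less." Application alerts are left out because they cover a separate scenario (availability testing).

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the article's table (lower number = more severe)."""
    CRITICAL = 0
    ERROR = 1
    WARNING = 2
    INFORMATIONAL = 3
    VERBOSE = 4


# Rule types ordered from lowest to highest cost, as the article describes them.
# The string labels are hypothetical; they are not Azure resource type names.
RULE_TYPES_BY_COST = ["activity_log", "metric", "log"]


def cheapest_rule_type(candidates: set[str]) -> str:
    """Return the lowest-cost rule type among those able to express the condition."""
    for rule_type in RULE_TYPES_BY_COST:
        if rule_type in candidates:
            return rule_type
    raise ValueError("no known rule type can express the condition")


def should_notify(severity: Severity) -> bool:
    """Hypothetical policy: notify only for alerts more severe than Warning."""
    return severity < Severity.WARNING


if __name__ == "__main__":
    # A condition that both a metric rule and a log rule could detect.
    print(cheapest_rule_type({"metric", "log"}))  # metric
    print(should_notify(Severity.ERROR))          # True
    print(should_notify(Severity.WARNING))        # False
```

A real alerting strategy would likely route lower-severity alerts to a lower-urgency channel rather than drop notifications entirely; the boolean here only keeps the sketch short.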