|
| 1 | +--- |
| 2 | +title: How to monitor cluster availability with Azure Monitor logs in HDInsight |
| 3 | +description: Learn how to use Azure Monitor logs to monitor cluster health and availability. |
| 4 | +author: hrasheed-msft |
| 5 | +ms.author: hrasheed |
| 6 | +ms.reviewer: jasonh |
| 7 | +ms.service: hdinsight |
| 8 | +ms.topic: conceptual |
| 9 | +ms.date: 05/01/2020 |
| 10 | +--- |
| 11 | + |
| 12 | +# How to monitor cluster availability with Azure Monitor logs in HDInsight |
| 13 | + |
| 14 | +HDInsight clusters include Azure Monitor logs integration, which provides queryable metrics and logs, as well as configurable alerts. This article shows how to use Azure Monitor logs to monitor the availability of your cluster. |
| 15 | + |
| 16 | +## Azure Monitor logs integration |
| 17 | + |
| 18 | +Azure Monitor logs enable data generated by multiple resources, such as HDInsight clusters, to be collected and aggregated in one place to achieve a unified monitoring experience. |
| 19 | + |
| 20 | +As a prerequisite, you'll need a Log Analytics Workspace to store the collected data. If you haven't already created one, see [Create a Log Analytics Workspace](https://docs.microsoft.com/azure/azure-monitor/learn/quick-create-workspace). |
| 21 | + |
| 22 | +## Enable HDInsight Azure Monitor logs integration |
| 23 | + |
| 24 | +From the HDInsight cluster resource page in the portal, select **Azure Monitor**. Then select **Enable** and choose your Log Analytics workspace from the drop-down list. |
| 25 | + |
| 26 | + |
| 27 | + |
| 28 | +## Query metrics and logs tables |
| 29 | + |
| 30 | +Once Azure Monitor logs integration is enabled (this may take a few minutes), navigate to your **Log Analytics Workspace** resource and select **Logs**. |
| 31 | + |
| 32 | + |
| 33 | + |
| 34 | +The **Logs** page lists several sample queries, such as: |
| 35 | + |
| 36 | +| Query Name | Description | |
| 37 | +|---------------------------------|---------------------------------------------------------------------------| |
| 38 | +| Computers availability today | Chart the number of computers sending logs, each hour | |
| 39 | +| List heartbeats | List all computer heartbeats from the last hour | |
| 40 | +| Last heartbeat of each computer | Show the last heartbeat sent by each computer | |
| 41 | +| Unavailable computers | List all known computers that didn't send a heartbeat in the last 5 hours | |
| 42 | +| Availability rate | Calculate the availability rate of each connected computer | |
| 43 | + |
| 44 | +As an example, run the **Availability rate** sample query by selecting **Run** on that query, as shown in the screenshot above. This will show the availability rate of each node in your cluster as a percentage. If you have enabled multiple HDInsight clusters to send metrics to the same Log Analytics workspace, you'll see the availability rate for all nodes in those clusters displayed. |
| 45 | + |
| 46 | + |
| 47 | + |
| 48 | +> [!NOTE] |
| 49 | +> Availability rate is measured over a 24-hour period, so your cluster will need to run for at least 24 hours before you see accurate availability rates. |
| 50 | + |
| 51 | +You can pin this table to a shared dashboard by clicking **Pin** in the upper-right corner. If you don't have any writable shared dashboards, see [Create and share dashboards in the Azure portal](https://docs.microsoft.com/azure/azure-portal/azure-portal-dashboards#publish-and-share-a-dashboard) to create one. |
| 52 | + |
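To make the 24-hour note above concrete, here is a minimal Python sketch of one way an availability rate could be derived from heartbeat records. It assumes the rate is the share of one-hour buckets in the last 24 hours that contain at least one heartbeat; that definition, the function name `availability_rate`, and the node names `hn0` and `wn1` are illustrative assumptions, not the text of the portal's sample query.

```python
from datetime import datetime, timedelta


def availability_rate(heartbeats, now, window_hours=24):
    """Return {node: percent of hourly buckets in the window with >= 1 heartbeat}.

    Assumption: availability is counted per one-hour bucket over the window.
    """
    window_start = now - timedelta(hours=window_hours)
    seen = {}  # node name -> set of hour-bucket indexes that contain a heartbeat
    for node, ts in heartbeats:
        if window_start <= ts < now:
            bucket = int((ts - window_start).total_seconds() // 3600)
            seen.setdefault(node, set()).add(bucket)
    return {node: 100.0 * len(buckets) / window_hours for node, buckets in seen.items()}


# Hypothetical data: hn0 reported in every hour, wn1 only in the last 12 hours.
now = datetime(2020, 5, 1, 12, 0, 0)
heartbeats = [("hn0", now - timedelta(hours=h, minutes=30)) for h in range(24)]
heartbeats += [("wn1", now - timedelta(hours=h, minutes=30)) for h in range(12)]

rates = availability_rate(heartbeats, now)
print(rates)
```

Under this assumption, a node that has only been reporting for 12 hours shows a rate of 50 percent even though it never went down, which is why the figures are only meaningful once the cluster has run for a full 24 hours.
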
| 53 | +## Azure Monitor alerts |
| 54 | + |
| 55 | +You can also set up Azure Monitor alerts that trigger when the value of a metric or the results of a query meet certain conditions. As an example, let's create an alert that sends an email when one or more nodes haven't sent a heartbeat in 5 hours (that is, they are presumed to be unavailable). |
| 56 | + |
| 57 | +From **Logs**, run the **Unavailable computers** sample query by selecting **Run** on that query, as shown below. |
| 58 | + |
| 59 | + |
| 60 | + |
| 61 | +If all nodes are available, this query should return zero results for now. Click **New alert rule** to begin configuring your alert for this query. |
| 62 | + |
| 63 | + |
| 64 | + |
| 65 | +There are three components to an alert: the *resource* for which to create the rule (the Log Analytics workspace in this case), the *condition* to trigger the alert, and the *action groups* that determine what will happen when the alert is triggered. |
| 66 | +Click the **condition title**, as shown below, to finish configuring the signal logic. |
| 67 | + |
| 68 | + |
| 69 | + |
| 70 | +This will open **Configure signal logic**. |
| 71 | + |
| 72 | +Set the **Alert logic** section as follows: |
| 73 | + |
| 74 | +*Based on: Number of results, Condition: Greater than, Threshold: 0.* |
| 75 | + |
| 76 | +Since this query only returns unavailable nodes as results, if the number of results is ever greater than 0, the alert should fire. |
| 77 | + |
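The alert logic above can be sketched in a few lines of Python. This is only an illustration of the rule "fire when the number of results is greater than 0", applied to a query that returns nodes silent for more than 5 hours; the helper names `unavailable_nodes` and `alert_should_fire` and the node names are hypothetical and are not part of Azure Monitor.

```python
from datetime import datetime, timedelta


def unavailable_nodes(last_heartbeat, now, max_silence=timedelta(hours=5)):
    """Return the nodes whose most recent heartbeat is older than max_silence."""
    return sorted(node for node, ts in last_heartbeat.items() if now - ts > max_silence)


def alert_should_fire(result_count, threshold=0):
    """Mirror the alert logic: number of results greater than the threshold."""
    return result_count > threshold


# Hypothetical last-heartbeat times: hn0 is healthy, wn0 has been silent for 6 hours.
now = datetime(2020, 5, 1, 12, 0, 0)
last_heartbeat = {
    "hn0": now - timedelta(minutes=1),
    "wn0": now - timedelta(hours=6),
}

down = unavailable_nodes(last_heartbeat, now)
fire = alert_should_fire(len(down))
print(down, fire)
```

Because the query result contains only unavailable nodes, a threshold of 0 means a single silent node is enough to trigger the alert.
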
| 78 | +In the **Evaluated based on** section, set the **period** and **frequency** based on how often you want to check for unavailable nodes. |
| 79 | + |
| 80 | +For the purpose of this alert, make sure that **Period** is equal to **Frequency**. For more information about period, frequency, and other alert parameters, see [Log search alert rule - definition and types](https://docs.microsoft.com/azure/azure-monitor/platform/alerts-unified-log#log-search-alert-rule---definition-and-types). |
| 81 | + |
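One way to picture the relationship between period and frequency is as a series of look-back windows, one per evaluation. The Python sketch below assumes each evaluation runs every `frequency` and looks back exactly one `period`; the helper names are hypothetical and this is a simplified model, not a description of how Azure Monitor schedules evaluations internally.

```python
from datetime import datetime, timedelta


def evaluation_windows(start, period, frequency, runs):
    """Return (window_start, window_end) for each scheduled evaluation."""
    windows = []
    for k in range(1, runs + 1):
        end = start + k * frequency
        windows.append((end - period, end))
    return windows


def coverage(windows):
    """Classify consecutive windows as 'contiguous', 'overlapping', or 'gaps'."""
    kinds = set()
    for (_, prev_end), (next_start, _) in zip(windows, windows[1:]):
        if next_start == prev_end:
            kinds.add("contiguous")
        elif next_start < prev_end:
            kinds.add("overlapping")
        else:
            kinds.add("gaps")
    return kinds


start = datetime(2020, 5, 1, 0, 0, 0)
half_hour = timedelta(minutes=30)

equal = coverage(evaluation_windows(start, half_hour, half_hour, runs=4))
longer_period = coverage(evaluation_windows(start, 2 * half_hour, half_hour, runs=4))
shorter_period = coverage(evaluation_windows(start, half_hour / 2, half_hour, runs=4))
print(equal, longer_period, shorter_period)
```

In this model, equal values give back-to-back windows, a longer period makes consecutive evaluations re-examine the same data, and a shorter period leaves stretches of time that no evaluation covers.
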
| 82 | +Select **Done** when you're finished configuring the signal logic. |
| 83 | + |
| 84 | + |
| 85 | + |
| 86 | +If you don't already have an existing action group, click **Create New** under the **Action Groups** section. |
| 87 | + |
| 88 | + |
| 89 | + |
| 90 | +This will open **Add action group**. Choose an **Action group name**, **Short name**, **Subscription**, and **Resource group**. Under the **Actions** section, choose an **Action Name** and select **Email/SMS/Push/Voice** as the **Action Type**. |
| 91 | + |
| 92 | +> [!NOTE] |
| 93 | +> An alert can trigger several other actions besides Email/SMS/Push/Voice, such as an Azure Function, LogicApp, Webhook, ITSM, and Automation Runbook. [Learn More.](https://docs.microsoft.com/azure/azure-monitor/platform/action-groups#action-specific-information) |
| 94 | + |
| 95 | +This will open **Email/SMS/Push/Voice**. Choose a **Name** for the recipient, select the **Email** check box, and type the email address to which you want the alert sent. Select **OK** in **Email/SMS/Push/Voice**, then in **Add action group**, to finish configuring your action group. |
| 96 | + |
| 97 | + |
| 98 | + |
| 99 | +After these blades close, you should see your action group listed under the **Action Groups** section. Finally, complete the **Alert Details** section by typing an **Alert Rule Name** and **Description** and choosing a **Severity**. Click **Create Alert Rule** to finish. |
| 100 | + |
| 101 | + |
| 102 | + |
| 103 | +> [!TIP] |
| 104 | +> The ability to specify **Severity** is a powerful tool that can be used when creating multiple alerts. For example, you could create one alert that raises Error (Sev 1) if a single head node goes down and another alert that raises Critical (Sev 0) in the unlikely event that both head nodes go down. |
| 105 | + |
| 106 | +When the condition for this alert is met, the alert will fire and you'll receive an email with the alert details like this: |
| 107 | + |
| 108 | + |
| 109 | + |
| 110 | +You can also view all alerts that have fired, grouped by severity, by going to **Alerts** in your **Log Analytics Workspace**. |
| 111 | + |
| 112 | + |
| 113 | + |
| 114 | +Selecting a severity grouping (for example, **Sev 1**, as highlighted above) shows records for all fired alerts of that severity, as shown below: |
| 115 | + |
| 116 | + |
| 117 | + |
| 118 | +## Next steps |
| 119 | + |
| 120 | +* [Cluster availability - Apache Ambari](./hdinsight-cluster-availability.md) |
| 121 | +* [Use Azure Monitor logs](hdinsight-hadoop-oms-log-analytics-tutorial.md) |