Commit 8f503b5

Merge pull request #113605 from dagiro/freshness_c64
freshness_c64
2 parents 5e91fc8 + 1adda72 commit 8f503b5

16 files changed, with 138 additions and 116 deletions

articles/hdinsight/TOC.yml

Lines changed: 3 additions & 1 deletion
@@ -221,8 +221,10 @@
 href: ./hdinsight-hadoop-oms-log-analytics-use-queries.md
 - name: Monitor cluster performance
 href: ./hdinsight-key-scenarios-to-monitor.md
-- name: Monitor cluster availability with Ambari and Azure Monitor logs
+- name: Cluster availability - Apache Ambari
 href: ./hdinsight-cluster-availability.md
+- name: Cluster availability - Azure Monitor logs
+href: ./cluster-availability-monitor-logs.md
 - name: Troubleshoot
 items:
 - name: Troubleshoot script actions
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
1+
---
2+
title: How to monitor cluster availability with Azure Monitor logs in HDInsight
3+
description: Learn how to use Azure Monitor logs to monitor cluster health and availability.
4+
author: hrasheed-msft
5+
ms.author: hrasheed
6+
ms.reviewer: jasonh
7+
ms.service: hdinsight
8+
ms.topic: conceptual
9+
ms.date: 05/01/2020
10+
---
11+
12+
# How to monitor cluster availability with Azure Monitor logs in HDInsight
13+
14+
HDInsight clusters include Azure Monitor logs integration, which provides queryable metrics and logs as well as configurable alerts. This article shows how to use Azure Monitor logs to monitor the availability of your cluster.
15+
16+
## Azure Monitor logs integration
17+
18+
Azure Monitor logs enable data generated by multiple resources, such as HDInsight clusters, to be collected and aggregated in one place to achieve a unified monitoring experience.
19+
20+
As a prerequisite, you'll need a Log Analytics Workspace to store the collected data. If you haven't already created one, you can follow instructions here: [Create a Log Analytics Workspace](https://docs.microsoft.com/azure/azure-monitor/learn/quick-create-workspace).
21+
22+
## Enable HDInsight Azure Monitor logs integration
23+
24+
From the HDInsight cluster resource page in the portal, select **Azure Monitor**. Then select **Enable** and choose your Log Analytics workspace from the drop-down list.
25+
26+
![HDInsight Operations Management Suite](media/cluster-availability-monitor-logs/azure-portal-monitoring.png)
27+
28+
## Query metrics and logs tables
29+
30+
Once Azure Monitor logs integration is enabled (this may take a few minutes), navigate to your **Log Analytics Workspace** resource and select **Logs**.
31+
32+
![Log Analytics workspace logs](media/cluster-availability-monitor-logs/hdinsight-portal-logs.png)
33+
34+
**Logs** lists several sample queries, such as:
35+
36+
| Query Name | Description |
37+
|---------------------------------|---------------------------------------------------------------------------|
38+
| Computers availability today | Chart the number of computers sending logs, each hour |
39+
| List heartbeats | List all computer heartbeats from the last hour |
40+
| Last heartbeat of each computer | Show the last heartbeat sent by each computer |
41+
| Unavailable computers | List all known computers that didn't send a heartbeat in the last 5 hours |
42+
| Availability rate | Calculate the availability rate of each connected computer |
43+
44+
As an example, run the **Availability rate** sample query by selecting **Run** on that query, as shown in the screenshot above. This will show the availability rate of each node in your cluster as a percentage. If you have enabled multiple HDInsight clusters to send metrics to the same Log Analytics workspace, you'll see the availability rate for all nodes in those clusters displayed.
45+
46+
![Log Analytics workspace logs 'availability rate' sample query](media/cluster-availability-monitor-logs/portal-availability-rate.png)
47+
48+
> [!NOTE]
49+
> Availability rate is measured over a 24-hour period, so your cluster will need to run for at least 24 hours before you see accurate availability rates.
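
To see why a cluster needs 24 hours of uptime before the rate is meaningful, here is a minimal Python sketch. It assumes the rate is the percentage of hourly buckets in the last 24 hours that contain at least one heartbeat. That definition, the function name, and the sample data are illustrative assumptions, not the text of the sample query.

```python
from datetime import datetime, timedelta, timezone


def availability_rate(heartbeats, now, window_hours=24):
    """Percent of hourly buckets in the window that hold at least one heartbeat."""
    start = now - timedelta(hours=window_hours)
    seen_hours = set()
    for ts in heartbeats:
        if start <= ts < now:
            # Index of the hourly bucket this heartbeat falls into.
            seen_hours.add(int((ts - start) // timedelta(hours=1)))
    return 100.0 * len(seen_hours) / window_hours


now = datetime(2020, 5, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical node that has only been running for the last 6 hours.
new_node = [now - timedelta(hours=h, minutes=30) for h in range(6)]
# Hypothetical node that has reported in every hour of the window.
old_node = [now - timedelta(hours=h, minutes=30) for h in range(24)]
print(availability_rate(new_node, now))  # 25.0
print(availability_rate(old_node, now))  # 100.0
```

Under this assumption, a healthy node that was created six hours ago still shows 25 percent, which is the effect the note describes.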
50+
51+
You can pin this table to a shared dashboard by clicking **Pin** in the upper-right corner. If you don't have any writable shared dashboards, you can see how to create one here: [Create and share dashboards in the Azure portal](https://docs.microsoft.com/azure/azure-portal/azure-portal-dashboards#publish-and-share-a-dashboard).
52+
53+
## Azure Monitor alerts
54+
55+
You can also set up Azure Monitor alerts that trigger when the value of a metric or the results of a query meet certain conditions. As an example, let's create an alert that sends an email when one or more nodes haven't sent a heartbeat in 5 hours (that is, they're presumed to be unavailable).
56+
57+
From **Logs**, run the **Unavailable computers** sample query by selecting **Run** on that query, as shown below.
58+
59+
![Log Analytics workspace logs 'unavailable computers' sample](media/cluster-availability-monitor-logs/portal-unavailable-computers.png)
60+
61+
If all nodes are available, this query should return zero results for now. Click **New alert rule** to begin configuring your alert for this query.
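
The check behind this query can be sketched in Python as follows. This is an illustration of the idea only: the function, the node names, and the timestamps are hypothetical, and the 5-hour cutoff is taken from the query description above.

```python
from datetime import datetime, timedelta, timezone


def unavailable_computers(last_heartbeat, now, max_silence=timedelta(hours=5)):
    """Return the sorted names of nodes whose latest heartbeat is older than max_silence."""
    return sorted(
        name for name, ts in last_heartbeat.items() if now - ts > max_silence
    )


now = datetime(2020, 5, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical node names and timestamps, for illustration only.
last_heartbeat = {
    "hn0-example": now - timedelta(minutes=1),
    "hn1-example": now - timedelta(hours=6),
    "wn0-example": now - timedelta(minutes=2),
}
print(unavailable_computers(last_heartbeat, now))  # ['hn1-example']
```

When every node has reported recently, the function returns an empty list, which corresponds to the zero-result case mentioned above.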
62+
63+
![Log Analytics workspace new alert rule](media/cluster-availability-monitor-logs/portal-logs-new-alert-rule.png)
64+
65+
There are three components to an alert: the *resource* for which to create the rule (the Log Analytics workspace in this case), the *condition* to trigger the alert, and the *action groups* that determine what will happen when the alert is triggered.
66+
Click the **condition title**, as shown below, to finish configuring the signal logic.
67+
68+
![Portal alert create rule condition](media/cluster-availability-monitor-logs/portal-condition-title.png)
69+
70+
This will open **Configure signal logic**.
71+
72+
Set the **Alert logic** section as follows:
73+
74+
*Based on: Number of results, Condition: Greater than, Threshold: 0.*
75+
76+
Since this query only returns unavailable nodes as results, if the number of results is ever greater than 0, the alert should fire.
77+
78+
In the **Evaluated based on** section, set the **period** and **frequency** based on how often you want to check for unavailable nodes.
79+
80+
For the purpose of this alert, make sure the **period** equals the **frequency**, so that consecutive evaluation windows neither overlap nor leave gaps. For more information about period, frequency, and other alert parameters, see [Log search alert rule - definition and types](https://docs.microsoft.com/azure/azure-monitor/platform/alerts-unified-log#log-search-alert-rule---definition-and-types).
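
The following Python sketch models how the period, frequency, and threshold interact. It is a simplified illustration, not the alert service's implementation; the function names and the scheduling model (one window ending at each scheduled run) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def evaluation_windows(start, period, frequency, runs):
    """Yield the (window_start, window_end) pair examined by each scheduled run."""
    for i in range(1, runs + 1):
        end = start + i * frequency
        yield end - period, end


def should_fire(result_count, threshold=0):
    """Alert logic from the text: number of results greater than threshold."""
    return result_count > threshold


start = datetime(2020, 5, 1, 0, 0, tzinfo=timezone.utc)
hour = timedelta(hours=1)
windows = list(evaluation_windows(start, period=hour, frequency=hour, runs=3))
# With period == frequency, each window starts exactly where the previous one ended.
contiguous = all(a[1] == b[0] for a, b in zip(windows, windows[1:]))
print(contiguous)      # True
print(should_fire(0))  # False
print(should_fire(2))  # True
```

In this model, a period longer than the frequency makes consecutive windows overlap, and a shorter period leaves stretches of time that no run examines.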
81+
82+
Select **Done** when you're finished configuring the signal logic.
83+
84+
![Alert rule configures signal logic](media/cluster-availability-monitor-logs/portal-configure-signal-logic.png)
85+
86+
If you don't already have an existing action group, click **Create New** under the **Action Groups** section.
87+
88+
![Alert rule creates new action group](media/cluster-availability-monitor-logs/portal-create-new-action-group.png)
89+
90+
This will open **Add action group**. Choose an **Action group name**, **Short name**, **Subscription**, and **Resource group**. Under the **Actions** section, choose an **Action Name** and select **Email/SMS/Push/Voice** as the **Action Type**.
91+
92+
> [!NOTE]
93+
> An alert can trigger several other actions besides Email/SMS/Push/Voice, such as an Azure Function, Logic App, Webhook, ITSM, or Automation Runbook. [Learn more](https://docs.microsoft.com/azure/azure-monitor/platform/action-groups#action-specific-information).
94+
95+
This will open **Email/SMS/Push/Voice**. Choose a **Name** for the recipient, select the **Email** check box, and type the email address to which you want the alert sent. Select **OK** in **Email/SMS/Push/Voice**, and then in **Add action group**, to finish configuring your action group.
96+
97+
![Alert rule creates add action group](media/cluster-availability-monitor-logs/portal-add-action-group.png)
98+
99+
After these blades close, you should see your action group listed under the **Action Groups** section. Finally, complete the **Alert Details** section by typing an **Alert Rule Name** and **Description** and choosing a **Severity**. Click **Create Alert Rule** to finish.
100+
101+
![Portal creates alert rule finish](media/cluster-availability-monitor-logs/portal-create-alert-rule-finish.png)
102+
103+
> [!TIP]
104+
> The ability to specify **Severity** is a powerful tool when you create multiple alerts. For example, you could create one alert that raises Error (Sev 1) if a single head node goes down and another alert that raises Critical (Sev 0) in the unlikely event that both head nodes go down.
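
The two-rule scheme in this tip can be summarized with a small Python sketch. In the portal these would be two separate alert rules, each with its own query and severity; the single function below is a hypothetical illustration of the mapping only.

```python
def head_node_severity(head_nodes_down):
    """Map the number of unavailable head nodes to an alert severity, or None for no alert."""
    if head_nodes_down >= 2:
        return 0  # Sev 0: both head nodes are down
    if head_nodes_down == 1:
        return 1  # Sev 1: a single head node is down
    return None


for down in (0, 1, 2):
    print(down, head_node_severity(down))
```

Lower numbers mean higher severity, so the rule for losing both head nodes outranks the rule for losing one.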
105+
106+
When the condition for this alert is met, the alert will fire and you'll receive an email with the alert details like this:
107+
108+
![Azure Monitor alert email example](media/cluster-availability-monitor-logs/portal-oms-alert-email.png)
109+
110+
You can also view all alerts that have fired, grouped by severity, by going to **Alerts** in your **Log Analytics Workspace**.
111+
112+
![Log Analytics workspace alerts](media/cluster-availability-monitor-logs/hdi-portal-oms-alerts.png)
113+
114+
Selecting a severity grouping (for example, **Sev 1**, as highlighted above) shows records for all fired alerts of that severity, as shown below:
115+
116+
![Log Analytics workspace sev one alert](media/cluster-availability-monitor-logs/portal-oms-alerts-sev1.png)
117+
118+
## Next steps
119+
120+
* [Cluster availability - Apache Ambari](./hdinsight-cluster-availability.md)
121+
* [Use Azure Monitor logs](hdinsight-hadoop-oms-log-analytics-tutorial.md)
