
Commit da811bf

Merge pull request #101967 from dagiro/ts_ambari4
ts_ambari4
2 parents d21f7a5 + 212ffe7 commit da811bf


3 files changed, +80 -0 lines changed


articles/hdinsight/TOC.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -501,6 +501,8 @@
   href: ./hadoop/apache-ambari-troubleshoot-fivezerotwo-error.md
 - name: Apache Ambari shows down hosts and services
   href: ./hadoop/apache-ambari-troubleshoot-down-hosts-services.md
+- name: Apache Ambari stale alerts
+  href: ./hadoop/apache-ambari-troubleshoot-stale-alerts.md
 - name: Troubleshoot a slow or failing HDInsight cluster
   href: ./hdinsight-troubleshoot-failed-cluster.md
 - name: Apache Hadoop HDFS troubleshooting
```
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
---
title: Apache Ambari stale alerts in Azure HDInsight
description: Discussion and analysis of possible reasons and solutions for stale Apache Ambari alerts in HDInsight.
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: troubleshooting
ms.date: 01/22/2020
---

# Scenario: Apache Ambari stale alerts in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

## Issue

From the Apache Ambari UI, you may see an alert similar to the following image:

![Apache Ambari stale alert example](./media/apache-ambari-troubleshoot-stale-alerts/ambari-stale-alerts-example.png)

## Cause

Ambari agents continually run health checks to monitor the health of many resources. Each alert is configured to run at a predefined interval. After each alert runs, the Ambari agent reports its status back to the Ambari server. If the Ambari server detects that any alert wasn't run in a timely manner, it triggers an **Ambari Server Alerts** alert. A health check might not run at its defined interval for several reasons:

* When hosts are under heavy utilization (high CPU), the Ambari agent might not get enough system resources to run the alerts in a timely manner.

* The cluster is busy running many jobs or services during heavy load.

* A few hosts in the cluster may host many components and therefore have to run many alerts. If the number of components is large, alert jobs may miss their scheduled intervals.
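
You can check the first cause interactively on the affected host with `uptime` or `top`. As a scripted alternative, the following Python sketch compares the host's 5-minute load average with its CPU count. The helper names and the 1.0 threshold are hypothetical choices for illustration, not Ambari or HDInsight settings, and `os.getloadavg()` is available only on Unix-like systems.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Return a load average normalized by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


def host_is_overloaded(threshold: float = 1.0) -> bool:
    """Hypothetical check: True if the 5-minute load per CPU exceeds threshold.

    os.getloadavg() returns the 1-, 5-, and 15-minute load averages
    and exists only on Unix-like systems.
    """
    _one_min, five_min, _fifteen_min = os.getloadavg()
    return load_per_cpu(five_min, os.cpu_count() or 1) > threshold


if __name__ == "__main__":
    print("overloaded" if host_is_overloaded() else "ok")
```

A sustained value well above 1.0 means more runnable work than available CPUs, which is the situation in which alert runs are likely to slip past their intervals.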

## Resolution

### Increase alert interval time

You can increase the check interval of an individual alert based on your cluster's response time and load.

1. From the Apache Ambari UI, select the **Alerts** tab.
1. Select the desired alert definition name.
1. From the definition, select **Edit**.
1. Modify the **Check Interval** value as desired, and then select **Save**.
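
If you need to change many alerts, you might prefer to script this step. The following Python sketch builds the request, assuming the Ambari REST API v1 resource `PUT /api/v1/clusters/<cluster>/alert_definitions/<id>`, an `interval` field expressed in minutes, and the HDInsight gateway URL `https://<cluster>.azurehdinsight.net`. The function name, cluster name, definition ID, and credentials are hypothetical; verify the request shape against your Ambari version before relying on it.

```python
import base64
import json
import urllib.request


def build_interval_update(cluster: str, definition_id: int, interval_minutes: int,
                          user: str, password: str) -> urllib.request.Request:
    """Build (but don't send) a PUT request that changes one alert's check interval.

    Assumptions: Ambari REST API v1 alert_definitions resource, an "interval"
    field in minutes, and the HDInsight gateway at https://<cluster>.azurehdinsight.net.
    """
    if interval_minutes < 1:
        raise ValueError("interval_minutes must be at least 1")
    url = (f"https://{cluster}.azurehdinsight.net/api/v1/clusters/{cluster}"
           f"/alert_definitions/{definition_id}")
    body = json.dumps({"AlertDefinition": {"interval": interval_minutes}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari rejects modifying requests that lack this header.
            "X-Requested-By": "ambari",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical cluster name, definition ID, and credentials.
    req = build_interval_update("mycluster", 42, 5, "admin", "password")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send the request.
```

Building the request separately from sending it lets you inspect the URL and body first. You can look up a definition's numeric ID with a `GET` on the same `alert_definitions` collection.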

### Increase alert interval time for Ambari Server Alerts

1. From the Apache Ambari UI, select the **Alerts** tab.
1. From the **Groups** drop-down list, select **AMBARI Default**.
1. Select the **Ambari Server Alerts** alert.
1. From the definition, select **Edit**.
1. Modify the **Check Interval** value as desired.
1. Modify the **Interval Multiplier** value as desired, and then select **Save**.
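
To see why raising the **Interval Multiplier** reduces stale alerts, consider a simplified model of the check. The following Python sketch assumes that an alert counts as stale once more than `interval × multiplier` minutes have passed since it last ran, and that the multiplier defaults to 2. Both are assumptions about Ambari's behavior rather than quotes from its source, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def is_stale(last_run: datetime, now: datetime, interval_minutes: int,
             multiplier: int = 2) -> bool:
    """Return True if an alert hasn't run within interval * multiplier minutes.

    Assumed model of the Ambari Server Alerts staleness check; the default
    multiplier of 2 is an assumption, so check your alert definition.
    """
    allowed = timedelta(minutes=interval_minutes * multiplier)
    return now - last_run > allowed


if __name__ == "__main__":
    now = datetime(2020, 1, 22, 12, 0)
    three_min_ago = now - timedelta(minutes=3)
    # A 1-minute alert that last ran 3 minutes ago exceeds 1 * 2 minutes.
    print(is_stale(three_min_ago, now, interval_minutes=1))                # True
    # Raising the multiplier to 5 tolerates the same delay.
    print(is_stale(three_min_ago, now, interval_minutes=1, multiplier=5))  # False
```

Under this model, a larger multiplier gives busy hosts more slack before the server reports an alert as stale, at the cost of detecting a genuinely stuck alert later.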

### Disable and enable the alert

You can disable the alert and then enable it again to discard any stale alerts.

1. From the Apache Ambari UI, select the **Alerts** tab.
1. Select the desired alert definition name.
1. From the definition, select **Enabled**, located on the far right.
1. From the **Confirmation** pop-up, select **Confirm Disable**.
1. Wait a few seconds until all the alert **Instances** shown on the page are cleared.
1. From the definition, select **Disabled**, located on the far right.
1. From the **Confirmation** pop-up, select **Confirm Enable**.
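
The same disable-then-enable cycle can be sketched against the Ambari REST API. This Python example assumes that `PUT /api/v1/clusters/<cluster>/alert_definitions/<id>` accepts an `AlertDefinition.enabled` field and that the HDInsight gateway is at `https://<cluster>.azurehdinsight.net`. The function names, the 10-second wait, and all argument values are hypothetical; confirm the request format against your Ambari version.

```python
import base64
import json
import time
import urllib.request


def build_enabled_request(cluster: str, definition_id: int, enabled: bool,
                          user: str, password: str) -> urllib.request.Request:
    """Build (but don't send) a PUT request that enables or disables one alert definition.

    Assumes the Ambari REST API v1 alert_definitions resource behind the
    HDInsight gateway at https://<cluster>.azurehdinsight.net.
    """
    url = (f"https://{cluster}.azurehdinsight.net/api/v1/clusters/{cluster}"
           f"/alert_definitions/{definition_id}")
    body = json.dumps({"AlertDefinition": {"enabled": enabled}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "X-Requested-By": "ambari",  # required by Ambari for modifying requests
            "Content-Type": "application/json",
        },
    )


def disable_then_enable(cluster: str, definition_id: int, user: str, password: str,
                        wait_seconds: float = 10.0) -> None:
    """Disable the alert, wait for its instances to clear, then re-enable it."""
    for enabled in (False, True):
        req = build_enabled_request(cluster, definition_id, enabled, user, password)
        with urllib.request.urlopen(req) as resp:
            resp.read()
        if not enabled:
            # Hypothetical pause standing in for "wait a few seconds".
            time.sleep(wait_seconds)
```

Keeping the request builder separate from `disable_then_enable` makes it possible to check each request before anything is sent to the cluster.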

### Increase alert grace time

Before the Ambari agent reports that a configured alert missed its schedule, it applies a grace time. If an alert misses its scheduled time but still runs within the grace time, no stale alert is fired.
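
The effect of the grace time can be illustrated with a simplified model. In the following Python sketch, the function name and the timestamps are hypothetical, and the comparison is only a model of the behavior described above, not the Ambari agent's actual scheduler code.

```python
def missed_schedule(scheduled_at: float, ran_at: float, grace_seconds: float = 5.0) -> bool:
    """Return True if an alert ran later than its scheduled time plus the grace period.

    Simplified, hypothetical model of the grace check; times are in seconds.
    """
    return ran_at - scheduled_at > grace_seconds


if __name__ == "__main__":
    # An alert scheduled at t=0 s that actually runs at t=7 s:
    print(missed_schedule(0.0, 7.0))                      # True with a 5-second grace period
    print(missed_schedule(0.0, 7.0, grace_seconds=10.0))  # False with a 10-second grace period
```

In this model, a host that regularly runs alerts a few seconds late stops producing stale alerts once the grace period exceeds that typical delay.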

The default `alert_grace_period` value is 5 seconds. You can configure this setting in `/etc/ambari-agent/conf/ambari-agent.ini`. On hosts that fire stale alerts at regular intervals, try increasing the value to 10, and then restart the Ambari agent.
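
If you need to make this change on several hosts, a small script helps. The following Python sketch assumes the file already contains an uncommented `alert_grace_period=<number>` line (typically in the `[agent]` section) and rewrites only that value, leaving comments and other settings untouched. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches an uncommented "alert_grace_period = <digits>" line; [ \t] avoids crossing line breaks.
GRACE_LINE = re.compile(r"^([ \t]*alert_grace_period[ \t]*=[ \t]*)\d+[ \t]*$", re.MULTILINE)


def set_alert_grace_period(ini_text: str, seconds: int) -> str:
    """Return ini_text with the alert_grace_period value replaced by seconds.

    Only the existing key's value changes, so comments and other settings
    are preserved. Raises KeyError if the key isn't present.
    """
    new_text, count = GRACE_LINE.subn(lambda m: f"{m.group(1)}{seconds}", ini_text)
    if count == 0:
        raise KeyError("alert_grace_period not found")
    return new_text


def update_agent_ini(path: str = "/etc/ambari-agent/conf/ambari-agent.ini",
                     seconds: int = 10) -> None:
    """Rewrite the agent configuration file in place (requires root)."""
    ini_path = Path(path)
    ini_path.write_text(set_alert_grace_period(ini_path.read_text(), seconds))
```

After updating the file on an affected host, restart the agent there, for example with `sudo ambari-agent restart`, so the new value takes effect.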

## Next steps

If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:

* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).

* Connect with [@AzureSupport](https://twitter.com/azuresupport), the official Microsoft Azure account for improving customer experience. It connects the Azure community to the right resources: answers, support, and experts.

* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).