
Commit 2246fed

Merge pull request #107041 from dagiro/ts_hdinsight6
ts_hdinsight6
2 parents 99285aa + ac7eb0d

1 file changed: +43 -3 lines changed

articles/hdinsight/hdinsight-key-scenarios-to-monitor.md

Lines changed: 43 additions & 3 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdinsightactive
-ms.date: 11/27/2019
+ms.date: 03/09/2020
---

# Monitor cluster performance in Azure HDInsight
@@ -28,7 +28,7 @@ To get a high-level look at the nodes of your cluster and their loading, sign in
| Orange | At least one secondary component on the host is down. Hover to see a tooltip that lists affected components. |
| Yellow | Ambari Server hasn't received a heartbeat from the host for more than 3 minutes. |
| Green | Normal running state. |

You'll also see columns showing the number of cores and amount of RAM for each host, and the disk usage and load average.

![Apache Ambari hosts tab overview](./media/hdinsight-key-scenarios-to-monitor/apache-ambari-hosts-tab.png)
@@ -47,7 +47,7 @@ YARN divides the two responsibilities of the JobTracker, resource management and

The Resource Manager is a *pure scheduler*, and solely arbitrates available resources between all competing applications. The Resource Manager ensures that all resources are always in use, optimizing for various constraints such as SLAs, capacity guarantees, and so forth. The ApplicationMaster negotiates resources from the Resource Manager, and works with the NodeManager(s) to execute and monitor the containers and their resource consumption.
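To watch this arbitration from a cluster head node, the YARN CLI can list the NodeManagers and the applications currently competing for resources. A minimal sketch, assuming you're connected over SSH with the Hadoop client configuration in place:

```bash
# List every NodeManager, in any state, with the number of containers
# it's currently running.
yarn node -list -all

# List the applications currently holding or requesting resources,
# together with the queue each one was submitted to.
yarn application -list -appStates RUNNING
```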

-When multiple tenants share a large cluster, there's competition for the cluster's resources. The CapacityScheduler is a pluggable scheduler that assists in resource sharing by queueing up requests. The CapacityScheduler also supports *hierarchical queues* to ensure that resources are shared between the sub-queues of an organization, before other applications' queues are allowed to use free resources.
+When multiple tenants share a large cluster, there's competition for the cluster's resources. The CapacityScheduler is a pluggable scheduler that assists in resource sharing by queueing up requests. The CapacityScheduler also supports *hierarchical queues* to ensure that resources are shared between the subqueues of an organization, before other applications' queues are allowed to use free resources.

YARN allows you to allocate resources to these queues, and shows you whether all of your available resources are assigned. To view information about your queues, sign in to the Ambari Web UI, then select **YARN Queue Manager** from the top menu.
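The same queue information is also available from the command line. A minimal sketch, assuming the stock `default` queue; substitute the queue names defined for your cluster:

```bash
# Show the configured capacity, maximum capacity, and current usage of a queue.
yarn queue -status default
```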

@@ -77,6 +77,46 @@ If your cluster's backing store is Azure Data Lake Storage (ADLS), your throttli
* [Performance tuning guidance for MapReduce on HDInsight and Azure Data Lake Storage](../data-lake-store/data-lake-store-performance-tuning-mapreduce.md)
* [Performance tuning guidance for Apache Storm on HDInsight and Azure Data Lake Storage](../data-lake-store/data-lake-store-performance-tuning-storm.md)

## Troubleshoot sluggish node performance

In some cases, sluggishness can occur because of low disk space on the cluster. Investigate with these steps:

1. Use the [ssh command](./hdinsight-hadoop-linux-use-ssh-unix.md) to connect to each of the nodes.

1. Check the disk usage by running one of the following commands (a cluster-wide check is sketched after these steps):

```bash
# Overall usage and free space for each mounted file system.
df -h
# Size of each top-level directory under /, sorted smallest to largest.
du -h --max-depth=1 / | sort -h
```

1. Review the output, and check for the presence of any large files in the `/mnt` folder or other folders. Typically, the `usercache` and `appcache` folders (`/mnt/resource/hadoop/yarn/local/usercache/hive/appcache/`) contain large files.

1. If there are large files, either a current job is causing the file growth, or a failed previous job may have contributed to this issue. To check whether this behavior is caused by a current job, run the following command:

```bash
# Per-application cache usage; the directory names are YARN application IDs.
sudo du -h --max-depth=1 /mnt/resource/hadoop/yarn/local/usercache/hive/appcache/
```

1. If this command indicates a specific job, you can choose to terminate the job by using a command that resembles the following:

```bash
# Kill the YARN application that owns the oversized cache.
yarn application -kill <application_id>
```

Replace `<application_id>` with the application ID (one way to find it is sketched after these steps). If no specific jobs are indicated, go to the next step.

1. After the command above completes, or if no specific jobs are indicated, delete the large files you identified by running a command that resembles the following from the directory that contains them (for example, `/mnt/resource/hadoop/yarn/local/`):

```bash
# Remove YARN's local file and user caches.
rm -rf filecache usercache
```
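One way to find the application ID referenced in the kill step above: the per-application cache directories are typically named after the YARN application ID, so the `du` output already points at the culprit, and the YARN CLI can confirm its state before you kill it. A minimal sketch with a placeholder ID:

```bash
# The directory names under appcache are the YARN application IDs.
ls /mnt/resource/hadoop/yarn/local/usercache/hive/appcache/

# Confirm the application's state, owner, and queue before killing it.
# application_1234567890123_0001 is a placeholder; use an ID from the listing above.
yarn application -status application_1234567890123_0001
```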

For more information about disk space issues, see [Out of disk space](./hadoop/hdinsight-troubleshoot-out-disk-space.md).
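If you'd rather check every worker from one place than run `df` on each node interactively, a small loop over the worker hosts does the same check. A minimal sketch, assuming passwordless SSH from the head node, an SSH account named `sshuser`, and a `workers.txt` file that you populate with the host names shown in Ambari:

```bash
# Report resource-disk usage for each worker host listed in workers.txt.
while read -r host; do
  echo "== $host =="
  ssh -n sshuser@"$host" 'df -h /mnt/resource'
done < workers.txt
```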

> [!NOTE]
> If you have large files that you want to keep but that are contributing to the low disk space issue, you have to scale up your HDInsight cluster and restart your services. After you complete this procedure and wait a few minutes, you'll notice that the storage is freed up and the node's usual performance is restored.
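You can scale the cluster from the Azure portal or, as sketched below, from the Azure CLI. The resource group, cluster name, and node count are placeholders, and the worker-node count parameter has changed names across CLI versions, so confirm it with `az hdinsight resize --help` first:

```bash
# Grow the cluster to relieve disk pressure (names and count are placeholders).
az hdinsight resize \
    --resource-group MyResourceGroup \
    --name MyCluster \
    --workernode-count 8
```
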
## Next steps
Visit the following links for more information about troubleshooting and monitoring your clusters:
