Commit 4d091a1

Merge pull request #102174 from dagiro/ts_yarn3 (ts_yarn3)

2 parents 79d724c + 7c35c26

File tree

1 file changed (+95, -8 lines)

articles/hdinsight/hdinsight-hadoop-access-yarn-app-logs-linux.md
@@ -5,9 +5,9 @@ author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdinsightactive
ms.date: 01/23/2020
---

# Access Apache Hadoop YARN application logs on Linux-based HDInsight
@@ -22,7 +22,7 @@ Each application may consist of multiple *application attempts*. If an applicati

To scale your cluster to support greater processing throughput, you can use [Autoscale](hdinsight-autoscale-clusters.md) or [Scale your clusters manually using a few different languages](hdinsight-scaling-best-practices.md#utilities-to-scale-clusters).

## YARN Timeline Server

The [Apache Hadoop YARN Timeline Server](https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/TimelineServer.html) provides generic information on completed applications.

@@ -33,36 +33,123 @@ YARN Timeline Server includes the following type of data:

* Information on attempts made to complete the application
* The containers used by any given application attempt

## YARN applications and logs

YARN supports multiple programming models ([Apache Hadoop MapReduce](https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html) being one of them) by decoupling resource management from application scheduling/monitoring. YARN uses a global *ResourceManager* (RM), per-worker-node *NodeManagers* (NMs), and per-application *ApplicationMasters* (AMs). The per-application AM negotiates with the RM for the resources (CPU, memory, disk, network) needed to run your application. The RM works with NMs to grant these resources, which are allocated as *containers*. The AM is responsible for tracking the progress of the containers that the RM assigns to it. An application may require many containers, depending on its nature.

Each application may consist of multiple *application attempts*. If an application fails, it may be retried as a new attempt. Each attempt runs in a container. In a sense, a container provides the context for the basic unit of work performed by a YARN application. All work that is done within the context of a container is performed on the single worker node on which the container was allocated. See [Apache Hadoop YARN Concepts](https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html) for further reference.
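You can enumerate attempts and containers directly from the YARN CLI. The following is a minimal sketch, assuming an SSH session on the cluster; the application and attempt IDs are placeholders to replace with your own:

```bash
# List the attempts made for a given application (the ID is a placeholder).
yarn applicationattempt -list application_1490377567345_0007

# List the containers allocated to one of those attempts.
yarn container -list appattempt_1490377567345_0007_000001
```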
Application logs (and the associated container logs) are critical in debugging problematic Hadoop applications. YARN provides a framework for collecting, aggregating, and storing application logs with the [Log Aggregation](https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/) feature. The Log Aggregation feature makes accessing application logs more deterministic. It aggregates logs across all containers on a worker node and stores them as one aggregated log file per worker node. The log is stored on the default file system after an application finishes. Your application may use hundreds or thousands of containers, but logs for all containers run on a single worker node are always aggregated to a single file. So there's only one log per worker node used by your application. Log Aggregation is enabled by default on HDInsight clusters version 3.0 and above. Aggregated logs are located in default storage for the cluster. The following path is the HDFS path to the logs:

```
/app-logs/<user>/logs/<applicationId>
```

In the path, `user` is the name of the user who started the application. The `applicationId` is the unique identifier assigned to an application by the YARN RM.
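For example, you can browse this location with the HDFS CLI from an SSH session. This is a sketch; substitute your own user name and application ID for the placeholders:

```bash
# List the aggregated-log directories for a user (placeholders in angle brackets).
hdfs dfs -ls /app-logs/<user>/logs/

# Each application directory holds one aggregated log file per worker node
# that ran containers for that application.
hdfs dfs -ls /app-logs/<user>/logs/<applicationId>
```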
The aggregated logs aren't directly readable, as they're written in a [TFile](https://issues.apache.org/jira/secure/attachment/12396286/TFile%20Specification%2020081217.pdf), a [binary format](https://issues.apache.org/jira/browse/HADOOP-3315) indexed by container. Use the YARN ResourceManager logs or CLI tools to view these logs as plain text for applications or containers of interest.

## YARN logs in an ESP cluster

On clusters that use the Enterprise Security Package (ESP), two configurations must be added to the custom `mapred-site` settings in Ambari.

1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster.

1. From the Ambari UI, navigate to **MapReduce2** > **Configs** > **Advanced** > **Custom mapred-site**.

1. Add *one* of the following sets of properties:

    **Set 1**

    ```
    mapred.acls.enabled=true
    mapreduce.job.acl-view-job=*
    ```

    **Set 2**

    ```
    mapreduce.job.acl-view-job=<user1>,<user2>,<user3>
    ```

1. Save changes and restart all affected services. A quick way to verify the new settings is shown below.
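After the services restart, the new keys should appear in the client configuration on the cluster nodes. The following is a minimal sanity check, assuming the default HDP client-config path `/etc/hadoop/conf`; run it over SSH:

```bash
# Print the property name and the line after it, which holds the configured value.
grep -A 1 "mapreduce.job.acl-view-job" /etc/hadoop/conf/mapred-site.xml
```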
## YARN CLI tools

1. Use the [ssh command](./hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

    ```cmd
    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
    ```
1. List all the application IDs of the currently running YARN applications with the following command:

    ```bash
    yarn top
    ```

    Note the application ID from the `APPLICATIONID` column whose logs you want to download.

    ```output
    YARN top - 18:00:07, up 19d, 0:14, 0 active users, queue(s): root
    NodeManager(s): 4 total, 4 active, 0 unhealthy, 0 decommissioned, 0 lost, 0 rebooted
    Queue(s) Applications: 2 running, 10 submitted, 0 pending, 8 completed, 0 killed, 0 failed
    Queue(s) Mem(GB): 97 available, 3 allocated, 0 pending, 0 reserved
    Queue(s) VCores: 58 available, 2 allocated, 0 pending, 0 reserved
    Queue(s) Containers: 2 allocated, 0 pending, 0 reserved

                      APPLICATIONID USER  TYPE      QUEUE #CONT #RCONT VCORES RVCORES MEM RMEM VCORESECS MEMSECS %PROGR     TIME NAME
     application_1490377567345_0007 hive  spark thriftsvr     1      0      1       0  1G   0G   1628407 2442611  10.00 18:20:20 Thrift JDBC/ODBC Server
     application_1490377567345_0006 hive  spark thriftsvr     1      0      1       0  1G   0G   1628430 2442645  10.00 18:20:20 Thrift JDBC/ODBC Server
    ```

1. You can view these logs as plain text by running one of the following commands:

    ```bash
    yarn logs -applicationId <applicationId> -appOwner <user-who-started-the-application>
    yarn logs -applicationId <applicationId> -appOwner <user-who-started-the-application> -containerId <containerId> -nodeAddress <worker-node-address>
    ```

    Specify the `<applicationId>`, `<user-who-started-the-application>`, `<containerId>`, and `<worker-node-address>` information when running these commands.
### Other sample commands

1. Download YARN container logs for all application masters with the command below. This command creates a log file named `amlogs.txt` in text format.

    ```bash
    yarn logs -applicationId <application_id> -am ALL > amlogs.txt
    ```

1. Download YARN container logs for only the latest application master with the following command:

    ```bash
    yarn logs -applicationId <application_id> -am -1 > latestamlogs.txt
    ```

1. Download YARN container logs for the first two application masters with the following command:

    ```bash
    yarn logs -applicationId <application_id> -am 1,2 > first2amlogs.txt
    ```

1. Download all YARN container logs with the following command (a scripted variant that does this for every finished application follows this list):

    ```bash
    yarn logs -applicationId <application_id> > logs.txt
    ```

1. Download the YARN container log for a particular container with the following command:

    ```bash
    yarn logs -applicationId <application_id> -containerId <container_id> > containerlogs.txt
    ```
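Because `yarn logs` takes the application ID as an argument, it composes naturally with other YARN CLI output in shell scripts. The following is a minimal sketch, assuming the default tabular output of `yarn application -list`, where each data row starts with the application ID:

```bash
# Download the aggregated logs for every finished application, one file per app.
for appid in $(yarn application -list -appStates FINISHED 2>/dev/null \
    | awk '$1 ~ /^application_/ {print $1}'); do
    yarn logs -applicationId "$appid" > "${appid}.txt"
done
```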
## YARN ResourceManager UI

The YARN ResourceManager UI runs on the cluster headnode. It's accessed through the Ambari web UI. Use the following steps to view the YARN logs:

1. In your web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`. Replace CLUSTERNAME with the name of your HDInsight cluster.

2. From the list of services on the left, select **YARN**.

    ![Apache Ambari Yarn service selected](./media/hdinsight-hadoop-access-yarn-app-logs-linux/yarn-service-selected.png)
