Commit 2dc6368

freshness_c30
1 parent 9236b18 commit 2dc6368

File tree

1 file changed (+18, -15 lines)


articles/hdinsight/hdinsight-hadoop-access-yarn-app-logs-linux.md

Lines changed: 18 additions & 15 deletions
````diff
@@ -7,18 +7,18 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 01/23/2020
+ms.date: 04/23/2020
 ---
 
 # Access Apache Hadoop YARN application logs on Linux-based HDInsight
 
-Learn how to access the logs for [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) (Yet Another Resource Negotiator) applications on an [Apache Hadoop](https://hadoop.apache.org/) cluster in Azure HDInsight.
+Learn how to access the logs for [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) (Yet Another Resource Negotiator) applications on an Apache Hadoop cluster in Azure HDInsight.
 
 ## What is Apache YARN?
 
-YARN supports multiple programming models ([Apache Hadoop MapReduce](https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html) being one of them) by decoupling resource management from application scheduling/monitoring. YARN uses a global *ResourceManager* (RM), per-worker-node *NodeManagers* (NMs), and per-application *ApplicationMasters* (AMs). The per-application AM negotiates resources (CPU, memory, disk, network) for running your application with the RM. The RM works with NMs to grant these resources, which are granted as *containers*. The AM is responsible for tracking the progress of the containers assigned to it by the RM. An application may require many containers depending on the nature of the application.
+YARN supports multiple programming models (Apache Hadoop MapReduce being one of them) by decoupling resource management from application scheduling/monitoring. YARN uses a global *`ResourceManager`* (RM), per-worker-node *NodeManagers* (NMs), and per-application *ApplicationMasters* (AMs). The per-application AM negotiates with the RM for the resources (CPU, memory, disk, network) needed to run your application. The RM works with the NMs to grant these resources as *containers*. The AM tracks the progress of the containers the RM assigns to it. An application may require many containers, depending on its nature.
 
-Each application may consist of multiple *application attempts*. If an application fails, it may be retried as a new attempt. Each attempt runs in a container. In a sense, a container provides the context for basic unit of work performed by a YARN application. All work that is done within the context of a container is performed on the single worker node on which the container was allocated. See [Hadoop: Writing YARN Applications](https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html), or [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) for further reference.
+Each application may consist of multiple *application attempts*. If an application fails, it may be retried as a new attempt. Each attempt runs in a container. In a sense, a container provides the context for the basic unit of work done by a YARN application. All work done within the context of a container runs on the single worker node to which the container was allocated. See [Hadoop: Writing YARN Applications](https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html) or [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) for further reference.
 
 To scale your cluster to support greater processing throughput, you can use [Autoscale](hdinsight-autoscale-clusters.md) or [Scale your clusters manually using a few different languages](hdinsight-scaling-best-practices.md#utilities-to-scale-clusters).
 
````
````diff
@@ -35,19 +35,17 @@ YARN Timeline Server includes the following type of data:
 
 ## YARN applications and logs
 
-YARN supports multiple programming models ([Apache Hadoop MapReduce](https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html) being one of them) by decoupling resource management from application scheduling/monitoring. YARN uses a global *ResourceManager* (RM), per-worker-node *NodeManagers* (NMs), and per-application *ApplicationMasters* (AMs). The per-application AM negotiates resources (CPU, memory, disk, network) for running your application with the RM. The RM works with NMs to grant these resources, which are granted as *containers*. The AM is responsible for tracking the progress of the containers assigned to it by the RM. An application may require many containers depending on the nature of the application.
+Application logs (and the associated container logs) are critical for debugging problematic Hadoop applications. YARN provides a framework for collecting, aggregating, and storing application logs with [Log Aggregation](https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/).
 
-Each application may consist of multiple *application attempts*. If an application fails, it may be retried as a new attempt. Each attempt runs in a container. In a sense, a container provides the context for basic unit of work performed by a YARN application. All work that is done within the context of a container is performed on the single worker node on which the container was allocated. See [Apache Hadoop YARN Concepts](https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html) for further reference.
-
-Application logs (and the associated container logs) are critical in debugging problematic Hadoop applications. YARN provides a nice framework for collecting, aggregating, and storing application logs with the [Log Aggregation](https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/) feature. The Log Aggregation feature makes accessing application logs more deterministic. It aggregates logs across all containers on a worker node and stores them as one aggregated log file per worker node. The log is stored on the default file system after an application finishes. Your application may use hundreds or thousands of containers, but logs for all containers run on a single worker node are always aggregated to a single file. So there's only 1 log per worker node used by your application. Log Aggregation is enabled by default on HDInsight clusters version 3.0 and above. Aggregated logs are located in default storage for the cluster. The following path is the HDFS path to the logs:
+The Log Aggregation feature makes accessing application logs more deterministic. It aggregates the logs across all containers on a worker node and stores them as one aggregated log file per worker node. The log is stored on the default file system after an application finishes. Your application may use hundreds or thousands of containers, but the logs for all containers that ran on a single worker node are always aggregated into one file, so there's only one log per worker node used by your application. Log Aggregation is enabled by default on HDInsight clusters version 3.0 and later. Aggregated logs are located in the cluster's default storage. The following path is the HDFS path to the logs:
 
 ```
 /app-logs/<user>/logs/<applicationId>
 ```
 
 In the path, `user` is the name of the user who started the application. The `applicationId` is the unique identifier assigned to an application by the YARN RM.
 
-The aggregated logs aren't directly readable, as they're written in a [TFile](https://issues.apache.org/jira/secure/attachment/12396286/TFile%20Specification%2020081217.pdf), [binary format](https://issues.apache.org/jira/browse/HADOOP-3315) indexed by container. Use the YARN ResourceManager logs or CLI tools to view these logs as plain text for applications or containers of interest.
+The aggregated logs aren't directly readable, because they're written in TFile, a binary format indexed by container. Use the YARN `ResourceManager` logs or CLI tools to view these logs as plain text for the applications or containers of interest.
 
 ## Yarn logs in an ESP cluster
 
````
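To make the aggregated-log path concrete, here's a minimal shell sketch; the user name and application ID below are hypothetical placeholders.

```shell
# Hypothetical values: substitute the user who launched the application
# and the application ID that the YARN ResourceManager assigned to it.
USER_NAME="sshuser"
APP_ID="application_1585000000000_0001"

# Aggregated logs land under this HDFS path on the cluster's default storage.
LOG_DIR="/app-logs/${USER_NAME}/logs/${APP_ID}"
echo "$LOG_DIR"

# From a cluster node you could list the per-worker-node aggregated files:
#   hdfs dfs -ls "$LOG_DIR"
```

Each file under that directory corresponds to one worker node's aggregated log, per the description above.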
````diff
@@ -82,13 +80,13 @@ Two configurations must be added to the custom `mapred-site` in Ambari.
 
    ```
 
-1. List all the application ids of the currently running Yarn applications with the following command:
+1. List the application IDs of the currently running YARN applications with the following command:
 
    ```bash
    yarn top
    ```
 
-   Note the application id from the `APPLICATIONID` column whose logs are to be downloaded.
+   Note the application ID from the `APPLICATIONID` column whose logs are to be downloaded.
 
    ```output
    YARN top - 18:00:07, up 19d, 0:14, 0 active users, queue(s): root
````
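To capture an application ID in a script rather than reading the `yarn top` screen by eye, a sketch like the following can pull it out of saved output. The sample line is an illustrative stand-in, not real cluster output; on a cluster you'd pipe in output from `yarn application -list` or similar.

```shell
# Illustrative stand-in for one line of YARN application output.
SAMPLE="application_1585000000000_0001  spark  sshuser  default  RUNNING"

# Application IDs follow the pattern application_<clusterTimestamp>_<sequence>.
APP_ID="$(printf '%s\n' "$SAMPLE" | grep -oE 'application_[0-9]+_[0-9]+' | head -n 1)"
echo "$APP_ID"
```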
````diff
@@ -114,7 +112,7 @@ Two configurations must be added to the custom `mapred-site` in Ambari.
 
 ### Other sample commands
 
-1. Download Yarn containers logs for all application masters with the command below. This will create the log file named `amlogs.txt` in text format.
+1. Download the YARN container logs for all application masters with the following command. This step creates a log file named `amlogs.txt` in text format.
 
    ```bash
    yarn logs -applicationId <application_id> -am ALL > amlogs.txt
````
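The per-application commands in this section lend themselves to scripting. Below is a hedged sketch that builds one `yarn logs` AM-log download command per application; the IDs are made up, and the commands are only printed so the sketch is self-contained, whereas on a cluster node you'd execute them instead.

```shell
# Hypothetical application IDs; on a real cluster, collect them from
# `yarn top` or `yarn application -list` as shown earlier.
APP_IDS="application_1585000000000_0001 application_1585000000000_0002"

# Build one `yarn logs` AM-log download command per application.
# Printed rather than executed so the sketch runs anywhere.
CMDS=""
for id in $APP_IDS; do
  CMDS="${CMDS}yarn logs -applicationId ${id} -am ALL > amlogs_${id}.txt
"
done
printf '%s' "$CMDS"
```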
````diff
@@ -144,18 +142,23 @@ Two configurations must be added to the custom `mapred-site` in Ambari.
    yarn logs -applicationId <application_id> -containerId <container_id> > containerlogs.txt
    ```
 
-## YARN ResourceManager UI
+## YARN `ResourceManager` UI
 
-The YARN ResourceManager UI runs on the cluster headnode. It's accessed through the Ambari web UI. Use the following steps to view the YARN logs:
+The YARN `ResourceManager` UI runs on the cluster head node. It's accessed through the Ambari web UI. Use the following steps to view the YARN logs:
 
 1. In your web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`. Replace CLUSTERNAME with the name of your HDInsight cluster.
 
 2. From the list of services on the left, select **YARN**.
 
    ![Apache Ambari Yarn service selected](./media/hdinsight-hadoop-access-yarn-app-logs-linux/yarn-service-selected.png)
 
-3. From the **Quick Links** dropdown, select one of the cluster head nodes and then select **ResourceManager Log**.
+3. From the **Quick Links** dropdown, select one of the cluster head nodes, and then select **ResourceManager Log**.
 
    ![Apache Ambari Yarn quick links](./media/hdinsight-hadoop-access-yarn-app-logs-linux/hdi-yarn-quick-links.png)
 
 You're presented with a list of links to YARN logs.
+
+## Next steps
+
+* [Apache Hadoop architecture in HDInsight](hdinsight-hadoop-architecture.md)
+* [Troubleshoot Apache Hadoop YARN by using Azure HDInsight](hdinsight-troubleshoot-yarn.md)
````
