articles/hdinsight/hdinsight-log-management.md
Lines changed: 15 additions & 15 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
7
7
ms.service: hdinsight
8
8
ms.custom: hdinsightactive
9
9
ms.topic: conceptual
10
-
ms.date: 03/19/2019
10
+
ms.date: 11/07/2019
11
11
---
12
12
13
13
# Manage logs for an HDInsight cluster
@@ -30,7 +30,7 @@ The first step in creating a HDInsight cluster log management strategy is to gat
30
30
31
31
### Cluster details
32
32
33
-
The following cluster details are useful in helping to gather information in your log management strategy. Gather this information from all HDInsight clusters you have created in a particular Azure account.
33
+
The following cluster details are useful in helping to gather information in your log management strategy. Gather this information from all HDInsight clusters you've created in a particular Azure account.
34
34
35
35
* Cluster name
36
36
* Cluster region and Azure availability zone
@@ -40,8 +40,8 @@ The following cluster details are useful in helping to gather information in you
40
40
You can get most of this top-level information using the Azure portal. Alternatively, you can use [Azure CLI](https://docs.microsoft.com/cli/azure/?view=azure-cli-latest) to get information about your HDInsight cluster(s):
41
41
42
42
```azurecli
43
-
az hdinsight list --resource-group <ResourceGroup>
44
-
az hdinsight show --resource-group <ResourceGroup> --name <ClusterName>
43
+
az hdinsight list --resource-group <ResourceGroup>
44
+
az hdinsight show --resource-group <ResourceGroup> --name <ClusterName>
45
45
```
46
46
47
47
You can also use PowerShell to view this information. For more information, see [Apache Manage Hadoop clusters in HDInsight by using Azure PowerShell](hdinsight-administer-use-powershell.md).
@@ -70,7 +70,7 @@ A typical HDInsight cluster uses several services and open-source software packa
70
70
71
71
### View cluster configuration settings with the Ambari UI
72
72
73
-
Apache Ambari simplifies the management, configuration, and monitoring of a HDInsight cluster by providing a web UI and a REST API. Ambari is included on Linux-based HDInsight clusters. Select the **Cluster Dashboard** pane on the Azure portal HDInsight page to open the **Cluster Dashboards** link page. Next, select the **HDInsight cluster dashboard** pane to open the Ambari UI. You are prompted for your cluster login credentials.
73
+
Apache Ambari simplifies the management, configuration, and monitoring of a HDInsight cluster by providing a web UI and a REST API. Ambari is included on Linux-based HDInsight clusters. Select the **Cluster Dashboard** pane on the Azure portal HDInsight page to open the **Cluster Dashboards** link page. Next, select the **HDInsight cluster dashboard** pane to open the Ambari UI. You're prompted for your cluster login credentials.
74
74
75
75
To open a list of service views, select the **Ambari Views** pane on the Azure portal page for HDInsight. This list varies, depending on which libraries you've installed. For example, you may see YARN Queue Manager, Hive View, and Tez View. Select any service link to see configuration and service information. The Ambari UI **Stack and Version** page provides information about the cluster services' configuration and service version history. To navigate to this section of the Ambari UI, select the **Admin** menu and then **Stacks and Versions**. Select the **Versions** tab to see service version information.
76
76
@@ -86,15 +86,15 @@ HDInsight [script actions](hdinsight-hadoop-customize-cluster-linux.md) run scri
86
86
87
87
## Step 3: Manage the cluster job execution log files
88
88
89
-
The next step is reviewing the job execution log files for the various services. Services could include Apache HBase, Apache Spark, and many others. A Hadoop cluster produces a large number of verbose logs, so determining which logs are useful (and which are not) can be time-consuming. Understanding the logging system is important for targeted management of log files. The following is an example log file.
89
+
The next step is reviewing the job execution log files for the various services. Services could include Apache HBase, Apache Spark, and many others. A Hadoop cluster produces a large number of verbose logs, so determining which logs are useful (and which aren't) can be time-consuming. Understanding the logging system is important for targeted management of log files. The following is an example log file.
90
90
91
91

92
92
93
93
### Access the Hadoop log files
94
94
95
95
HDInsight stores its log files both in the cluster file system and in Azure storage. You can examine log files in the cluster by opening an [SSH](hdinsight-hadoop-linux-use-ssh-unix.md) connection to the cluster and browsing the file system, or by using the Hadoop YARN Status portal on the remote head node server. You can examine the log files in Azure storage using any of the tools that can access and download data from Azure storage. Examples are [AzCopy](../storage/common/storage-use-azcopy.md), [CloudXplorer](https://clumsyleaf.com/products/cloudxplorer), and the Visual Studio Server Explorer. You can also use PowerShell and the Azure Storage Client libraries, or the Azure .NET SDKs, to access data in Azure blob storage.
96
96
97
-
Hadoop runs the work of the jobs as *task attempts* on various nodes in the cluster. HDInsight can initiate speculative task attempts, terminating any other task attempts that do not complete first. This generates significant activity that is logged to the controller, stderr, and syslog log files on-the-fly. In addition, multiple task attempts are running simultaneously, but a log file can only display results linearly.
97
+
Hadoop runs the work of the jobs as *task attempts* on various nodes in the cluster. HDInsight can initiate speculative task attempts, terminating any other task attempts that don't complete first. This generates significant activity that is logged to the controller, stderr, and syslog log files on-the-fly. In addition, multiple task attempts are running simultaneously, but a log file can only display results linearly.
98
98
99
99
#### HDInsight logs written to Azure Blob storage
100
100
@@ -104,21 +104,21 @@ In addition to the core log files generated by HDInsight, installed services suc
104
104
105
105
### HDInsight logs generated by YARN
106
106
107
-
YARN aggregates logs across all containers on a worker node and stores those logs as one aggregated log file per worker node. That log is stored on the default file system after an application finishes. Your application may use hundreds or thousands of containers, but logs for all containers that are run on a single worker node are always aggregated to a single file. There is only one log per worker node used by your application. Log aggregation is enabled by default on HDInsight clusters version 3.0 and above. Aggregated logs are located in default storage for the cluster.
107
+
YARN aggregates logs across all containers on a worker node and stores those logs as one aggregated log file per worker node. That log is stored on the default file system after an application finishes. Your application may use hundreds or thousands of containers, but logs for all containers that are run on a single worker node are always aggregated to a single file. There's only one log per worker node used by your application. Log aggregation is enabled by default on HDInsight clusters version 3.0 and above. Aggregated logs are located in default storage for the cluster.
108
108
109
109
```
110
-
/app-logs/<user>/logs/<applicationId>
110
+
/app-logs/<user>/logs/<applicationId>
111
111
```
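As a minimal sketch of the path layout shown above, the following Python helper assembles the aggregated-log directory for one application. The function name `aggregated_log_dir` and the sample values are hypothetical and used only for illustration; the `/app-logs/<user>/logs/<applicationId>` layout is the only part taken from the article.

```python
import posixpath


def aggregated_log_dir(user, application_id, root="/app-logs"):
    """Build the aggregated-log directory for one YARN application.

    Follows the /app-logs/<user>/logs/<applicationId> layout.
    The helper itself is a hypothetical illustration, not an HDInsight API.
    """
    if not user or not application_id:
        raise ValueError("user and application_id must be non-empty")
    # posixpath keeps forward slashes regardless of the local OS.
    return posixpath.join(root, user, "logs", application_id)


if __name__ == "__main__":
    # Sample values are placeholders, not taken from a real cluster.
    print(aggregated_log_dir("sshuser", "application_1572000000000_0001"))
```

The result is only a path string; the files under it are still in the binary TFile format and need the YARN tools to be read as plain text.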
112
112
113
-
The aggregated logs are not directly readable, as they are written in a TFile binary format indexed by container. Use the YARN ResourceManager logs or CLI tools to view these logs as plain text for applications or containers of interest.
113
+
The aggregated logs aren't directly readable, as they're written in a TFile binary format indexed by container. Use the YARN ResourceManager logs or CLI tools to view these logs as plain text for applications or containers of interest.
114
114
115
115
#### YARN CLI tools
116
116
117
117
To use the YARN CLI tools, you must first connect to the HDInsight cluster using SSH. Specify the `<applicationId>`, `<user-who-started-the-application>`, `<containerId>`, and `<worker-node-address>` information when running these commands. You can view the logs as plain text with one of the following commands:
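The commands themselves fall outside this diff hunk. As a hedged sketch only, the Python helper below assembles a `yarn logs` invocation from the four placeholders named above, using the standard `-applicationId`, `-appOwner`, `-containerId`, and `-nodeAddress` options of the YARN CLI. The function name `build_yarn_logs_command` and the sample values are assumptions for illustration; the exact commands in the article may differ.

```python
import shlex


def build_yarn_logs_command(application_id, app_owner,
                            container_id=None, node_address=None):
    """Return the argument list for a `yarn logs` call.

    Hypothetical helper: it only assembles arguments and does not
    run anything. Run the resulting command over SSH on the cluster.
    """
    args = ["yarn", "logs",
            "-applicationId", application_id,
            "-appOwner", app_owner]
    # Narrow the output to one container and worker node when given.
    if container_id is not None:
        args += ["-containerId", container_id]
    if node_address is not None:
        args += ["-nodeAddress", node_address]
    return args


if __name__ == "__main__":
    # Placeholder values, not taken from a real cluster.
    cmd = build_yarn_logs_command("application_1572000000000_0001", "sshuser")
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each value as a separate argument, so it can be passed to `subprocess.run` without shell quoting issues.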
@@ -127,21 +127,21 @@ The YARN ResourceManager UI runs on the cluster head node, and is accessed throu
127
127
128
128
1. In a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`. Replace CLUSTERNAME with the name of your HDInsight cluster.
129
129
2. From the list of services on the left, select YARN.
130
-
3. From the Quick Links dropdown, select one of the cluster head nodes and then select **ResourceManager logs**. You are presented with a list of links to YARN logs.
130
+
3. From the Quick Links dropdown, select one of the cluster head nodes and then select **ResourceManager logs**. You're presented with a list of links to YARN logs.
131
131
132
132
## Step 4: Forecast log volume storage sizes and costs
133
133
134
134
After completing the previous steps, you have an understanding of the types and volumes of log files that your HDInsight cluster(s) are producing.
135
135
136
136
Next, analyze the volume of log data in key log storage locations over a period of time. For example, you can analyze volume and growth over 30-60-90 day periods. Record this information in a spreadsheet or use other tools such as Visual Studio, the Azure Storage Explorer, or Power Query for Excel. For more information, see [Analyze HDInsight logs](hdinsight-debug-jobs.md).
137
137
138
-
You now have enough information to create a log management strategy for the key logs. Use your spreadsheet (or tool of choice) to forecast both log size growth and log storage Azure service costs going forward. Consider also any log retention requirements for the set of logs that you are examining. Now you can reforecast future log storage costs, after determining which log files can be deleted (if any) and which logs should be retained and archived to less expensive Azure storage.
138
+
You now have enough information to create a log management strategy for the key logs. Use your spreadsheet (or tool of choice) to forecast both log size growth and log storage Azure service costs going forward. Consider also any log retention requirements for the set of logs that you're examining. Now you can reforecast future log storage costs, after determining which log files can be deleted (if any) and which logs should be retained and archived to less expensive Azure storage.
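The forecasting step can be sketched in a few lines of Python as an alternative to a spreadsheet. This is a rough illustration under stated assumptions: growth is treated as linear, the 30-60-90 day measurements are invented sample values, and `PRICE_PER_GB_MONTH` is a placeholder rather than an actual Azure price. Substitute your own measurements and current pricing.

```python
def fit_line(days, sizes_gb):
    """Least-squares line through (day, size) points; returns (slope, intercept)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(sizes_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, sizes_gb))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_size_gb(days, sizes_gb, target_day):
    """Extrapolate log volume to target_day, assuming linear growth."""
    slope, intercept = fit_line(days, sizes_gb)
    return slope * target_day + intercept


def monthly_storage_cost(size_gb, price_per_gb_month):
    """Monthly cost of keeping size_gb in storage at a flat per-GB rate."""
    return size_gb * price_per_gb_month


if __name__ == "__main__":
    # Invented sample data: cumulative log volume at 30, 60, and 90 days.
    days = [30, 60, 90]
    sizes_gb = [12.0, 24.0, 36.0]
    PRICE_PER_GB_MONTH = 0.02  # placeholder, not an actual Azure price

    size_180 = forecast_size_gb(days, sizes_gb, 180)
    cost_180 = monthly_storage_cost(size_180, PRICE_PER_GB_MONTH)
    print(f"Forecast at day 180: {size_180:.1f} GB, about {cost_180:.2f} per month")
```

To reforecast after deciding which logs to delete or archive, rerun the same calculation with the reduced volumes and the price of the cheaper storage tier.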
139
139
140
140
## Step 5: Determine log archive policies and processes
141
141
142
142
After you determine which log files can be deleted, you can adjust logging parameters on many Hadoop services to automatically delete log files after a specified time period.
143
143
144
-
For certain log files, you can use a lower-priced log file archiving approach. For Azure Resource Manager activity logs, you can explore this approach using the Azure portal. Set up archiving of the ARM logs by selecting the **Activity Log**' link in the Azure portal for your HDInsight instance. On the top of the Activity Log search page, select the **Export** menu item to open the **Export activity log** pane. Fill in the subscription, region, whether to export to a storage account, and how many days to retain the logs. On this same pane, you can also indicate whether to export to an event hub.
144
+
For certain log files, you can use a lower-priced log file archiving approach. For Azure Resource Manager activity logs, you can explore this approach using the Azure portal. Set up archiving of the Resource Manager logs by selecting the **Activity Log** link in the Azure portal for your HDInsight instance. On the top of the Activity Log search page, select the **Export** menu item to open the **Export activity log** pane. Fill in the subscription, region, whether to export to a storage account, and how many days to retain the logs. On this same pane, you can also indicate whether to export to an event hub.