articles/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-architecture.md (1 addition, 1 deletion)
@@ -99,7 +99,7 @@ Some HDInsight Hive metastore best practices are as follows:
 ## Best practices for different workloads
 
-- Consider using LLAP cluster for interactive Hive queries with improved response time [LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) is a new feature in Hive 2.0 that allows in-memory caching of queries. LLAP makes Hive queries much faster, up to [26x faster than Hive 1.x in some cases](https://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/).
+- Consider using LLAP cluster for interactive Hive queries with improved response time. [LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) is a new feature in Hive 2.0 that allows in-memory caching of queries.
 - Consider using Spark jobs in place of Hive jobs.
 - Consider replacing impala-based queries with LLAP queries.
 - Consider replacing MapReduce jobs with Spark jobs.
@@ -66,7 +66,7 @@ There are two types of tables that you can create with Hive:
 *__Internal__*: Data is stored in the Hive data warehouse. The data warehouse is located at `/hive/warehouse/` on the default storage for the cluster.
 
-Use internal tables when one of the following conditions apply:
+Use internal tables when one of the following conditions applies:
 
 * Data is temporary.
 * You want Hive to manage the lifecycle of the table and data.
@@ -173,7 +173,7 @@ These statements perform the following actions:
 ### Low Latency Analytical Processing (LLAP)
 
-[LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) (sometimes known as Live Long and Process) is a new feature in Hive 2.0 that allows in-memory caching of queries. LLAP makes Hive queries much faster, up to [26x faster than Hive 1.x in some cases](https://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/).
+[LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) (sometimes known as Live Long and Process) is a new feature in Hive 2.0 that allows in-memory caching of queries.
 
 HDInsight provides LLAP in the Interactive Query cluster type. For more information, see the [Start with Interactive Query](../interactive-query/apache-interactive-query-get-started.md) document.
articles/hdinsight/hdinsight-hadoop-templeton-webhcat-debug-errors.md (5 additions, 7 deletions)
@@ -4,7 +4,7 @@ description: Learn how to about common errors returned by WebHCat on HDInsight a
 ms.service: hdinsight
 ms.topic: troubleshooting
 ms.custom: hdinsightactive
-ms.date: 04/14/2020
+ms.date: 12/07/2022
 ---
 
 # Understand and resolve errors received from WebHCat on HDInsight
@@ -27,7 +27,7 @@ If the following default values are exceeded, it can degrade WebHCat performance
 | --- | --- | --- |
 |[yarn.scheduler.capacity.maximum-applications][maximum-applications]|The maximum number of jobs that can be active concurrently (pending or running) |10,000 |
 |[templeton.exec.max-procs][max-procs]|The maximum number of requests that can be served concurrently |20 |
-|[mapreduce.jobhistory.max-age-ms][max-age-ms]|The number of days that job history are retained |7 days |
+|[mapreduce.jobhistory.max-age-ms][max-age-ms]|The number of days that job history is retained |seven days |
 
 ## Too many requests
@@ -45,13 +45,13 @@ If the following default values are exceeded, it can degrade WebHCat performance
 | --- | --- |
 | This status code usually occurs during failover between the primary and secondary HeadNode for the cluster |Wait two minutes, then retry the operation |
 
-## Bad request Content: Could not find job
+## Bad request Content: Couldn't find job
 
 **HTTP Status code**: 400
 
 | Cause | Resolution |
 | --- | --- |
-| Job details have been cleaned up by the job history cleaner |The default retention period for job history is 7 days. The default retention period can be changed by modifying `mapreduce.jobhistory.max-age-ms`. For more information, see [Modifying configuration](#modifying-configuration)|
+| Job details have been cleaned up by the job history cleaner |The default retention period for job history is seven days. The default retention period can be changed by modifying `mapreduce.jobhistory.max-age-ms`. For more information, see [Modifying configuration](#modifying-configuration)|
 | Job has been killed because of a failover |Retry job submission for up to two minutes |
 | An invalid job ID was used |Check if the job ID is correct |
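Because `mapreduce.jobhistory.max-age-ms` takes a value in milliseconds, the seven-day default has to be converted when you change the retention period. A quick sketch of the arithmetic (the property name comes from the table above; the day count is illustrative):

```shell
# Convert a retention period in days to the milliseconds value
# expected by mapreduce.jobhistory.max-age-ms.
DAYS=7
MAX_AGE_MS=$((DAYS * 24 * 60 * 60 * 1000))
echo "mapreduce.jobhistory.max-age-ms=${MAX_AGE_MS}"
# prints mapreduce.jobhistory.max-age-ms=604800000
```

Set the resulting value through Ambari as described in the Modifying configuration section.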
57
57
@@ -62,7 +62,7 @@ If the following default values are exceeded, it can degrade WebHCat performance
 | Cause | Resolution |
 | --- | --- |
 | Internal garbage collection is occurring within the WebHCat process |Wait for garbage collection to finish or restart the WebHCat service |
-| Time out waiting on a response from the ResourceManager service. This error can occur when the number of active applications goes the configured maximum (default 10,000) |Wait for currently running jobs to complete or increase the concurrent job limit by modifying `yarn.scheduler.capacity.maximum-applications`. For more information, see the [Modifying configuration](#modifying-configuration) section. |
+| Time out waiting on a response from the Resource Manager service. This error can occur when the number of active applications exceeds the configured maximum (default 10,000) |Wait for currently running jobs to complete or increase the concurrent job limit by modifying `yarn.scheduler.capacity.maximum-applications`. For more information, see the [Modifying configuration](#modifying-configuration) section. |
 | Attempting to retrieve all jobs through the [GET /jobs](https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Jobs) call while `Fields` is set to `*`|Don't retrieve *all* job details. Instead use `jobid` to retrieve details only for jobs greater than a certain job ID. Or, don't use `Fields`|
 | The WebHCat service is down during HeadNode failover |Wait two minutes and retry the operation |
 | There are more than 500 pending jobs submitted through WebHCat |Wait until currently pending jobs have completed before submitting more jobs |
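The `GET /jobs` guidance in the table can be followed from any shell. A sketch of the request URL, assuming a hypothetical cluster name and job ID; the `jobid` and `numrecords` query parameters are described in the WebHCat reference linked above:

```shell
# Build a WebHCat (Templeton) jobs URL that pages results instead of
# retrieving every job's details. CLUSTERNAME and the job ID below are
# hypothetical placeholders, not real values.
CLUSTERNAME="mycluster"
AFTER_JOBID="job_1415651640909_0001"
URL="https://${CLUSTERNAME}.azurehdinsight.net/templeton/v1/jobs?user.name=admin&jobid=${AFTER_JOBID}&numrecords=10"
echo "${URL}"
# Against a live cluster: curl -u admin "$URL"  (prompts for the cluster password)
```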
@@ -71,6 +71,4 @@ If the following default values are exceeded, it can degrade WebHCat performance
 [!INCLUDE [troubleshooting next steps](includes/hdinsight-troubleshooting-next-steps.md)]
articles/hdinsight/hdinsight-log-management.md (11 additions, 12 deletions)
@@ -4,12 +4,12 @@ description: Determine the types, sizes, and retention policies for HDInsight ac
 ms.service: hdinsight
 ms.topic: how-to
 ms.custom: hdinsightactive
-ms.date: 04/28/2022
+ms.date: 12/07/2022
 ---
 
 # Manage logs for an HDInsight cluster
 
-An HDInsight cluster produces a variety of log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
+An HDInsight cluster produces various log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
 
 Managing HDInsight cluster logs includes retaining information about all aspects of the cluster environment. This information includes all associated Azure Service logs, cluster configuration, job execution information, any error states, and other data as needed.
@@ -57,7 +57,7 @@ It's important to understand the workload types running on your HDInsight cluste
 * Consider maintaining data lineage tracking by adding an identifier to each log entry, or through other techniques. This allows you to trace back the original source of the data and the operation, and follow the data through each stage to understand its consistency and validity.
 
-* Consider how you can collect logs from the cluster, or from more than one cluster, and collate them for purposes such as auditing, monitoring, planning, and alerting. You might use a custom solution to access and download the log files on a regular basis, and combine and analyze them to provide a dashboard display. You can also add additional capabilities for alerting for security or failure detection. You can build these utilities using PowerShell, the HDInsight SDKs, or code that accesses the Azure classic deployment model.
+* Consider how you can collect logs from the cluster, or from more than one cluster, and collate them for purposes such as auditing, monitoring, planning, and alerting. You might use a custom solution to access and download the log files regularly, and combine and analyze them to provide a dashboard display. You can also add other capabilities for alerting for security or failure detection. You can build these utilities using PowerShell, the HDInsight SDKs, or code that accesses the Azure classic deployment model.
 
 * Consider whether a monitoring solution or service would be a useful benefit. The Microsoft System Center provides an [HDInsight management pack](https://systemcenter.wiki/?Get_ManagementPackBundle=Microsoft.HDInsight.mpb&FileMD5=10C7D975C6096FFAA22C84626D211259). You can also use third-party tools such as Apache Chukwa and Ganglia to collect and centralize logs. Many companies offer services to monitor Hadoop-based big data solutions, for example: Centerity, Compuware APM, Sematext SPM, and Zettaset Orchestrator.
@@ -79,7 +79,7 @@ Using the Ambari UI, you can download the configuration for any (or all) service
 ### View the script action logs
 
-HDInsight [script actions](hdinsight-hadoop-customize-cluster-linux.md) run scripts on a cluster, either manually or when specified. For example, script actions can be used to install additional software on the cluster or to alter configuration settings from the default values. Script action logs can provide insight into errors that occurred during setup of the cluster, and also configuration settings' changes that could affect cluster performance and availability. To see the status of a script action, select the **ops** button on your Ambari UI, or access the status logs in the default storage account. The storage logs are available at `/STORAGE_ACCOUNT_NAME/DEFAULT_CONTAINER_NAME/custom-scriptaction-logs/CLUSTER_NAME/DATE`.
+HDInsight [script actions](hdinsight-hadoop-customize-cluster-linux.md) run scripts on a cluster, either manually or when specified. For example, script actions can be used to install other software on the cluster or to alter configuration settings from the default values. Script action logs can provide insight into errors that occurred during setup of the cluster, and also changes to configuration settings that could affect cluster performance and availability. To see the status of a script action, select the **ops** button on your Ambari UI, or access the status logs in the default storage account. The storage logs are available at `/STORAGE_ACCOUNT_NAME/DEFAULT_CONTAINER_NAME/custom-scriptaction-logs/CLUSTER_NAME/DATE`.
-The YARN ResourceManager UI runs on the cluster head node, and is accessed through the Ambari web UI. Use the following steps to view the YARN logs:
+The YARN Resource Manager UI runs on the cluster head node, and is accessed through the Ambari web UI. Use the following steps to view the YARN logs:
 
 1. In a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`. Replace CLUSTERNAME with the name of your HDInsight cluster.
 2. From the list of services on the left, select YARN.
-3. From the Quick Links dropdown, select one of the cluster head nodes and then select **ResourceManager logs**. You're presented with a list of links to YARN logs.
+3. From the Quick Links dropdown, select one of the cluster head nodes and then select **Resource Manager logs**. You're presented with a list of links to YARN logs.
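As an alternative to the Ambari steps above, aggregated YARN logs can be fetched with the YARN CLI from an SSH session on a head node. A sketch (the application ID is a hypothetical placeholder):

```shell
# Print the YARN CLI command that fetches the aggregated logs for one
# application. APP_ID is a hypothetical placeholder, not a real job.
APP_ID="application_1555957767526_0001"
CMD="yarn logs -applicationId ${APP_ID}"
echo "${CMD}"
# Run the echoed command in an SSH session on a cluster head node.
```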
 
 ## Step 4: Forecast log volume storage sizes and costs
@@ -158,26 +158,25 @@ Alternatively, you can script log archiving with PowerShell. For an example Pow
 ### Accessing Azure Storage metrics
 
-Azure Storage can be configured to log storage operations and access. You can use these very detailed logs for capacity monitoring and planning, and for auditing requests to storage. The logged information includes latency details, enabling you to monitor and fine-tune the performance of your solutions.
+Azure Storage can be configured to log storage operations and access. You can use these detailed logs for capacity monitoring and planning, and for auditing requests to storage. The logged information includes latency details, enabling you to monitor and fine-tune the performance of your solutions.
 
 You can use the .NET SDK for Hadoop to examine the log files generated for the Azure Storage that holds the data for an HDInsight cluster.
 
 ### Control the size and number of backup indexes for old log files
 To control the size and number of log files retained, set the following properties of the `RollingFileAppender`:
 
 * `maxFileSize` is the critical size of the file, above which the file is rolled. The default value is 10 MB.
 * `maxBackupIndex` specifies the number of backup files to be created. The default is 1.
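These two `RollingFileAppender` properties are typically set in a service's log4j configuration. A minimal sketch in log4j 1.x properties style; the appender name `RFA`, the file path, and the sizes are illustrative, not values taken from an HDInsight cluster:

```properties
# Hypothetical rolling-file appender: roll at 10 MB, keep 5 backups.
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=/var/log/hadoop/hdfs/hadoop-hdfs-namenode.log
log4j.appender.RFA.MaxFileSize=10MB
log4j.appender.RFA.MaxBackupIndex=5
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

On HDInsight, make changes like this through Ambari rather than by editing files on individual nodes, so the configuration survives node reimaging.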
 ### Other log management techniques
 
 To avoid running out of disk space, you can use OS tools such as [logrotate](https://linux.die.net/man/8/logrotate) to manage handling of log files. You can configure `logrotate` to run on a daily basis, compressing log files and removing old ones. Your approach depends on your requirements, such as how long to keep the logfiles on local nodes.
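A daily rotation policy like the one described above can be sketched as a `logrotate` rule. The log path and retention count are hypothetical examples, not HDInsight defaults:

```text
# /etc/logrotate.d/hadoop-logs (hypothetical example)
/var/log/hadoop/*/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
```

`copytruncate` is used here because Hadoop services keep their log files open; it copies and truncates in place instead of moving the file out from under the running process.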
 You can also check whether DEBUG logging is enabled for one or more services, which greatly increases the output log size.
 
 To collect the logs from all the nodes to one central location, you can create a data flow, such as ingesting all log entries into Solr.
 ## Next steps
 
 * [Monitoring and Logging Practice for HDInsight](/previous-versions/msp-n-p/dn749790(v=pandp.10))
 * [Access Apache Hadoop YARN application logs in Linux-based HDInsight](hdinsight-hadoop-access-yarn-app-logs-linux.md)
-* [How to control size of log files for various Apache Hadoop components](https://community.hortonworks.com/articles/8882/how-to-control-size-of-log-files-for-various-hdp-c.html)