
Commit f7239ec

Merge pull request #220748 from sreekzz/remove-hortonworks
Removed Hortonworks contents
2 parents 316314c + ba95f74 commit f7239ec

12 files changed (+58 -72 lines)

articles/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-architecture.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ Some HDInsight Hive metastore best practices are as follows:
 
 ## Best practices for different workloads
 
-- Consider using LLAP cluster for interactive Hive queries with improved response time [LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) is a new feature in Hive 2.0 that allows in-memory caching of queries. LLAP makes Hive queries much faster, up to [26x faster than Hive 1.x in some cases](https://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/).
+- Consider using an LLAP cluster for interactive Hive queries with improved response time. [LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) is a feature in Hive 2.0 that allows in-memory caching of queries.
 - Consider using Spark jobs in place of Hive jobs.
 - Consider replacing Impala-based queries with LLAP queries.
 - Consider replacing MapReduce jobs with Spark jobs.

articles/hdinsight/hadoop/hdinsight-use-hive.md

Lines changed: 6 additions & 6 deletions
@@ -4,7 +4,7 @@ description: Apache Hive is a data warehouse system for Apache Hadoop. You can q
 ms.service: hdinsight
 ms.topic: how-to
 ms.custom: hdinsightactive,hdiseo17may2017
-ms.date: 03/31/2022
+ms.date: 12/09/2022
 ---
 
 # What is Apache Hive and HiveQL on Azure HDInsight?
@@ -28,11 +28,11 @@ Use the following table to discover the different ways to use Hive with HDInsigh
 
 | **Use this method** if you want... | ...**interactive** queries | ...**batch** processing | ...from this **client operating system** |
 |:--- |:---:|:---:|:--- |
-| [HDInsight tools for Visual Studio Code](../hdinsight-for-vscode.md) ||| Linux, Unix, Mac OS X, or Windows |
+| [HDInsight tools for Visual Studio Code](../hdinsight-for-vscode.md) ||| Linux, Unix, macOS, or Windows |
 | [HDInsight tools for Visual Studio](../hadoop/apache-hadoop-use-hive-visual-studio.md) |||Windows |
 | [Hive View](../hadoop/apache-hadoop-use-hive-ambari-view.md) |||Any (browser based) |
-| [Beeline client](../hadoop/apache-hadoop-use-hive-beeline.md) |||Linux, Unix, Mac OS X, or Windows |
-| [REST API](../hadoop/apache-hadoop-use-hive-curl.md) |  ||Linux, Unix, Mac OS X, or Windows |
+| [Beeline client](../hadoop/apache-hadoop-use-hive-beeline.md) |||Linux, Unix, macOS, or Windows |
+| [REST API](../hadoop/apache-hadoop-use-hive-curl.md) |  ||Linux, Unix, macOS, or Windows |
 | [Windows PowerShell](../hadoop/apache-hadoop-use-hive-powershell.md) |  ||Windows |
 
 ## HiveQL language reference
@@ -66,7 +66,7 @@ There are two types of tables that you can create with Hive:
 
 * __Internal__: Data is stored in the Hive data warehouse. The data warehouse is located at `/hive/warehouse/` on the default storage for the cluster.
 
-Use internal tables when one of the following conditions apply:
+Use internal tables when one of the following conditions applies:
 
 * Data is temporary.
 * You want Hive to manage the lifecycle of the table and data.
@@ -173,7 +173,7 @@ These statements perform the following actions:
 
 ### Low Latency Analytical Processing (LLAP)
 
-[LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) (sometimes known as Live Long and Process) is a new feature in Hive 2.0 that allows in-memory caching of queries. LLAP makes Hive queries much faster, up to [26x faster than Hive 1.x in some cases](https://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/).
+[LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) (sometimes known as Live Long and Process) is a feature in Hive 2.0 that allows in-memory caching of queries.
 
 HDInsight provides LLAP in the Interactive Query cluster type. For more information, see the [Start with Interactive Query](../interactive-query/apache-interactive-query-get-started.md) document.
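The internal-versus-external table distinction this file describes can be illustrated with a short HiveQL sketch; the table names, columns, and storage path below are hypothetical examples, not from the article:

```sql
-- Internal (managed) table: Hive owns the data under /hive/warehouse/
-- on the cluster's default storage; DROP TABLE removes data and metadata.
CREATE TABLE logs_internal (ts STRING, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- External table: Hive tracks only the metadata; DROP TABLE leaves the
-- files at LOCATION intact. The path is a hypothetical example.
CREATE EXTERNAL TABLE logs_external (ts STRING, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'wasbs://data@STORAGE_ACCOUNT_NAME.blob.core.windows.net/logs/';
```

These statements would be run from a Hive client such as Beeline against an HDInsight cluster; they are not runnable standalone.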

articles/hdinsight/hdinsight-hadoop-templeton-webhcat-debug-errors.md

Lines changed: 5 additions & 7 deletions
@@ -4,7 +4,7 @@ description: Learn how to about common errors returned by WebHCat on HDInsight a
 ms.service: hdinsight
 ms.topic: troubleshooting
 ms.custom: hdinsightactive
-ms.date: 04/14/2020
+ms.date: 12/07/2022
 ---
 
 # Understand and resolve errors received from WebHCat on HDInsight
@@ -27,7 +27,7 @@ If the following default values are exceeded, it can degrade WebHCat performance
 | --- | --- | --- |
 | [yarn.scheduler.capacity.maximum-applications][maximum-applications] |The maximum number of jobs that can be active concurrently (pending or running) |10,000 |
 | [templeton.exec.max-procs][max-procs] |The maximum number of requests that can be served concurrently |20 |
-| [mapreduce.jobhistory.max-age-ms][max-age-ms] |The number of days that job history are retained |7 days |
+| [mapreduce.jobhistory.max-age-ms][max-age-ms] |The number of days that job history is retained |seven days |
 
 ## Too many requests

@@ -45,13 +45,13 @@ If the following default values are exceeded, it can degrade WebHCat performance
 | --- | --- |
 | This status code usually occurs during failover between the primary and secondary HeadNode for the cluster |Wait two minutes, then retry the operation |
 
-## Bad request Content: Could not find job
+## Bad request Content: Couldn't find job
 
 **HTTP Status code**: 400
 
 | Cause | Resolution |
 | --- | --- |
-| Job details have been cleaned up by the job history cleaner |The default retention period for job history is 7 days. The default retention period can be changed by modifying `mapreduce.jobhistory.max-age-ms`. For more information, see [Modifying configuration](#modifying-configuration) |
+| Job details have been cleaned up by the job history cleaner |The default retention period for job history is seven days. It can be changed by modifying `mapreduce.jobhistory.max-age-ms`. For more information, see [Modifying configuration](#modifying-configuration) |
 | Job has been killed because of a failover |Retry job submission for up to two minutes |
 | An invalid job ID was used |Check if the job ID is correct |

@@ -62,7 +62,7 @@ If the following default values are exceeded, it can degrade WebHCat performance
 | Cause | Resolution |
 | --- | --- |
 | Internal garbage collection is occurring within the WebHCat process |Wait for garbage collection to finish or restart the WebHCat service |
-| Time out waiting on a response from the ResourceManager service. This error can occur when the number of active applications goes the configured maximum (default 10,000) |Wait for currently running jobs to complete or increase the concurrent job limit by modifying `yarn.scheduler.capacity.maximum-applications`. For more information, see the [Modifying configuration](#modifying-configuration) section. |
+| Time out waiting on a response from the Resource Manager service. This error can occur when the number of active applications exceeds the configured maximum (default 10,000) |Wait for currently running jobs to complete or increase the concurrent job limit by modifying `yarn.scheduler.capacity.maximum-applications`. For more information, see the [Modifying configuration](#modifying-configuration) section. |
 | Attempting to retrieve all jobs through the [GET /jobs](https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Jobs) call while `Fields` is set to `*` |Don't retrieve *all* job details. Instead, use `jobid` to retrieve details only for jobs greater than a certain job ID. Or, don't use `Fields` |
 | The WebHCat service is down during HeadNode failover |Wait for two minutes and retry the operation |
 | There are more than 500 pending jobs submitted through WebHCat |Wait until currently pending jobs have completed before submitting more jobs |
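As a sketch of the `jobid` filtering that the table above recommends, the request URL for the WebHCat jobs endpoint can be built like this; the cluster name, user, and job ID are hypothetical placeholders, and the `jobid`/`numrecords` parameters come from the WebHCat reference linked above:

```python
from urllib.parse import urlencode

cluster = "CLUSTERNAME"  # hypothetical cluster name
base = f"https://{cluster}.azurehdinsight.net/templeton/v1/jobs"

# Page through jobs newer than a known job ID instead of pulling
# every job's details with Fields set to *.
params = {
    "user.name": "admin",               # hypothetical user
    "jobid": "job_1472344318212_0017",  # hypothetical job ID to page from
    "numrecords": 10,
}
url = f"{base}?{urlencode(params)}"
print(url)
```

Sending the request itself (for example with `curl -u` and cluster credentials) is left out, since it requires a live cluster.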
@@ -71,6 +71,4 @@ If the following default values are exceeded, it can degrade WebHCat performance
 
 [!INCLUDE [troubleshooting next steps](includes/hdinsight-troubleshooting-next-steps.md)]
 
-[maximum-applications]: https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.1.3/bk_system-admin-guide/content/setting_application_limits.html
 [max-procs]: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-WebHCatConfiguration
-[max-age-ms]: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/ds_Hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
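Since `mapreduce.jobhistory.max-age-ms` is specified in milliseconds, the seven-day default in the table above works out as follows (a minimal sketch; the helper name is ours, not part of any Hadoop API):

```python
def retention_days_to_ms(days: int) -> int:
    """Convert a retention period in days to the millisecond value
    expected by mapreduce.jobhistory.max-age-ms."""
    return days * 24 * 60 * 60 * 1000

# The seven-day default retention period:
print(retention_days_to_ms(7))  # 604800000
```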

articles/hdinsight/hdinsight-log-management.md

Lines changed: 11 additions & 12 deletions
@@ -4,12 +4,12 @@ description: Determine the types, sizes, and retention policies for HDInsight ac
 ms.service: hdinsight
 ms.topic: how-to
 ms.custom: hdinsightactive
-ms.date: 04/28/2022
+ms.date: 12/07/2022
 ---
 
 # Manage logs for an HDInsight cluster
 
-An HDInsight cluster produces a variety of log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
+An HDInsight cluster produces various log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
 
 Managing HDInsight cluster logs includes retaining information about all aspects of the cluster environment. This information includes all associated Azure Service logs, cluster configuration, job execution information, any error states, and other data as needed.

@@ -57,7 +57,7 @@ It's important to understand the workload types running on your HDInsight cluste
 
 * Consider maintaining data lineage tracking by adding an identifier to each log entry, or through other techniques. This allows you to trace back the original source of the data and the operation, and follow the data through each stage to understand its consistency and validity.
 
-* Consider how you can collect logs from the cluster, or from more than one cluster, and collate them for purposes such as auditing, monitoring, planning, and alerting. You might use a custom solution to access and download the log files on a regular basis, and combine and analyze them to provide a dashboard display. You can also add additional capabilities for alerting for security or failure detection. You can build these utilities using PowerShell, the HDInsight SDKs, or code that accesses the Azure classic deployment model.
+* Consider how you can collect logs from the cluster, or from more than one cluster, and collate them for purposes such as auditing, monitoring, planning, and alerting. You might use a custom solution to access and download the log files regularly, and combine and analyze them to provide a dashboard display. You can also add other capabilities for alerting for security or failure detection. You can build these utilities using PowerShell, the HDInsight SDKs, or code that accesses the Azure classic deployment model.
 
 * Consider whether a monitoring solution or service would be a useful benefit. The Microsoft System Center provides an [HDInsight management pack](https://systemcenter.wiki/?Get_ManagementPackBundle=Microsoft.HDInsight.mpb&FileMD5=10C7D975C6096FFAA22C84626D211259). You can also use third-party tools such as Apache Chukwa and Ganglia to collect and centralize logs. Many companies offer services to monitor Hadoop-based big data solutions, for example: Centerity, Compuware APM, Sematext SPM, and Zettaset Orchestrator.

@@ -79,7 +79,7 @@ Using the Ambari UI, you can download the configuration for any (or all) service
 
 ### View the script action logs
 
-HDInsight [script actions](hdinsight-hadoop-customize-cluster-linux.md) run scripts on a cluster, either manually or when specified. For example, script actions can be used to install additional software on the cluster or to alter configuration settings from the default values. Script action logs can provide insight into errors that occurred during setup of the cluster, and also configuration settings' changes that could affect cluster performance and availability. To see the status of a script action, select the **ops** button on your Ambari UI, or access the status logs in the default storage account. The storage logs are available at `/STORAGE_ACCOUNT_NAME/DEFAULT_CONTAINER_NAME/custom-scriptaction-logs/CLUSTER_NAME/DATE`.
+HDInsight [script actions](hdinsight-hadoop-customize-cluster-linux.md) run scripts on a cluster, either manually or when specified. For example, script actions can be used to install other software on the cluster or to alter configuration settings from the default values. Script action logs can provide insight into errors that occurred during setup of the cluster, and also into changes to configuration settings that could affect cluster performance and availability. To see the status of a script action, select the **ops** button on your Ambari UI, or access the status logs in the default storage account. The storage logs are available at `/STORAGE_ACCOUNT_NAME/DEFAULT_CONTAINER_NAME/custom-scriptaction-logs/CLUSTER_NAME/DATE`.
 
 ### View Ambari alerts status logs

@@ -130,13 +130,13 @@ yarn logs -applicationId <applicationId> -appOwner <user-who-started-the-applica
 yarn logs -applicationId <applicationId> -appOwner <user-who-started-the-application> -containerId <containerId> -nodeAddress <worker-node-address>
 ```
 
-#### YARN ResourceManager UI
+#### YARN Resource Manager UI
 
-The YARN ResourceManager UI runs on the cluster head node, and is accessed through the Ambari web UI. Use the following steps to view the YARN logs:
+The YARN Resource Manager UI runs on the cluster head node and is accessed through the Ambari web UI. Use the following steps to view the YARN logs:
 
 1. In a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`. Replace CLUSTERNAME with the name of your HDInsight cluster.
 2. From the list of services on the left, select YARN.
-3. From the Quick Links dropdown, select one of the cluster head nodes and then select **ResourceManager logs**. You're presented with a list of links to YARN logs.
+3. From the Quick Links dropdown, select one of the cluster head nodes and then select **Resource Manager logs**. You're presented with a list of links to YARN logs.
 
 ## Step 4: Forecast log volume storage sizes and costs

@@ -158,26 +158,25 @@ Alternatively, you can script log archiving with PowerShell. For an example Pow
 
 ### Accessing Azure Storage metrics
 
-Azure Storage can be configured to log storage operations and access. You can use these very detailed logs for capacity monitoring and planning, and for auditing requests to storage. The logged information includes latency details, enabling you to monitor and fine-tune the performance of your solutions.
+Azure Storage can be configured to log storage operations and access. You can use these detailed logs for capacity monitoring and planning, and for auditing requests to storage. The logged information includes latency details, enabling you to monitor and fine-tune the performance of your solutions.
 You can use the .NET SDK for Hadoop to examine the log files generated for the Azure Storage that holds the data for an HDInsight cluster.
 
 ### Control the size and number of backup indexes for old log files
 
 To control the size and number of log files retained, set the following properties of the `RollingFileAppender`:
 
 * `maxFileSize` is the critical size of the file, above which the file is rolled. The default value is 10 MB.
 * `maxBackupIndex` specifies the number of backup files to be created. The default value is 1.
 
 ### Other log management techniques
 
-To avoid running out of disk space, you can use some OS tools such as [logrotate](https://linux.die.net/man/8/logrotate) to manage handling of log files. You can configure `logrotate` to run on a daily basis, compressing log files and removing old ones. Your approach depends on your requirements, such as how long to keep the logfiles on local nodes.
+To avoid running out of disk space, you can use OS tools such as [logrotate](https://linux.die.net/man/8/logrotate) to manage the handling of log files. You can configure `logrotate` to run daily, compressing log files and removing old ones. Your approach depends on your requirements, such as how long to keep the log files on local nodes.
 
 You can also check whether DEBUG logging is enabled for one or more services, which greatly increases the output log size.
 
 To collect the logs from all the nodes to one central location, you can create a data flow, such as ingesting all log entries into Solr.
 
 ## Next steps
 
 * [Monitoring and Logging Practice for HDInsight](/previous-versions/msp-n-p/dn749790(v=pandp.10))
 * [Access Apache Hadoop YARN application logs in Linux-based HDInsight](hdinsight-hadoop-access-yarn-app-logs-linux.md)
-* [How to control size of log files for various Apache Hadoop components](https://community.hortonworks.com/articles/8882/how-to-control-size-of-log-files-for-various-hdp-c.html)
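The daily compress-and-remove rotation that this file describes might look like the following `logrotate` configuration; the log path and retention count are hypothetical examples, and the directives are documented in the logrotate man page linked above:

```
# Rotate Hadoop service logs daily and keep seven compressed copies.
# The path below is a hypothetical example.
/var/log/hadoop/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
```

A file like this would typically be dropped into `/etc/logrotate.d/` on each node so the system's daily logrotate run picks it up.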
