Commit 3f3caad: Improved Correctness Score

Author: Sreekanth Iyer (Ushta Te Consultancy Services)
Parent: 87d35c9

5 files changed, +10 −10 lines

articles/hdinsight/hdinsight-hadoop-hive-out-of-memory-error-oom.md

Lines changed: 1 addition & 1 deletion

@@ -97,7 +97,7 @@ The **hive.auto.convert.join.noconditionaltask** in the hive-site.xml file was s
 </property>
 ```

-It's likely map join was the cause of the Java Heap Space out of memory error. As explained in the blog post [Hadoop Yarn memory settings in HDInsight](/archive/blogs/shanyu/hadoop-yarn-memory-settings-in-hdinsight), when Tez execution engine is used the heap space used actually belongs to the Tez container. See the following image describing the Tez container memory.
+It's likely map join was the cause of the Java Heap Space out of memory error. As explained in the blog post [Hadoop Yarn memory settings in HDInsight](/archive/blogs/shanyu/hadoop-yarn-memory-settings-in-hdinsight), when Tez execution engine used the heap space used actually belongs to the Tez container. See the following image describing the Tez container memory.

 :::image type="content" source="./media/hdinsight-hadoop-hive-out-of-memory-error-oom/hive-out-of-memory-error-oom-tez-container-memory.png" alt-text="Tez container memory diagram: Hive out of memory error." border="false":::
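The hunk above is part of a discussion of **hive.auto.convert.join.noconditionaltask** and map-join heap exhaustion under Tez. As a sketch (not part of this commit), the same behavior can also be controlled per session instead of in hive-site.xml; the property names below are standard Hive settings, and the threshold value is illustrative:

```sql
-- Disable the conditional-task map-join conversion for this session only,
-- so joins fall back to reduce-side joins instead of building hash tables
-- inside the Tez container heap.
SET hive.auto.convert.join.noconditionaltask = false;

-- Alternatively, keep map joins but lower the combined small-table
-- size threshold (in bytes); illustrative value of roughly 50 MB.
SET hive.auto.convert.join.noconditionaltask.size = 52428800;
```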

articles/hdinsight/hdinsight-streaming-at-scale-overview.md

Lines changed: 1 addition & 1 deletion

@@ -31,7 +31,7 @@ For more information, see [What is Apache Spark Streaming?](./spark/apache-spark

 Although you can specify the number of nodes in your cluster during creation, you may want to grow or shrink the cluster to match the workload. All HDInsight clusters allow you to [change the number of nodes in the cluster](hdinsight-administer-use-portal-linux.md#scale-clusters). Spark clusters can be dropped with no loss of data, as all data is stored in Azure Storage or Data Lake Storage.

-There are advantages to decoupling technologies. For instance, Kafka is an event buffering technology, so its very IO intensive and doesn't need much processing power. In comparison, stream processors such as Spark Streaming are compute-intensive, requiring more powerful VMs. By having these technologies decoupled into different clusters, you can scale them independently while best utilizing the VMs.
+There are advantages to decoupling technologies. For instance, Kafka is an event buffering technology, so it's very IO intensive and doesn't need much processing power. In comparison, stream processors such as Spark Streaming are compute-intensive, requiring more powerful VMs. By having these technologies decoupled into different clusters, you can scale them independently while best utilizing the VMs.

 ### Scale the stream buffering layer

articles/hdinsight/hdinsight-use-oozie-linux-mac.md

Lines changed: 3 additions & 3 deletions

@@ -37,7 +37,7 @@ The workflow used in this document contains two actions. Actions are definitions

 :::image type="content" source="./media/hdinsight-use-oozie-linux-mac/oozie-workflow-diagram.png" alt-text="HDInsight oozie workflow diagram." border="false":::

-1. A Hive action runs an HiveQL script to extract records from the `hivesampletable` that's included with HDInsight. Each row of data describes a visit from a specific mobile device. The record format appears like the following text:
+1. A Hive action runs a HiveQL script to extract records from the `hivesampletable` that's included with HDInsight. Each row of data describes a visit from a specific mobile device. The record format appears like the following text:

 ```output
 8 18:54:20 en-US Android Samsung SCH-i500 California United States 13.9204007 0 0

@@ -201,7 +201,7 @@ Oozie workflow definitions are written in Hadoop Process Definition Language (hP

 * `RunHiveScript`: This action is the start action and runs the `useooziewf.hql` Hive script.

-* `RunSqoopExport`: This action exports the data created from the Hive script to a SQL database by using Sqoop. This action only runs if the `RunHiveScript` action is successful.
+* `RunSqoopExport`: This action exports the data created from the Hive script to an SQL database by using Sqoop. This action only runs if the `RunHiveScript` action is successful.

 The workflow has several entries, such as `${jobTracker}`. You'll replace these entries with the values you use in the job definition. You'll create the job definition later in this document.

@@ -523,7 +523,7 @@ To access the Oozie web UI, complete the following steps:

 6. From the **Job Info** tab, you can see the basic job information and the individual actions within the job. You can use the tabs at the top to view the **Job Definition**, **Job Configuration**, access the **Job Log**, or view a directed acyclic graph (DAG) of the job under **Job DAG**.

-* **Job Log**: Select the **Get Logs** button to get all logs for the job, or use the **Enter Search Filter** field to filter the logs.
+* **Job Log**: Select the **Get Logs** button to get all logs for the job, or use the `Enter Search Filter` field to filter the logs.

 :::image type="content" source="./media/hdinsight-use-oozie-linux-mac/hdinsight-oozie-job-log.png" alt-text="HDInsight Apache Oozie job log." border="true":::

articles/hdinsight/interactive-query/apache-hive-replication.md

Lines changed: 4 additions & 4 deletions

@@ -10,11 +10,11 @@ ms.date: 06/14/2024

 In the context of databases and warehouses, replication is the process of duplicating entities from one warehouse to another. Duplication can apply to an entire database or to a smaller level, such as a table or partition. The objective is to have a replica that changes whenever the base entity changes. Replication on Apache Hive focuses on disaster recovery and offers unidirectional primary-copy replication. In HDInsight clusters, Hive Replication can be used to unidirectionally replicate the Hive metastore and the associated underlying data lake on Azure Data Lake Storage Gen2.

-Hive Replication has evolved over the years with newer versions providing better functionality and being faster and less resource intensive. In this article, we discuss Hive Replication (Replv2) which is supported in both HDInsight 3.6 and HDInsight 4.0 cluster types.
+Hive Replication has evolved over the years with newer versions providing better functionality and being faster and less resource intensive. In this article, we discuss Hive Replication `(Replv2)` which is supported in both HDInsight 3.6 and HDInsight 4.0 cluster types.

-## Advantages of replv2
+## Advantages of `replv2`

-[Hive ReplicationV2](https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development) (also called Replv2) has the following advantages over the first version of Hive replication that used Hive [IMPORT-EXPORT](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport):
+[Hive ReplicationV2](https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development) (also called `Replv2`) has the following advantages over the first version of Hive replication that used Hive [IMPORT-EXPORT](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport):

 - Event-based incremental replication
 - Point-in-time replication

@@ -74,7 +74,7 @@ repl load tpcds_orc from '/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0';

 ### Output the last replicated event ID

-The `REPL STATUS [database name]` command is executed on target clusters and outputs the last replicated `event_id`. The command also enables users to know what state their target cluster is been replicated to. You can use the output of this command to construct the next `REPL DUMP` command for incremental replication.
+The `REPL STATUS [database name]` command is executed on target clusters and outputs the last replicated `event_id`. The command also enables users to know what state their target cluster replicated to. You can use the output of this command to construct the next `REPL DUMP` command for incremental replication.

 ```sql
 repl status tpcds_orc;
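The hunk above notes that the `REPL STATUS` output feeds the next incremental `REPL DUMP`. A hedged sketch of one bootstrap-plus-incremental round trip, using the `tpcds_orc` database and dump path that appear in the article's own examples (the event ID `2925` is illustrative):

```sql
-- On the source cluster: bootstrap dump of the whole database.
-- Returns a dump directory and the last dumped event ID.
repl dump tpcds_orc;

-- On the target cluster: load the bootstrap dump from that directory.
repl load tpcds_orc from '/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0';

-- On the target cluster: report the last replicated event_id.
repl status tpcds_orc;

-- On the source cluster: dump only events after that ID (incremental),
-- then repeat the load/status cycle on the target.
repl dump tpcds_orc from 2925;
```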

articles/hdinsight/interactive-query/llap-schedule-based-autoscale-best-practices.md

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ ms.author: sairamyeturi
 ms.date: 06/14/2024
 ---

-# Azure HDInsight interactive query cluster (Hive LLAP) schedule based autoscale
+# Azure HDInsight interactive query cluster (Hive LLAP) `schedule based autoscale`

 This document provides the onboarding steps to enable schedule-based autoscale for Interactive Query (LLAP) Cluster type in Azure HDInsight. It includes some of the best practices to operate Autoscale in Hive-LLAP.
