articles/hdinsight/hdinsight-hadoop-optimize-hive-query.md
Lines changed: 4 additions & 4 deletions
@@ -4,7 +4,7 @@ description: This article describes how to optimize your Apache Hive queries in
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 04/29/2022
+ms.date: 09/21/2022
 ---
 
 # Optimize Apache Hive queries in Azure HDInsight
@@ -19,7 +19,7 @@ Choose the appropriate cluster type to help optimize performance for your worklo
 
 * Choose **Interactive Query** cluster type to optimize for `ad hoc`, interactive queries.
 * Choose Apache **Hadoop** cluster type to optimize for Hive queries used as a batch process.
-* **Spark** and **HBase** cluster types can also run Hive queries, and might be appropriate if you are running those workloads.
+* **Spark** and **HBase** cluster types can also run Hive queries, and might be appropriate if you're running those workloads.
 
 For more information on running Hive queries on various HDInsight cluster types, see [What is Apache Hive and HiveQL on Azure HDInsight?](hadoop/hdinsight-use-hive.md).
 
@@ -41,7 +41,7 @@ For more information about scaling HDInsight, see [Scale HDInsight clusters](hdi
 
 [Apache Tez](https://tez.apache.org/) is an alternative execution engine to the MapReduce engine. Linux-based HDInsight clusters have Tez enabled by default.
 
-:::image type="content" source="./media/hdinsight-hadoop-optimize-hive-query/hdinsight-tez-engine.png" alt-text="HDInsight Apache Tez overview diagram":::
+:::image type="content" source="./media/hdinsight-hadoop-optimize-hive-query/hdinsight-tez-engine-new.png" alt-text="HDInsight Apache Tez overview diagram":::
 
 Tez is faster because:
 
@@ -71,7 +71,7 @@ Some partitioning considerations:
 
 * **Don't under partition** - Partitioning on columns with only a few values can cause few partitions. For example, partitioning on gender only creates two partitions to be created (male and female), so reduce the latency by a maximum of half.
 * **Don't over partition** - On the other extreme, creating a partition on a column with a unique value (for example, userid) causes multiple partitions. Over partition causes much stress on the cluster namenode as it has to handle the large number of directories.
-* **Avoid data skew** - Choose your partitioning key wisely so that all partitions are even size. For example, partitioning on *State* column may skew the distribution of data. Since the state of California has a population almost 30x that of Vermont, the partition size is potentially skewed and performance may vary tremendously.
+* **Avoid data skew** - Choose your partitioning key wisely so that all partitions are even size. For example, partitioning on *State* column may skew the distribution of data. Since the state of California has a population almost 30x that of Vermont, the partition size is potentially skewed, and performance may vary tremendously.
 
 To create a partition table, use the *Partitioned By* clause:
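The three partitioning considerations in the last hunk (under-partitioning, over-partitioning, and skew) can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not Hive or HDInsight code: the helpers `partition_sizes`, `pruned_fraction`, and `skew_ratio` are hypothetical, partitions are modeled as in-memory row counts per key value (in Hive each would be a directory), and all row counts are invented for illustration, with the 30:1 state ratio borrowed from the article's California/Vermont example.

```python
from collections import Counter

# Illustrative sketch only: an in-memory stand-in for Hive partitions.
# Each distinct value of the partition key maps to one partition
# (in Hive, one directory). All row counts below are hypothetical.


def partition_sizes(rows, key):
    """Count how many rows land in each partition for a given key."""
    return Counter(row[key] for row in rows)


def pruned_fraction(sizes, value):
    """Fraction of all rows read when a filter on `value` prunes to one partition."""
    return sizes[value] / sum(sizes.values())


def skew_ratio(sizes):
    """Size of the largest partition divided by the size of the smallest."""
    return max(sizes.values()) / min(sizes.values())


# Under-partitioning: with an evenly split two-valued key, a single-value
# filter still reads half of the data.
people = [{"gender": "male"}] * 50 + [{"gender": "female"}] * 50
gender_sizes = partition_sizes(people, "gender")
print(pruned_fraction(gender_sizes, "male"))  # 0.5

# Over-partitioning: a unique key yields one partition per row.
users = [{"userid": i} for i in range(1000)]
print(len(partition_sizes(users, "userid")))  # 1000

# Skew: hypothetical row counts mirroring a ~30:1 population ratio.
residents = [{"state": "CA"}] * 30 + [{"state": "VT"}] * 1
print(skew_ratio(partition_sizes(residents, "state")))  # 30.0
```

Data scanned is used here only as a rough proxy for latency; actual Hive query time also depends on file formats, file counts per partition, and the execution engine.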