
Commit 496cbf6

authored
Merge pull request #211992 from sreekzz/patch-110
Change Diagram without Storm block
2 parents 3f209a7 + 551b7a5 commit 496cbf6


3 files changed (+4, -4 lines)


articles/hdinsight/hdinsight-hadoop-optimize-hive-query.md

Lines changed: 4 additions & 4 deletions
@@ -4,7 +4,7 @@ description: This article describes how to optimize your Apache Hive queries in
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 04/29/2022
+ms.date: 09/21/2022
 ---
 
 # Optimize Apache Hive queries in Azure HDInsight
@@ -19,7 +19,7 @@ Choose the appropriate cluster type to help optimize performance for your worklo
 
 * Choose **Interactive Query** cluster type to optimize for `ad hoc`, interactive queries.
 * Choose Apache **Hadoop** cluster type to optimize for Hive queries used as a batch process.
-* **Spark** and **HBase** cluster types can also run Hive queries, and might be appropriate if you are running those workloads.
+* **Spark** and **HBase** cluster types can also run Hive queries, and might be appropriate if you're running those workloads.
 
 For more information on running Hive queries on various HDInsight cluster types, see [What is Apache Hive and HiveQL on Azure HDInsight?](hadoop/hdinsight-use-hive.md).
 
@@ -41,7 +41,7 @@ For more information about scaling HDInsight, see [Scale HDInsight clusters](hdi
 
 [Apache Tez](https://tez.apache.org/) is an alternative execution engine to the MapReduce engine. Linux-based HDInsight clusters have Tez enabled by default.
 
-:::image type="content" source="./media/hdinsight-hadoop-optimize-hive-query/hdinsight-tez-engine.png" alt-text="HDInsight Apache Tez overview diagram":::
+:::image type="content" source="./media/hdinsight-hadoop-optimize-hive-query/hdinsight-tez-engine-new.png" alt-text="HDInsight Apache Tez overview diagram":::
 
 Tez is faster because:
 
@@ -71,7 +71,7 @@ Some partitioning considerations:
 
 * **Don't under partition** - Partitioning on columns with only a few values can cause few partitions. For example, partitioning on gender only creates two partitions to be created (male and female), so reduce the latency by a maximum of half.
 * **Don't over partition** - On the other extreme, creating a partition on a column with a unique value (for example, userid) causes multiple partitions. Over partition causes much stress on the cluster namenode as it has to handle the large number of directories.
-* **Avoid data skew** - Choose your partitioning key wisely so that all partitions are even size. For example, partitioning on *State* column may skew the distribution of data. Since the state of California has a population almost 30x that of Vermont, the partition size is potentially skewed and performance may vary tremendously.
+* **Avoid data skew** - Choose your partitioning key wisely so that all partitions are even size. For example, partitioning on *State* column may skew the distribution of data. Since the state of California has a population almost 30x that of Vermont, the partition size is potentially skewed, and performance may vary tremendously.
 
 To create a partition table, use the *Partitioned By* clause:
 
