Merge pull request #265977 from v-akarnase/patch-32

prmerger-automator[bot] · web-flow · commit 181da7b2b54f · 2024-02-13T03:32:51.000Z
Update interactive-query-troubleshoot-slow-reducer.md
diff --git a/articles/hdinsight/interactive-query/interactive-query-troubleshoot-slow-reducer.md b/articles/hdinsight/interactive-query/interactive-query-troubleshoot-slow-reducer.md
@@ -1,9 +1,9 @@
 ---
 title: Reducer is slow in Azure HDInsight
-description: Reducer is slow in Azure HDInsight from possible data skewing
+description: Reducer is slow in Azure HDInsight from possible data skewing.
 ms.service: hdinsight
 ms.topic: troubleshooting
-ms.date: 01/31/2023
+ms.date: 02/12/2024
 ---
 
 # Scenario: Reducer is slow in Azure HDInsight
@@ -20,15 +20,15 @@ Open [beeline](../hadoop/apache-hadoop-use-hive-beeline.md) and verify the value
 
 The value of this variable is meant to be set to true/false based on the nature of the data.
 
-If the partitions in the input table are less(say less than 10), and so is the number of output partitions, and the variable is set to `true`, this causes data to be globally sorted and written using a single reducer per partition. Even if the number of reducers available is larger, a few reducers may be lagging behind due to data skew and the max parallelism cannot be attained. When changed to `false`, more than one reducer may handle a single partition and multiple smaller files will be written out, resulting in faster insert. This might affect further queries though because of the presence of smaller files.
+If the input table has few partitions (say, fewer than 10), the number of output partitions is similarly small, and the variable is set to `true`, data is globally sorted and written using a single reducer per partition. Even if more reducers are available, a few reducers may lag behind due to data skew, and maximum parallelism can't be attained. When the variable is changed to `false`, more than one reducer may handle a single partition and multiple smaller files are written out, resulting in a faster insert. However, the presence of smaller files might affect subsequent queries.
 
-A value of `true` makes sense when the number of partitions is larger and data is not skewed. In such cases the result of the map phase will be written out such that each partition will be handled by a single reducer resulting in better subsequent query performance.
+A value of `true` makes sense when the number of partitions is larger and data isn't skewed. In such cases, the results of the map phase are written out such that each partition is handled by a single reducer, resulting in better subsequent query performance.
 
 ## Resolution
 
 1. Try to repartition the data to normalize into multiple partitions.
 
-1. If #1 is not possible, set the value of the config to false in beeline session and try the query again. `set hive.optimize.sort.dynamic.partition=false`. Setting the value to false at a cluster level is not recommended. The value of `true` is optimal and set the parameter as necessary based on nature of data and query.
+1. If #1 isn't possible, set the value of the config to `false` in the beeline session and try the query again: `set hive.optimize.sort.dynamic.partition=false`. Setting the value to `false` at the cluster level isn't recommended. The value of `true` is optimal in general; change the parameter only as necessary, based on the nature of the data and query.
 
 ## Next steps