Commit 181da7b

Merge pull request #265977 from v-akarnase/patch-32
Update interactive-query-troubleshoot-slow-reducer.md
2 parents 3adf703 + 21a5e18 commit 181da7b

1 file changed: +5 -5 lines changed

articles/hdinsight/interactive-query/interactive-query-troubleshoot-slow-reducer.md

Lines changed: 5 additions & 5 deletions
@@ -1,9 +1,9 @@
 ---
 title: Reducer is slow in Azure HDInsight
-description: Reducer is slow in Azure HDInsight from possible data skewing
+description: Reducer is slow in Azure HDInsight from possible data skewing.
 ms.service: hdinsight
 ms.topic: troubleshooting
-ms.date: 01/31/2023
+ms.date: 02/12/2024
 ---
 
 # Scenario: Reducer is slow in Azure HDInsight
@@ -20,15 +20,15 @@ Open [beeline](../hadoop/apache-hadoop-use-hive-beeline.md) and verify the value
 
 The value of this variable is meant to be set to true/false based on the nature of the data.
 
-If the partitions in the input table are less(say less than 10), and so is the number of output partitions, and the variable is set to `true`, this causes data to be globally sorted and written using a single reducer per partition. Even if the number of reducers available is larger, a few reducers may be lagging behind due to data skew and the max parallelism cannot be attained. When changed to `false`, more than one reducer may handle a single partition and multiple smaller files will be written out, resulting in faster insert. This might affect further queries though because of the presence of smaller files.
+If the partitions in the input table are less(say less than 10), and so is the number of output partitions, and the variable is set to `true`, this causes data to be globally sorted and written using a single reducer per partition. Even if the number of reducers available is larger, a few reducers may be lagging behind due to data skew and the max parallelism can't be attained. When changed to `false`, more than one reducer may handle a single partition and multiple smaller files are written out, resulting in faster insert. This might affect further queries though because of the presence of smaller files.
 
-A value of `true` makes sense when the number of partitions is larger and data is not skewed. In such cases the result of the map phase will be written out such that each partition will be handled by a single reducer resulting in better subsequent query performance.
+A value of `true` makes sense when the number of partitions is larger and data isn't skewed. In such cases the result of the map phase are written out such that each partition will be handled by a single reducer resulting in better subsequent query performance.
 
 ## Resolution
 
 1. Try to repartition the data to normalize into multiple partitions.
 
-1. If #1 is not possible, set the value of the config to false in beeline session and try the query again. `set hive.optimize.sort.dynamic.partition=false`. Setting the value to false at a cluster level is not recommended. The value of `true` is optimal and set the parameter as necessary based on nature of data and query.
+1. If #1 isn't possible, set the value of the config to false in beeline session and try the query again. `set hive.optimize.sort.dynamic.partition=false`. Setting the value to false at a cluster level is not recommended. The value of `true` is optimal and set the parameter as necessary based on nature of data and query.
 
 ## Next steps

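For reference, the session-level override described in the Resolution step of this diff might look roughly like the following in a beeline session. This is only a sketch under assumptions: the table names `sales_staging` and `sales_partitioned` and their columns are hypothetical placeholders, dynamic partitioning is assumed to be already permitted in the session, and the final reset line is optional, since a `set` issued in beeline applies to that session only.

```sql
-- Print the current value for this beeline session
set hive.optimize.sort.dynamic.partition;

-- Override for this session only; the article advises against changing it cluster-wide
set hive.optimize.sort.dynamic.partition=false;

-- Hypothetical dynamic-partition insert (placeholder table and column names).
-- The partition column (region) comes last in the SELECT list.
INSERT OVERWRITE TABLE sales_partitioned PARTITION (region)
SELECT order_id, amount, region
FROM sales_staging;

-- Optional: return to true, which the article describes as the optimal value
set hive.optimize.sort.dynamic.partition=true;
```

With the value set to `false`, the insert may write several smaller files per partition, which the article notes can slow later queries on the same table.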