Commit 75f9861

HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning
1 parent 7a7596f commit 75f9861

225 files changed: +7800 -10792 lines changed


iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_bucket_map_join_7.q.out

Lines changed: 167 additions & 183 deletions
Large diffs are not rendered by default.

ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java

Lines changed: 4 additions & 2 deletions
@@ -66,8 +66,10 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
     ReduceSinkOperator reduceSinkOperator = (ReduceSinkOperator) nd;
     ReduceSinkDesc reduceSinkDesc = reduceSinkOperator.getConf();
 
-    // Check whether the reduce sink operator contains top n
-    if (reduceSinkDesc.getTopN() < 0 || !reduceSinkDesc.isOrdering()) {
+    // HIVE-29322: Skip creating TopNKeyOperator when LIMIT pushdown has already applied (topN >= -1)
+    // and the query uses a single reducer with no partition columns. In this scenario,
+    // TopNKey offers no extra pruning benefit and only adds unnecessary processing overhead.
+    if (reduceSinkDesc.getTopN() >= -1 || !reduceSinkDesc.isOrdering()) {
       return null;
     }
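To make the effect of the changed condition concrete, here is a minimal sketch that models only the two boolean expressions visible in the hunk above. `TopNGuardSketch`, `oldGuardSkips`, and `newGuardSkips` are hypothetical names and are not part of Hive; the real processor reads these values from `ReduceSinkDesc`. The single-reducer and partition-column conditions mentioned in the code comment do not appear in the visible condition, so they are not modeled here.

```java
public class TopNGuardSketch {
    // Hypothetical helper: the guard on the "-" line of the hunk.
    // Returns true when the processor would return null (no TopNKeyOperator).
    public static boolean oldGuardSkips(int topN, boolean isOrdering) {
        return topN < 0 || !isOrdering;
    }

    // Hypothetical helper: the guard on the "+" line of the hunk.
    public static boolean newGuardSkips(int topN, boolean isOrdering) {
        return topN >= -1 || !isOrdering;
    }

    public static void main(String[] args) {
        // Sample topN values; 10 and 40 are the "top n" values in the test plans.
        int[] samples = {-1, 0, 10, 40};
        for (int topN : samples) {
            System.out.printf("topN=%d oldSkips=%b newSkips=%b%n",
                topN, oldGuardSkips(topN, true), newGuardSkips(topN, true));
        }
    }
}
```

With an ordering reduce sink and a `topN` of 10 or 40, the old guard proceeds to build a TopNKeyOperator while the new guard returns early. That is consistent with the `Top N Key Operator` entries disappearing from the `.q.out` files in this commit.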

ql/src/test/results/clientpositive/llap/autoColumnStats_4.q.out

Lines changed: 17 additions & 22 deletions
@@ -74,22 +74,17 @@ STAGE PLANS:
 Filter Operator
 predicate: cint is not null (type: boolean)
 Statistics: Num rows: 9173 Data size: 671202 Basic stats: COMPLETE Column stats: COMPLETE
-Top N Key Operator
-sort order: +
-keys: cint (type: int)
-null sort order: z
-Statistics: Num rows: 9173 Data size: 671202 Basic stats: COMPLETE Column stats: COMPLETE
-top n: 10
-Select Operator
-expressions: cint (type: int), CAST( cstring1 AS varchar(128)) (type: varchar(128))
-outputColumnNames: _col0, _col1
-Statistics: Num rows: 9173 Data size: 977184 Basic stats: COMPLETE Column stats: COMPLETE
-Reduce Output Operator
-key expressions: _col0 (type: int)
-null sort order: z
-sort order: +
-Statistics: Num rows: 9173 Data size: 977184 Basic stats: COMPLETE Column stats: COMPLETE
-value expressions: _col1 (type: varchar(128))
+Select Operator
+expressions: cint (type: int), CAST( cstring1 AS varchar(128)) (type: varchar(128))
+outputColumnNames: _col0, _col1
+Statistics: Num rows: 9173 Data size: 1479384 Basic stats: COMPLETE Column stats: COMPLETE
+Reduce Output Operator
+key expressions: _col0 (type: int)
+null sort order: z
+sort order: +
+Statistics: Num rows: 9173 Data size: 1479384 Basic stats: COMPLETE Column stats: COMPLETE
+TopN Hash Memory Usage: 0.1
+value expressions: _col1 (type: varchar(128))
 Execution mode: vectorized, llap
 LLAP IO: all inputs
 Reducer 2
@@ -98,27 +93,27 @@ STAGE PLANS:
 Select Operator
 expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: varchar(128))
 outputColumnNames: _col0, _col1
-Statistics: Num rows: 9173 Data size: 977184 Basic stats: COMPLETE Column stats: COMPLETE
+Statistics: Num rows: 9173 Data size: 1479384 Basic stats: COMPLETE Column stats: COMPLETE
 Limit
 Number of rows: 10
-Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
+Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
 Reduce Output Operator
 key expressions: _col0 (type: int)
 null sort order: a
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
-Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
+Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
 value expressions: _col1 (type: varchar(128))
 Reducer 3
 Execution mode: vectorized, llap
 Reduce Operator Tree:
 Select Operator
 expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: varchar(128))
 outputColumnNames: _col0, _col1
-Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
+Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
 File Output Operator
 compressed: false
-Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
+Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
 table:
 input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
 output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -128,7 +123,7 @@ STAGE PLANS:
 Select Operator
 expressions: _col0 (type: int), _col1 (type: varchar(128))
 outputColumnNames: a, b
-Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
+Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
 Group By Operator
 aggregations: min(a), max(a), count(1), count(a), compute_bit_vector_hll(a), max(length(b)), avg(COALESCE(length(b),0)), count(b), compute_bit_vector_hll(b)
 minReductionHashAggr: 0.9

ql/src/test/results/clientpositive/llap/auto_join_without_localtask.q.out

Lines changed: 12 additions & 27 deletions
@@ -75,17 +75,12 @@ STAGE PLANS:
 1 _col0 (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
-Top N Key Operator
-sort order: ++
-keys: _col0 (type: string), _col1 (type: string)
+Reduce Output Operator
+key expressions: _col0 (type: string), _col1 (type: string)
 null sort order: zz
+sort order: ++
 Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
-top n: 40
-Reduce Output Operator
-key expressions: _col0 (type: string), _col1 (type: string)
-null sort order: zz
-sort order: ++
-Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+TopN Hash Memory Usage: 0.1
 Reducer 3
 Execution mode: llap
 Reduce Operator Tree:
@@ -268,17 +263,12 @@ STAGE PLANS:
 1 _col0 (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1288 Data size: 229264 Basic stats: COMPLETE Column stats: COMPLETE
-Top N Key Operator
-sort order: ++
-keys: _col0 (type: string), _col1 (type: string)
+Reduce Output Operator
+key expressions: _col0 (type: string), _col1 (type: string)
 null sort order: zz
+sort order: ++
 Statistics: Num rows: 1288 Data size: 229264 Basic stats: COMPLETE Column stats: COMPLETE
-top n: 40
-Reduce Output Operator
-key expressions: _col0 (type: string), _col1 (type: string)
-null sort order: zz
-sort order: ++
-Statistics: Num rows: 1288 Data size: 229264 Basic stats: COMPLETE Column stats: COMPLETE
+TopN Hash Memory Usage: 0.1
 Reducer 4
 Execution mode: llap
 Reduce Operator Tree:
@@ -461,17 +451,12 @@ STAGE PLANS:
 1 _col0 (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 270 Data size: 48060 Basic stats: COMPLETE Column stats: COMPLETE
-Top N Key Operator
-sort order: ++
-keys: _col0 (type: string), _col1 (type: string)
+Reduce Output Operator
+key expressions: _col0 (type: string), _col1 (type: string)
 null sort order: zz
+sort order: ++
 Statistics: Num rows: 270 Data size: 48060 Basic stats: COMPLETE Column stats: COMPLETE
-top n: 40
-Reduce Output Operator
-key expressions: _col0 (type: string), _col1 (type: string)
-null sort order: zz
-sort order: ++
-Statistics: Num rows: 270 Data size: 48060 Basic stats: COMPLETE Column stats: COMPLETE
+TopN Hash Memory Usage: 0.1
 Reducer 4
 Execution mode: llap
 Reduce Operator Tree:
