48 changes: 47 additions & 1 deletion ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Expand Up @@ -54,9 +54,11 @@
import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
import org.apache.hadoop.hive.ql.exec.GroupByOperator;
import org.apache.hadoop.hive.ql.exec.JoinOperator;
import org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator;
import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.exec.OperatorUtils;
import org.apache.hadoop.hive.ql.exec.PTFOperator;
import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
import org.apache.hadoop.hive.ql.exec.SelectOperator;
import org.apache.hadoop.hive.ql.exec.TableScanOperator;
Expand Down Expand Up @@ -1300,9 +1302,10 @@ private static void runTopNKeyOptimization(OptimizeTezProcContext procCtx)
return;
}

String topNKeyRegexPattern = buildTopNKeyRegexPattern(procCtx);
Map<SemanticRule, SemanticNodeProcessor> opRules = new LinkedHashMap<SemanticRule, SemanticNodeProcessor>();
opRules.put(
- new RuleRegExp("Top n key optimization", ReduceSinkOperator.getOperatorName() + "%"),
+ new RuleRegExp("Top n key optimization", topNKeyRegexPattern),
Contributor

I think a bailout this involved belongs in TopNKeyProcessor rather than in the rule pattern. Most likely, can we skip adding a TopNKeyOperator when the RSO is not a PTF ReduceSink and the RSO's ancestors don't include another RSO?
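
A minimal sketch of the per-ReduceSink check suggested in the comment above, written against a toy operator model rather than Hive's real classes. `ToyOp`, `hasReduceSinkAncestor` and `shouldSkipTopNKey` are hypothetical names introduced only for illustration, and the short operator names ("TS", "GBY", "RS") are assumed; the real logic would presumably live in TopNKeyProcessor and walk actual operator parents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hive operator; this is NOT the real Operator<?> API.
class ToyOp {
  final String name;            // assumed short operator name, e.g. "TS", "GBY", "RS"
  final boolean ptfReduceSink;  // only meaningful when name is "RS"
  final List<ToyOp> parents = new ArrayList<>();

  ToyOp(String name, boolean ptfReduceSink, ToyOp... parents) {
    this.name = name;
    this.ptfReduceSink = ptfReduceSink;
    for (ToyOp parent : parents) {
      this.parents.add(parent);
    }
  }
}

class TopNKeyBailoutSketch {
  // True if any operator upstream of 'op' is a ReduceSink.
  static boolean hasReduceSinkAncestor(ToyOp op) {
    for (ToyOp parent : op.parents) {
      if ("RS".equals(parent.name) || hasReduceSinkAncestor(parent)) {
        return true;
      }
    }
    return false;
  }

  // Skip TopNKey when the ReduceSink is not a PTF ReduceSink and has no ReduceSink ancestor.
  static boolean shouldSkipTopNKey(ToyOp reduceSink) {
    return !reduceSink.ptfReduceSink && !hasReduceSinkAncestor(reduceSink);
  }

  public static void main(String[] args) {
    ToyOp ts = new ToyOp("TS", false);
    ToyOp orderOnlyRs = new ToyOp("RS", false, ts);   // TS -> RS (plain ORDER BY shape)
    ToyOp ptfRs = new ToyOp("RS", true, ts);          // TS -> RS feeding a window function
    ToyOp firstRs = new ToyOp("RS", false, ts);
    ToyOp gby = new ToyOp("GBY", false, firstRs);
    ToyOp secondRs = new ToyOp("RS", false, gby);     // TS -> RS -> GBY -> RS

    System.out.println("order-only RS skipped: " + shouldSkipTopNKey(orderOnlyRs));
    System.out.println("PTF RS skipped: " + shouldSkipTopNKey(ptfRs));
    System.out.println("RS below another RS skipped: " + shouldSkipTopNKey(secondRs));
  }
}
```

The decision is made per ReduceSink, so one PTF ReduceSink in the plan does not change how the other ReduceSinks are treated. Whether this is the right skip condition is exactly what the thread is still discussing.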

Contributor Author

I agree with this. @zabetak, please share your view. Thanks.

Member

As mentioned in https://github.com/apache/hive/pull/6202/changes#r2675925598, before talking about how to implement a change we need to understand what change we need to make and whether the change is needed at all.

The question of whether we should add a TopNKeyOperator below a windowing (PTF) ReduceSink remains open. Let's finalize the discussion in https://github.com/apache/hive/pull/6202/changes#r2668127719 and then we can come back to this.

new TopNKeyProcessor(
HiveConf.getIntVar(procCtx.conf, HiveConf.ConfVars.HIVE_MAX_TOPN_ALLOWED),
HiveConf.getFloatVar(procCtx.conf, ConfVars.HIVE_TOPN_EFFICIENCY_THRESHOLD),
Expand All @@ -1322,6 +1325,49 @@ private static void runTopNKeyOptimization(OptimizeTezProcContext procCtx)
ogw.startWalking(topNodes, null);
}

/*
* Build the ReduceSink matching pattern used by TopNKey optimization.
*
* For ORDER BY / LIMIT queries that do not involve GROUP BY or JOIN,
* applying TopNKey results in a performance regression. ReduceSink
* operators created only for ordering must therefore be excluded from
* TopNKey.
*
* When ORDER BY or LIMIT is present, restrict TopNKey to ReduceSink
* operators that originate from GROUP BY, JOIN, MAPJOIN, LATERAL VIEW
* JOIN or PTF query shapes
*/
private static String buildTopNKeyRegexPattern(OptimizeTezProcContext procCtx) {
String reduceSinkOp = ReduceSinkOperator.getOperatorName() + "%";

boolean hasOrderOrLimit =
procCtx.parseContext.getQueryProperties().hasLimit() ||
procCtx.parseContext.getQueryProperties().hasOrderBy();
Comment on lines +1343 to +1345
Member

Why do we need this? It is usually better if we can keep the optimization/transformation rules independent of the SQL syntax.

Contributor Author

This was added so that windowing queries with no GROUP BY or JOIN in the query still take the TopNKey path.
Example (windowing_streaming.q):
select * from ( select p_mfgr, rank() over(partition by p_mfgr order by p_name) r from part) a where r < 4

Member

select * 
from ( select p_mfgr, rank() over(partition by p_mfgr order by p_name) r from part) a 
where r < 4;

Should such queries use the Top N Key Operator?

Plan A: With Top N Key Operator

        Map 1 
            Map Operator Tree:
                TableScan
                  alias: part
                  Statistics: Num rows: 26 Data size: 5694 Basic stats: COMPLETE Column stats: COMPLETE
                  Top N Key Operator
                    sort order: ++
                    keys: p_mfgr (type: string), p_name (type: string)
                    null sort order: az
                    Map-reduce partition columns: p_mfgr (type: string)
                    Statistics: Num rows: 26 Data size: 5694 Basic stats: COMPLETE Column stats: COMPLETE
                    top n: 4
                    Reduce Output Operator
                      key expressions: p_mfgr (type: string), p_name (type: string)
                      null sort order: az
                      sort order: ++
                      Map-reduce partition columns: p_mfgr (type: string)
                      Statistics: Num rows: 26 Data size: 5694 Basic stats: COMPLETE Column stats: COMPLETE

Plan B: Without Top N Key Operator

        Map 1 
            Map Operator Tree:
                TableScan
                  alias: part
                  Statistics: Num rows: 26 Data size: 5694 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: p_mfgr (type: string), p_name (type: string)
                    null sort order: az
                    sort order: ++
                    Map-reduce partition columns: p_mfgr (type: string)
                    Statistics: Num rows: 26 Data size: 5694 Basic stats: COMPLETE Column stats: COMPLETE
                    TopN Hash Memory Usage: 0.8

The plan structure is almost identical to that of ORDER BY + LIMIT queries, so from the discussion so far I was under the impression that "Plan B" is better and more efficient in most cases.

Contributor Author

@Indhumathi27 Indhumathi27 Jan 10, 2026

I have debugged the windowing query case; my observations are below.

Test case (unsorted data):

    CREATE TABLE topnkey_windowing (tw_code string, tw_value double);

    INSERT INTO topnkey_windowing VALUES (NULL, NULL),(NULL, 109),('A', 109),('A', 104),('A', 110),('A', 120),('A', 103),('A', 109),('B', 105),('B', 106),('B', 106),('B', NULL),('B', 106),('A', 109);

    SELECT tw_code, ranking
    FROM (
      SELECT tw_code AS tw_code,
             rank() OVER (PARTITION BY tw_code ORDER BY tw_value) AS ranking
      FROM topnkey_windowing) tmp1
    WHERE ranking < 2;

With TopNKey enabled:
Map phase: input records: 14, output records: 9
With TopNKey disabled:
Map phase: input records: 14, output records: 8
Time taken: almost the same in both cases.

  1. In PTF queries, TopNKey creates a separate TopNKeyFilter for every distinct PARTITION BY key and maintains an in-memory Top-N heap per partition.
  2. Each incoming row performs partition-key hashing, map lookup, and heap comparison to decide whether it belongs to that partition’s Top-N.
  3. Rows are eliminated when their sort key compares worse than the current Top-N boundary, so they are not inserted into the partition’s ordered Top-N set.
  4. For ORDER BY … LIMIT queries, TopNKey maintains only a single global Top-N heap per reducer.
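
The four points above can be sketched with a toy filter that keeps one bounded heap per partition key. This is only an illustration under stated assumptions: `PartitionedTopNSketch` and its methods are hypothetical names, it handles a single ascending numeric sort key, and it does not reproduce Hive's TopNKeyFilter or the record counts reported in this comment.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Toy model (not Hive code): one max-heap of the current N smallest sort keys per partition.
class PartitionedTopNSketch {
  private final int topN;
  private final Map<String, PriorityQueue<Double>> heapsByPartition = new HashMap<>();

  PartitionedTopNSketch(int topN) {
    this.topN = topN;
  }

  // Returns true if the row should be forwarded to the ReduceSink.
  boolean canForward(String partitionKey, double sortKey) {
    // Partition-key hashing and map lookup (point 2).
    PriorityQueue<Double> heap = heapsByPartition.computeIfAbsent(
        partitionKey, k -> new PriorityQueue<Double>(Comparator.reverseOrder()));
    if (heap.size() < topN) {
      heap.add(sortKey);
      return true;
    }
    double boundary = heap.peek(); // worst key currently inside this partition's Top-N
    if (sortKey > boundary) {
      return false;                // worse than the boundary: eliminated (point 3)
    }
    if (sortKey < boundary) {
      heap.poll();                 // better than the boundary: replace the worst key
      heap.add(sortKey);
    }
    return true;                   // ties with the boundary are forwarded as well
  }

  static int countForwarded(int topN, String[] partitions, double[] sortKeys) {
    PartitionedTopNSketch filter = new PartitionedTopNSketch(topN);
    int forwarded = 0;
    for (int i = 0; i < partitions.length; i++) {
      if (filter.canForward(partitions[i], sortKeys[i])) {
        forwarded++;
      }
    }
    return forwarded;
  }

  public static void main(String[] args) {
    String[] partitions = {"A", "A", "A", "A", "A", "B", "B"};
    double[] unsorted = {109, 104, 110, 120, 103, 105, 106};
    double[] sorted = {103, 104, 109, 110, 120, 105, 106};
    System.out.println("unsorted input forwards " + countForwarded(1, partitions, unsorted) + " of 7 rows");
    System.out.println("sorted input forwards " + countForwarded(1, partitions, sorted) + " of 7 rows");
  }
}
```

In this toy, how many rows survive depends on the order in which each partition's rows arrive, and memory grows with the number of distinct partition keys because every key gets its own heap.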

I have tested with a low-cardinality, monotonically ordered windowing dataset and with a high-cardinality, multi-row-per-partition PTF test dataset. In these cases the behaviour is similar to ORDER BY … LIMIT queries, where all the rows are forwarded. However, no query performance degradation is observed for the PTF operator case compared with disabling the TopNKey operator.
One such example:
ptf_testcase.txt

From these experiments, performance for windowing queries is almost the same with TopNKey enabled or disabled.
Disabling TopNKey for windowing queries does not show a drastic difference in performance. Forwarding all rows in PTF TopNKey does not cause the catastrophic shuffle explosion seen in ORDER BY … LIMIT queries, because the shuffle and reducer stages are already needed for window partitioning.

Member

Many thanks for the additional experiments for the PTF case.

> But query performance degradation is not observed for PTF operator case, comparing with disabling topnKeyoperator

It's difficult to draw safe conclusions from the comparison between the ORDER BY and windowing experiments because the dataset size and the effective limit differ significantly.

Dataset size:

  • for ORDER BY the dataset has ~10M rows
  • for windowing the dataset has 50K rows

Effective limit/top-n filter:

  • for ORDER BY the limit is 100
  • for windowing the limit is ~6K

It would be great if you can run some experiments where the numbers are closer.

> Disabling TopNkey for Windowing queries don't show a drastic difference in performance

I still believe that this depends on the use-case. The example that you crafted for ORDER BY was clearly showing the downsides of the Top-N operator. The answer/benchmarks above imply that this will never happen for windowing functions but I don't understand why.

> because the shuffle and reducer stages are already needed for window partitioning.

I don't fully understand the statement about the shuffle and reducer stages. Can you elaborate a bit more?

Contributor Author

> It would be great if you can run some experiments where the numbers are closer.

Below are the numbers with more data.
Total rows: 51200000
Dataset: Mixed Partition data
Table schema and query: ptf_testcase.txt

With TopNKey enabled:
[Screenshot 2026-01-16 at 9 14 54 AM; image not included]

With TopNKey disabled:
[Screenshot 2026-01-16 at 9 13 50 AM; image not included]

> I don't fully understand the statement about the shuffle and reducer stages. Can you elaborate a bit more?

For PTF (windowing) queries, the shuffle and reducer stages are required to group rows by the PARTITION BY key before window functions can be evaluated. TopNKey operates per partition and, even when it forwards most or all rows, it does not introduce additional global shuffle or change the reducer fan-in.

In contrast, for ORDER BY … LIMIT queries, TopNKey maintains a single global Top-N heap and can disable ReduceSink-level Top-N pruning; when input data is unsorted, this causes all rows to be shuffled globally, leading to severe performance degradation.
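
How much a single global Top-N heap can prune depends on the order in which rows arrive. The toy below counts forwarded rows for an ascending top-N filter over integer keys. `GlobalTopNOrderSketch` and `countForwarded` are hypothetical names, and the sketch deliberately ignores ReduceSink-level Top-N pruning and everything else in Hive's execution path.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Toy model (not Hive code): a single max-heap holding the current N smallest keys.
class GlobalTopNOrderSketch {
  // Counts the rows an ascending top-N filter would forward for the given arrival order.
  static int countForwarded(int topN, int[] keysInArrivalOrder) {
    PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
    int forwarded = 0;
    for (int key : keysInArrivalOrder) {
      if (heap.size() < topN) {
        heap.add(key);
        forwarded++;
      } else if (key <= heap.peek()) {
        forwarded++;               // at least as good as the boundary: forwarded
        if (key < heap.peek()) {
          heap.poll();             // strictly better: replaces the worst key
          heap.add(key);
        }
      }
    }
    return forwarded;
  }

  public static void main(String[] args) {
    int rows = 1000;
    int[] bestFirst = new int[rows];
    int[] worstFirst = new int[rows];
    for (int i = 0; i < rows; i++) {
      bestFirst[i] = i + 1;        // 1, 2, ..., 1000
      worstFirst[i] = rows - i;    // 1000, 999, ..., 1
    }
    System.out.println("best-first arrival forwards " + countForwarded(10, bestFirst) + " rows");
    System.out.println("worst-first arrival forwards " + countForwarded(10, worstFirst) + " rows");
  }
}
```

When every arriving key beats the current boundary, the filter forwards every row and only adds per-row comparison cost; when the best keys arrive first, it forwards just the first N rows.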


if (hasPTFReduceSink(procCtx) || !hasOrderOrLimit) {
return reduceSinkOp;
}

return "("
+ GroupByOperator.getOperatorName() + "|"
+ PTFOperator.getOperatorName() + "|"
+ JoinOperator.getOperatorName() + "|"
+ MapJoinOperator.getOperatorName() + "|"
+ LateralViewJoinOperator.getOperatorName() + "|"
+ CommonMergeJoinOperator.getOperatorName()
+ ").*%"
+ reduceSinkOp;
}

private static boolean hasPTFReduceSink(OptimizeTezProcContext ctx) {
for (ReduceSinkOperator rs : ctx.visitedReduceSinks) {
if (rs.getConf().isPTFReduceSink()) {
return true;
}
}
return false;
}

Comment on lines +1362 to +1370
Member

I'm not sure this achieves the desired outcome. Basically, if there is a PTF RS anywhere in the plan, we will apply the rule to every RS (whether or not it is a PTF RS).

Moreover, by relying on ctx.visitedReduceSinks we make the TopNKey optimization highly dependent on the stats-dependent optimizations, which is not great.
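
The first concern can be reproduced with a small stand-alone sketch that mirrors the branching in buildTopNKeyRegexPattern using plain booleans. The short operator names ("TS", "SEL", "GBY", "RS", ...) are assumed here, and `matches` is a simplification that joins names with "%" and runs a regex search; RuleRegExp's actual matching is more involved. `TopNKeyPatternSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy model (not Hive code) of the pattern selection under discussion.
class TopNKeyPatternSketch {
  static final String RS = "RS%"; // assumed value of ReduceSinkOperator.getOperatorName() + "%"

  // Same branching as the patch: one PTF ReduceSink anywhere selects the broad pattern.
  static String buildPattern(boolean anyPtfReduceSink, boolean hasOrderOrLimit) {
    if (anyPtfReduceSink || !hasOrderOrLimit) {
      return RS;
    }
    return "(GBY|PTF|JOIN|MAPJOIN|LVJ|MERGEJOIN).*%" + RS;
  }

  // Simplified matching: join the operator names with "%" and search for the pattern.
  static boolean matches(String pattern, List<String> operatorPath) {
    String path = String.join("%", operatorPath) + "%";
    return Pattern.compile(pattern).matcher(path).find();
  }

  public static void main(String[] args) {
    List<String> orderOnlyPath = Arrays.asList("TS", "SEL", "RS");
    List<String> groupByPath = Arrays.asList("TS", "GBY", "RS");

    String restricted = buildPattern(false, true);
    System.out.println("restricted, order-only RS: " + matches(restricted, orderOnlyPath));
    System.out.println("restricted, GBY RS: " + matches(restricted, groupByPath));

    // A PTF ReduceSink somewhere else in the plan flips the pattern for every ReduceSink.
    String withPtf = buildPattern(true, true);
    System.out.println("with PTF elsewhere, order-only RS: " + matches(withPtf, orderOnlyPath));
  }
}
```

Because the pattern is chosen once per query, an ordering-only ReduceSink that the restricted pattern would exclude is matched again as soon as any PTF ReduceSink exists in the same plan.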

Contributor Author

For windowing queries, since TopNKey does not cause significant performance issues, I am currently letting these queries take the TopNKey path. However, for some queries the plan contains no PTF%RS% sequence to match; only RS% works in that case.
I chose this approach to avoid traversing the tree to check whether the query has a PTF operator.
Can you suggest a solution for windowing queries?

Contributor Author

Is traversing the tree to find a PTF operator a better solution?

Member

I am not sure to what extent the claim that "there is not much performance issues with TopNKey enabled" holds. As mentioned in other comments, first we have to confirm when and if TopNKey is efficient for PTF reducers, and then decide whether to skip it or not.

private boolean findParallelSemiJoinBranch(Operator<?> mapjoin, TableScanOperator bigTableTS,
ParseContext parseContext,
Map<ReduceSinkOperator, TableScanOperator> semijoins,
Expand Down
39 changes: 17 additions & 22 deletions ql/src/test/results/clientpositive/llap/autoColumnStats_4.q.out
Expand Up @@ -74,22 +74,17 @@ STAGE PLANS:
Filter Operator
predicate: cint is not null (type: boolean)
Statistics: Num rows: 9173 Data size: 671202 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: cint (type: int)
null sort order: z
Statistics: Num rows: 9173 Data size: 671202 Basic stats: COMPLETE Column stats: COMPLETE
top n: 10
Select Operator
expressions: cint (type: int), CAST( cstring1 AS varchar(128)) (type: varchar(128))
outputColumnNames: _col0, _col1
Statistics: Num rows: 9173 Data size: 977184 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
null sort order: z
sort order: +
Statistics: Num rows: 9173 Data size: 977184 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: varchar(128))
Select Operator
expressions: cint (type: int), CAST( cstring1 AS varchar(128)) (type: varchar(128))
outputColumnNames: _col0, _col1
Statistics: Num rows: 9173 Data size: 1479384 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
null sort order: z
sort order: +
Statistics: Num rows: 9173 Data size: 1479384 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
value expressions: _col1 (type: varchar(128))
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 2
Expand All @@ -98,27 +93,27 @@ STAGE PLANS:
Select Operator
expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: varchar(128))
outputColumnNames: _col0, _col1
Statistics: Num rows: 9173 Data size: 977184 Basic stats: COMPLETE Column stats: COMPLETE
Statistics: Num rows: 9173 Data size: 1479384 Basic stats: COMPLETE Column stats: COMPLETE
Limit
Number of rows: 10
Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: varchar(128))
Reducer 3
Execution mode: vectorized, llap
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: varchar(128))
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Expand All @@ -128,7 +123,7 @@ STAGE PLANS:
Select Operator
expressions: _col0 (type: int), _col1 (type: varchar(128))
outputColumnNames: a, b
Statistics: Num rows: 10 Data size: 1296 Basic stats: COMPLETE Column stats: COMPLETE
Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: min(a), max(a), count(1), count(a), compute_bit_vector_hll(a), max(length(b)), avg(COALESCE(length(b),0)), count(b), compute_bit_vector_hll(b)
minReductionHashAggr: 0.9
Expand Down
Expand Up @@ -242,17 +242,12 @@ STAGE PLANS:
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: _col0 (type: string)
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 20 Data size: 1740 Basic stats: COMPLETE Column stats: COMPLETE
top n: 5
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 20 Data size: 1740 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
Execution mode: vectorized, llap
LLAP IO: all inputs
Map 4
Expand All @@ -264,17 +259,12 @@ STAGE PLANS:
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: _col0 (type: string)
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 20 Data size: 1740 Basic stats: COMPLETE Column stats: COMPLETE
top n: 5
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 20 Data size: 1740 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 3
Expand Down Expand Up @@ -719,26 +709,22 @@ STAGE PLANS:
TableScan
alias: a
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: key (type: string)
null sort order: z
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
top n: 5
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 2
Expand All @@ -751,17 +737,12 @@ STAGE PLANS:
Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: _col0 (type: string)
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
top n: 5
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
Reducer 4
Execution mode: vectorized, llap
Reduce Operator Tree:
Expand Down Expand Up @@ -789,17 +770,12 @@ STAGE PLANS:
Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: _col0 (type: string)
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
top n: 5
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 10 Data size: 870 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
Union 3
Vertex: Union 3

Expand Down
65 changes: 25 additions & 40 deletions ql/src/test/results/clientpositive/llap/cbo_input26.q.out
Expand Up @@ -37,22 +37,17 @@ STAGE PLANS:
alias: a
filterExpr: ((ds = '2008-04-08') and (hr = '11')) (type: boolean)
Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: key (type: string)
null sort order: z
Select Operator
expressions: key (type: string), value (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
top n: 5
Select Operator
expressions: key (type: string), value (type: string)
outputColumnNames: _col0, _col1
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string)
TopN Hash Memory Usage: 0.1
value expressions: _col1 (type: string)
Execution mode: vectorized, llap
LLAP IO: all inputs
Map 4
Expand Down Expand Up @@ -196,21 +191,16 @@ STAGE PLANS:
alias: a
filterExpr: ((ds = '2008-04-08') and (hr = '11')) (type: boolean)
Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: key (type: string)
null sort order: z
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
top n: 5
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
Execution mode: vectorized, llap
LLAP IO: all inputs
Map 4
Expand Down Expand Up @@ -354,21 +344,16 @@ STAGE PLANS:
alias: a
filterExpr: ((ds = '2008-04-08') and (hr = '11')) (type: boolean)
Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
sort order: +
keys: key (type: string)
null sort order: z
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
top n: 5
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.1
Execution mode: vectorized, llap
LLAP IO: all inputs
Map 4
Expand Down