-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
WindowNode, RowNumberNode, and TopNRowNumberNode unconditionally pass through all input columns to the output. There is no way to specify which input columns should be included in the output alongside the window function result columns.
Problem
Consider this query:
SELECT n_name, rank() OVER (PARTITION BY n_regionkey) as rn
FROM nation LIMIT 10The distributed plan is:
Scan(nation) → Shuffle(n_regionkey) → Window(rank() PARTITION BY n_regionkey)
→ partialLimit(10) → localPartition → finalLimit(10) → Gather → finalLimit(10) → Project(n_name, rn)
The Window operator outputs all input columns (n_nationkey, n_name, n_regionkey, n_comment) plus rn. Only n_name and rn are needed downstream. The extra columns (n_nationkey, n_regionkey, n_comment) flow through the entire limit chain and gather unnecessarily, wasting network bandwidth in the shuffle/gather and memory in the limit operators.
The ideal plan would drop unused columns right after the Window:
Scan(nation) → Shuffle(n_regionkey) → Window(rank() PARTITION BY n_regionkey)
→ Project(n_name, rn) → partialLimit(10) → ... → Gather → finalLimit(10)
Proposal
Add an outputType parameter to WindowNode, RowNumberNode, and TopNRowNumberNode constructors, following the pattern used by HashJoinNode and MergeJoinNode. Today, join nodes accept an explicit RowTypePtr outputType that lets the caller specify exactly which columns appear in the output. Window-family nodes compute their output internally by concatenating all input columns with the window function results (getWindowOutputType, getOptionalRowNumberOutputType), providing no way to control the output.
The outputType must include all window function result columns and a subset of input columns, with no duplicates. The node should validate this in the constructor.
Workaround
Currently, the optimizer must insert a ProjectNode after the window operator to drop unused columns. This is wasteful as it adds an extra operator to the plan that could be avoided if the window node controlled its output directly.