Skip to content

Conversation

@caican00
Copy link
Contributor

@caican00 caican00 commented Jan 20, 2026

Changes Made

We encountered an issue about aggregate functions are not permitted in the SELECT statement unless a GROUP BY clause is specified.

for example:

from daft import col
import daft
df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
df = df.select(
col('a').sum().alias('sum')
)
df.show()
DaftCoreException: DaftError::ValueError Aggregation expressions are currently only allowed in agg, pivot, and window: sum(col(a))

This issue has been resolved in daft sql #2799, but it is still not supported in dataframe. This PR is precisely aimed at addressing this issue.

Related Issues

I found that a related issue already exists. #1979

@github-actions github-actions bot added the feat label Jan 20, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 20, 2026

Greptile Summary

This PR enables global aggregations in df.select() by detecting when aggregation expressions are present and automatically creating an Aggregate->Project logical plan instead of just a Project.

Key Changes:

  • Modified LogicalPlanBuilder::select() to detect aggregation expressions using has_agg()
  • When aggregations are detected, expressions are split into aggregation expressions and foldable literals
  • Creates an Aggregate node with empty groupby (global aggregation), then a Project node to map semantic IDs back to user-facing column names
  • Non-aggregation expressions that reference columns are properly rejected with clear error messages
  • Only allows foldable literals (like lit(1)) to be mixed with aggregations

Test Coverage:

  • Tests basic global aggregation (col('a').sum())
  • Tests multiple aggregations and mixing with literals
  • Tests aggregations without aliases
  • Tests proper rejection of mixed agg/non-agg column references

Minor Issues Found:

  • Line 276 performs a redundant alias operation that is immediately stripped on line 278 (style issue, no functional impact)
  • Test uses overly broad exception matching (should use ValueError instead of Exception)

Confidence Score: 4/5

  • Safe to merge with minor style improvements recommended
  • The implementation is logically sound and properly handles edge cases (nested aggregations, mixed agg/non-agg expressions, literals). Test coverage is comprehensive for the basic use cases. The only issues found are minor style/efficiency concerns (redundant aliasing, broad exception matching) that don't affect correctness.
  • No files require special attention - the implementation is solid

Important Files Changed

Filename Overview
src/daft-logical-plan/src/builder/mod.rs Adds global aggregation support to select() by detecting aggregations and creating an Aggregate->Project plan. Logic is sound but has minor inefficiencies (redundant aliasing, duplicate semantic_id computation).
tests/dataframe/test_select_global_agg.py Comprehensive test coverage for basic global aggregation scenarios: single agg, multiple aggs, with literals, without alias, and rejection of mixed agg/non-agg columns.

Sequence Diagram

sequenceDiagram
    participant User
    participant LogicalPlanBuilder
    participant ExprResolver
    participant has_agg
    participant Aggregate
    participant Project

    User->>LogicalPlanBuilder: select([col('a').sum().alias('sum_a')])
    LogicalPlanBuilder->>ExprResolver: resolve(to_select)
    ExprResolver-->>LogicalPlanBuilder: resolved expressions
    
    LogicalPlanBuilder->>has_agg: check if any expr has aggregation
    has_agg-->>LogicalPlanBuilder: true
    
    Note over LogicalPlanBuilder: Split expressions into agg and foldable
    
    loop For each expression
        alt Expression has aggregation
            LogicalPlanBuilder->>LogicalPlanBuilder: Validate top-level Agg only
            LogicalPlanBuilder->>LogicalPlanBuilder: Add to agg_exprs
        else Expression is non-agg
            LogicalPlanBuilder->>ExprResolver: Validate with empty groupby
            LogicalPlanBuilder->>LogicalPlanBuilder: Check requires_computation
            alt Is foldable literal
                LogicalPlanBuilder->>LogicalPlanBuilder: Add to foldable_exprs
            else Requires computation
                LogicalPlanBuilder-->>User: Error: Expected aggregation
            end
        end
    end
    
    Note over LogicalPlanBuilder: Build internal aggregate plan
    
    loop For each agg expression
        LogicalPlanBuilder->>LogicalPlanBuilder: Compute semantic_id
        LogicalPlanBuilder->>LogicalPlanBuilder: Create internal agg (semantic_id alias)
        LogicalPlanBuilder->>LogicalPlanBuilder: Create final expr (user name alias)
    end
    
    LogicalPlanBuilder->>Aggregate: try_new(agg_internal, empty groupby)
    Aggregate-->>LogicalPlanBuilder: agg_plan
    
    LogicalPlanBuilder->>LogicalPlanBuilder: Extend final_exprs with foldable_exprs
    
    LogicalPlanBuilder->>Project: try_new(agg_plan, final_exprs)
    Project-->>LogicalPlanBuilder: logical_plan
    
    LogicalPlanBuilder-->>User: Updated plan (Aggregate -> Project)
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@caican00 caican00 force-pushed the to-up-basic-global-expressions-within-expression-tree branch from 1ad115c to ede725e Compare January 21, 2026 02:08
@caican00
Copy link
Contributor Author

caican00 commented Jan 21, 2026

Gently ping @colin-ho @kevinzwang please help review this PR when you are free. Thank you for your time.

@codecov
Copy link

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 80.95238% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.91%. Comparing base (7bec778) to head (ede725e).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-logical-plan/src/builder/mod.rs 82.25% 11 Missing ⚠️
src/daft-logical-plan/src/builder/resolve_expr.rs 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6067      +/-   ##
==========================================
- Coverage   72.91%   72.91%   -0.01%     
==========================================
  Files         973      973              
  Lines      126166   126243      +77     
==========================================
+ Hits        91995    92044      +49     
- Misses      34171    34199      +28     
Files with missing lines Coverage Δ
src/daft-logical-plan/src/builder/resolve_expr.rs 82.45% <0.00%> (+1.32%) ⬆️
src/daft-logical-plan/src/builder/mod.rs 78.58% <82.25%> (+0.15%) ⬆️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@caican00
Copy link
Contributor Author

Hi @universalmind303 please help review this pr if you are free. Thank you for your time and sorry if this is jumping the gun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant