Skip to content

Conversation

@aaron-ang
Copy link
Contributor

Changes Made

Related Issues

Closes #2959.

@github-actions github-actions bot added the feat label Jan 29, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 29, 2026

Greptile Overview

Greptile Summary

This PR adds an optional delimiter parameter to the agg_concat/string_agg aggregation function, allowing users to specify a custom separator when concatenating string values. The feature addresses issue #2959.

Key Changes:

  • Added optional delimiter: str | None parameter to Python API (string_agg, agg_concat methods)
  • Updated Rust implementation to handle delimiter in AggExpr::Concat variant
  • Implemented custom string joining logic using join_with_delimiter helper function
  • Delimiter only supported for Utf8 columns; returns error if used with List columns
  • Empty delimiter strings are filtered out (treated as None)
  • Multi-stage aggregation properly propagates delimiter through all stages

Implementation Details:

  • The delimiter is filtered at the Rust DSL level to treat empty strings as None
  • Type checking ensures delimiter is only used with Utf8 columns
  • Null values are properly filtered out during concatenation
  • Both global and grouped aggregations support the delimiter parameter

Test Coverage:
Added tests for delimiter functionality in both global and grouped aggregation scenarios.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-structured with proper type checking, error handling for invalid delimiter usage, consistent propagation through all layers (Python, Rust DSL, execution), and includes test coverage for the new functionality
  • No files require special attention

Important Files Changed

Filename Overview
daft/functions/agg.py Updated string_agg function to accept and pass through delimiter parameter
src/daft-core/src/series/ops/agg.rs Implemented custom string concatenation with delimiter support, includes proper null handling
src/daft-dsl/src/expr/mod.rs Added delimiter field to AggExpr::Concat, updated all related methods and type inference
tests/recordbatch/test_table_aggs.py Added tests for delimiter parameter in both global and grouped aggregations

Sequence Diagram

sequenceDiagram
    participant User
    participant Python API
    participant PyExpr
    participant Rust DSL
    participant RecordBatch
    participant Series
    
    User->>Python API: df.agg(col("a").string_agg(delimiter=" "))
    Python API->>PyExpr: agg_concat(delimiter=" ")
    PyExpr->>Rust DSL: Expr::agg_concat(delimiter=Some(" "))
    Note over Rust DSL: Filter empty delimiter strings<br/>Store as AggExpr::Concat(expr, delimiter)
    Rust DSL->>RecordBatch: eval_agg_expr(AggExpr::Concat)
    RecordBatch->>Series: agg_concat(groups, delimiter)
    
    alt Delimiter provided for Utf8
        Series->>Series: join_with_delimiter(iter, delimiter)
        Note over Series: Custom implementation:<br/>- Filter nulls<br/>- Join with delimiter
    else No delimiter or List type
        Series->>Series: DaftConcatAggable::concat()
        Note over Series: Use existing trait implementation
    end
    
    Series-->>RecordBatch: Concatenated result
    RecordBatch-->>Rust DSL: Series
    Rust DSL-->>Python API: PyExpr
    Python API-->>User: DataFrame with aggregated column
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@codecov
Copy link

codecov bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 7.21649% with 90 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.39%. Comparing base (aa8add2) to head (61e333a).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-core/src/series/ops/agg.rs 0.00% 56 Missing ⚠️
src/daft-dsl/src/expr/mod.rs 0.00% 17 Missing ⚠️
src/daft-local-plan/src/agg.rs 0.00% 7 Missing ⚠️
src/daft-logical-plan/src/ops/project.rs 0.00% 4 Missing ⚠️
src/daft-dsl/src/expr/agg.rs 0.00% 2 Missing ⚠️
src/daft-recordbatch/src/lib.rs 0.00% 2 Missing ⚠️
daft/dataframe/dataframe.py 75.00% 1 Missing ⚠️
src/daft-sql/src/modules/aggs.rs 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #6099       +/-   ##
===========================================
- Coverage   72.91%   43.39%   -29.52%     
===========================================
  Files         973      909       -64     
  Lines      126196   112787    -13409     
===========================================
- Hits        92016    48947    -43069     
- Misses      34180    63840    +29660     
Files with missing lines Coverage Δ
daft/expressions/expressions.py 95.12% <100.00%> (ø)
daft/functions/agg.py 97.61% <100.00%> (ø)
daft/dataframe/dataframe.py 77.38% <75.00%> (ø)
src/daft-sql/src/modules/aggs.rs 37.28% <0.00%> (-33.06%) ⬇️
src/daft-dsl/src/expr/agg.rs 0.00% <0.00%> (-77.56%) ⬇️
src/daft-recordbatch/src/lib.rs 21.62% <0.00%> (-53.06%) ⬇️
src/daft-logical-plan/src/ops/project.rs 45.97% <0.00%> (-10.64%) ⬇️
src/daft-local-plan/src/agg.rs 0.00% <0.00%> (-93.50%) ⬇️
src/daft-dsl/src/expr/mod.rs 39.29% <0.00%> (-35.62%) ⬇️
src/daft-core/src/series/ops/agg.rs 2.64% <0.00%> (-66.89%) ⬇️

... and 646 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allowing GroupedDataFrame.agg_concat to also take in a delimiter like Expression.list.join

1 participant