-
Notifications
You must be signed in to change notification settings - Fork 392
feat: Support default alias for non-column refs for sql select #6071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Support default alias for non-column refs for sql select #6071
Conversation
Greptile SummaryThis PR adds support for default column aliases in SQL SELECT statements and enables global aggregations in DataFrame Key Changes:
Issues Found:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant SQLPlanner
participant ExprName
participant LogicalPlanBuilder
participant ExprResolver
participant Aggregate
participant Project
User->>SQLPlanner: sql("select sum(a) from df")
SQLPlanner->>SQLPlanner: parse SQL AST
SQLPlanner->>ExprName: normalized_sql_expr_name(sum(a))
ExprName-->>SQLPlanner: "sum(a)"
SQLPlanner->>SQLPlanner: plan_expr() to create Daft Expr
SQLPlanner->>SQLPlanner: alias expr as "sum(a)"
SQLPlanner->>LogicalPlanBuilder: select([sum(a).alias("sum(a)")])
LogicalPlanBuilder->>LogicalPlanBuilder: Check if to_select has aggregations
alt Contains aggregations
LogicalPlanBuilder->>ExprResolver: resolve with in_agg_context=true
ExprResolver-->>LogicalPlanBuilder: resolved expressions
LogicalPlanBuilder->>LogicalPlanBuilder: Split into agg_exprs and foldable_exprs
LogicalPlanBuilder->>LogicalPlanBuilder: Alias aggs with semantic_id
LogicalPlanBuilder->>Aggregate: Create Aggregate node
Aggregate-->>LogicalPlanBuilder: agg_plan
LogicalPlanBuilder->>Project: Create Project with user-visible aliases
Project-->>LogicalPlanBuilder: final plan
else No aggregations
LogicalPlanBuilder->>ExprResolver: resolve normally
ExprResolver-->>LogicalPlanBuilder: resolved expressions
LogicalPlanBuilder->>Project: Create Project node
Project-->>LogicalPlanBuilder: final plan
end
LogicalPlanBuilder-->>User: DataFrame with column "sum(a)"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7 files reviewed, 2 comments
| def test_select_global_agg_rejects_non_agg_column_reference() -> None: | ||
| df = daft.from_pydict({"a": [1, 2, 3]}) | ||
|
|
||
| with pytest.raises(ValueError, match="Expressions in aggregations"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: test expects "Expressions in aggregations" error message, but the actual error message from mod.rs:302 is "Expected aggregation (or a foldable literal) in select() only when aggregation expressions are present without groupby."
| with pytest.raises(ValueError, match="Expressions in aggregations"): | |
| with pytest.raises(ValueError, match="Expected aggregation"): |
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/dataframe/test_select_global_agg.py
Line: 58:58
Comment:
**logic:** test expects "Expressions in aggregations" error message, but the actual error message from `mod.rs:302` is "Expected aggregation (or a foldable literal) in select() only when aggregation expressions are present without groupby."
```suggestion
with pytest.raises(ValueError, match="Expected aggregation"):
```
How can I resolve this? If you propose a fix, please make it concise.| let logical_plan: LogicalPlan = ops::Project::try_new(self.plan.clone(), to_select)?.into(); | ||
| Ok(self.with_new_plan(logical_plan)) | ||
| // if a SELECT contains aggregations and there is no GROUP BY, treat it as a global aggregation. | ||
| if to_select.iter().any(has_agg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: use the cached to_select_has_aggs variable instead of rechecking
| if to_select.iter().any(has_agg) { | |
| if to_select_has_aggs { |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/daft-logical-plan/src/builder/mod.rs
Line: 259:259
Comment:
**style:** use the cached `to_select_has_aggs` variable instead of rechecking
```suggestion
if to_select_has_aggs {
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.1de4e26 to
aaea901
Compare
Changes Made
For example, when using a function in a select expression and no column alias is specified, the displayed result directly shows the original column names, which can lead to semantic misunderstandings.
And i directly use the expression string as the final column name for display, distinguishing it from the original column names and making the result more clear,like this:
Related Issues
#6070