@MazterQyou
Member

This PR allows DataFusion to plan queries with repeated aliases and auto-realias them. Here are a few examples:

SELECT NULL, NULL, NULL

This query is valid in PostgreSQL but would fail in DataFusion because the schema would contain three NULL fields. This is now fixed by realiasing the last two expressions as NULL__1 and NULL__2.
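As a rough illustration of the suffixing scheme described above, here is a minimal sketch that works on plain strings instead of DataFusion expressions. The function name `realias_duplicates` is hypothetical and this is not the code from the PR; it only assumes that the first occurrence keeps its name and later duplicates receive `__1`, `__2`, and so on.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the names with later duplicates suffixed as `__N`.
/// Assumption: the first occurrence is left untouched and the counter restarts
/// at 1 for each name, skipping any candidate that is already taken.
fn realias_duplicates(names: &[&str]) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut out = Vec::with_capacity(names.len());
    for name in names {
        let mut candidate = name.to_string();
        let mut suffix = 0usize;
        // `insert` returns false when the candidate already exists in the set.
        while !seen.insert(candidate.clone()) {
            suffix += 1;
            candidate = format!("{}__{}", name, suffix);
        }
        out.push(candidate);
    }
    out
}
```

Under these assumptions, `realias_duplicates(&["NULL", "NULL", "NULL"])` yields `NULL`, `NULL__1`, `NULL__2`, matching the example above.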

SELECT 1 AS t, 2 AS t

This query is valid in PostgreSQL. Although t could not be referenced from an outer query if this were a subquery, the outermost query may repeat an alias. This would fail in DataFusion because the schema would contain two t fields. This is now fixed by realiasing the last expression as t__1.

SELECT t1.c, t2.c
FROM t1, t2

This is valid in both PostgreSQL and DataFusion. While the alias is c for both, the schema has fields t1.c and t2.c, so the query runs.

SELECT COUNT(*)
FROM (
  SELECT t1.c, t2.c
  FROM t1, t2
) t

This is valid in PostgreSQL but fails in DataFusion. Once the inner query is aliased as t, both fields become t.c, which breaks the schema. This is now fixed by realiasing the last expression as c__1.
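The difference between the last two examples comes down to whether the qualifier is part of the field identity. The sketch below models that with a hypothetical `QualifiedName` type (not the `QualifiedAlias` struct from the PR, whose definition is not shown here) and hypothetical helpers `realias_qualified` and `requalify`. It assumes uniqueness is checked on the (qualifier, name) pair.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a schema field name: optional relation qualifier plus column name.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct QualifiedName {
    qualifier: Option<String>,
    name: String,
}

impl QualifiedName {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Self {
            qualifier: qualifier.map(str::to_string),
            name: name.to_string(),
        }
    }
}

/// Suffixes the column name of any field whose (qualifier, name) pair was already seen.
fn realias_qualified(fields: &[QualifiedName]) -> Vec<QualifiedName> {
    let mut seen = HashSet::new();
    let mut out = Vec::with_capacity(fields.len());
    for field in fields {
        let mut candidate = field.clone();
        let mut suffix = 0usize;
        while !seen.insert(candidate.clone()) {
            suffix += 1;
            candidate.name = format!("{}__{}", field.name, suffix);
        }
        out.push(candidate);
    }
    out
}

/// Models aliasing a subquery: every field takes the new qualifier, then duplicates are renamed.
fn requalify(fields: &[QualifiedName], alias: &str) -> Vec<QualifiedName> {
    let renamed: Vec<QualifiedName> = fields
        .iter()
        .map(|f| QualifiedName::new(Some(alias), &f.name))
        .collect();
    realias_qualified(&renamed)
}
```

With `t1.c` and `t2.c` as input, `realias_qualified` changes nothing because the pairs already differ, while `requalify(.., "t")` produces `t.c` and `t.c__1`.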

@ovr ovr requested a review from Copilot August 6, 2025 10:43

Copilot AI left a comment

Pull Request Overview

This PR adds support for repeated aliases in SQL queries by implementing an auto-realiasing mechanism. When duplicate field names would be created in a schema, DataFusion now automatically renames subsequent duplicates with a suffix pattern (__1, __2, etc.) to ensure schema uniqueness while maintaining PostgreSQL compatibility.

Key changes:

  • Implements realias_duplicate_expr_aliases function to handle duplicate alias detection and renaming
  • Integrates auto-realiasing into SQL planning for SELECT statements and projected plans with aliases
  • Updates existing tests to verify the new behavior works correctly

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
datafusion/core/src/sql/utils.rs | Adds core auto-realiasing logic with QualifiedAlias struct and realias_duplicate_expr_aliases function
datafusion/core/src/sql/planner.rs | Integrates auto-realiasing into SELECT statement planning and updates tests to reflect new behavior
datafusion/core/src/logical_plan/builder.rs | Adds auto-realiasing support to the project_with_alias function for subquery scenarios

Comment on lines +866 to +868
// Expr::Wildcard is simply a placeholder to please borrow checker
*expr = Expr::Alias(Box::new(replace(expr, Expr::Wildcard)), alias);
Copilot AI Aug 6, 2025

The comment suggests this is a workaround for the borrow checker, but it's unclear why Expr::Wildcard was chosen as the placeholder. Consider explaining why this specific variant is appropriate or if there's a better alternative.

Suggested change
- // Expr::Wildcard is simply a placeholder to please borrow checker
- *expr = Expr::Alias(Box::new(replace(expr, Expr::Wildcard)), alias);
+ // Use std::mem::replace to move the original expression into the Alias variant.
+ // This avoids borrow checker issues and preserves the original expression semantics.
+ *expr = Expr::Alias(Box::new(replace(expr, Expr::Alias(Box::new(Expr::Literal(ScalarValue::Null)), alias.clone()))), alias);
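For readers unfamiliar with the pattern under discussion, the following sketch shows it on a toy enum. The `Expr` type here is a hypothetical three-variant stand-in, not DataFusion's `Expr`, and `wrap_in_alias` is an illustrative name. The point is that `std::mem::replace` needs some valid value to leave behind while the original is moved out, and that value is overwritten on the next line, so it is never observed.

```rust
use std::mem::replace;

/// Toy expression type (hypothetical; DataFusion's `Expr` has many more variants).
#[derive(Debug, PartialEq)]
enum Expr {
    Wildcard,
    Literal(i64),
    Alias(Box<Expr>, String),
}

/// Wraps `*expr` in an `Alias` in place, without cloning the original expression.
fn wrap_in_alias(expr: &mut Expr, alias: String) {
    // A value cannot be moved out of `&mut Expr` directly, so swap in a cheap
    // placeholder first. `Expr::Wildcard` carries no data, so building it costs nothing.
    let original = replace(expr, Expr::Wildcard);
    // The placeholder is overwritten here and never escapes this function.
    *expr = Expr::Alias(Box::new(original), alias);
}
```

Any variant that is cheap to construct would serve as the placeholder in this sketch; a data-free variant simply avoids an extra allocation or clone.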

@MazterQyou MazterQyou force-pushed the cubesql/repeated-alias-fix branch from 095ded6 to e4d035e Compare August 6, 2025 10:55
@MazterQyou MazterQyou merged commit 4f8f7de into cubesql-3-04-2022 Aug 8, 2025
21 of 23 checks passed
@MazterQyou MazterQyou deleted the cubesql/repeated-alias-fix branch August 8, 2025 10:34

3 participants