Skip to content

Conversation

@kosiew
Copy link
Contributor

@kosiew kosiew commented Jan 7, 2026

Which issue does this PR close?


Rationale for this change

Casting between STRUCT types in DataFusion previously relied on Arrow’s default positional struct casting. When two structs had the same field types but different field order, values could be silently assigned to the wrong target fields.

Example:

  • Source: {b: 3, a: 4}
  • Target: STRUCT(a INT, b INT)

Expected: {a: 4, b: 3}

Actual (before): {a: 3, b: 4}

This is particularly dangerous because it produces valid-looking results while corrupting data semantics. The fix aligns struct fields by name during casting and provides a safe positional fallback only when name matching is impossible.


What changes are included in this PR?

1) Name-based struct casting in execution paths

  • Added struct casting logic that matches fields by name (case-sensitive), reorders child arrays accordingly, fills missing target fields with nulls, and ignores extra source fields.

  • Implemented this in:

    • datafusion_common::nested_struct via cast_struct_array_by_name and enhanced cast_struct_column.
    • ScalarValue::cast_to_with_options so scalar struct casts also align by field name.
    • ColumnarValue::cast_to to route struct casts through the name-based caster while preserving Arrow casting for non-struct types.

2) Safe positional fallback when there is no name overlap

  • If source and target struct fields have no overlapping names, casting falls back to positional mapping only when field counts match.

  • If there is no overlap and field counts differ, the cast fails with a clear plan error:

    • Cannot cast struct with X fields to Y fields without name overlap; positional mapping is ambiguous

3) Better handling for null / all-null struct sources

  • If the source column is Null-typed or all rows are null, struct casts now return a properly typed null struct array of the target type (including nested struct targets).

4) Type coercion improvements for binary expressions involving structs

  • Updated binary type coercion so when two struct types have the same set of field names (possibly reordered), coercion prefers name-based alignment.
  • Preserves backward compatibility by keeping positional coercion when name sets differ.
  • Maintains the requirement that field counts match for struct coercion.

5) Planning/optimizer/cast validation adjustments

  • Allow struct-to-struct casts at planning time even when Arrow’s can_cast_types would reject them, since name-based execution-time casting can handle reordering/missing/extra fields.
  • Prevent constant folding of struct casts when source/target field counts differ (avoids folding into an error / incorrect assumption).
  • Minor optimizer error handling tweak: return DataFusionError::Plan errors directly without wrapping them in the “Optimizer rule … failed” context.

6) Tests and SQL logic test updates

  • Added unit tests covering:

    • Field reordering by name
    • Missing fields inserting nulls
    • Positional fallback when no overlap
    • Error when no overlap + mismatched lengths
    • Nested struct null casting
  • Updated sqllogictest expectations:

    • Documented name-based coercion behavior for CASE expressions
    • Removed prior tests that depended on out-of-order struct literals or mixed-type struct lists being auto-cast positionally
    • Added new tests demonstrating correct CAST behavior with reordering, nested structs, missing/extra fields, and null preservation

Are these changes tested?

Yes.

  • Added/updated Rust unit tests in:

    • datafusion/common/src/nested_struct.rs
    • datafusion/expr-common/src/columnar_value.rs
  • Updated SQL logic tests:

    • datafusion/sqllogictest/test_files/struct.slt
    • datafusion/sqllogictest/test_files/case.slt

These cover:

  • Correct field-by-name mapping and reordering
  • Missing/extra field handling
  • Nested struct casting
  • Null and all-null struct inputs
  • Positional fallback behavior + explicit error for ambiguous casts

Are there any user-facing changes?

Yes.

  • Behavior change: CAST between struct types now matches fields by name when possible, which fixes silent value swaps when field orders differ.

  • Fallback behavior: When there is no name overlap, casts are positional only if field counts match; otherwise they error with a clear message.

  • SQL behavior changes reflected in tests:

    • Some previous behaviors that implicitly relied on positional coercion of out-of-order struct literals or mixed struct list elements are no longer accepted.

No documentation changes are included in this PR, but the SQL logic tests now encode the intended behavior.


LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

Route ColumnarValue::cast_to through a name-based struct casting
path for both array and scalar struct values. Introduce a helper
to reorder struct children by target field names, insert nulls
for missing fields, and recursively cast each child with
Arrow options. Add unit tests to verify struct field reordering
and null-filling for missing fields when casting between
struct schemas.
…::cast_to

Add comprehensive documentation explaining that struct casting uses field name
matching rather than positional matching. This clarifies the behavior change
for struct types while preserving existing documentation for other types.

Addresses PR review recommendation #4 about documenting public API changes.
Replace .clone() with Arc::clone() to address clippy warning about
clone_on_ref_ptr. This makes the ref-counting operation explicit and
follows Rust best practices.

Fixes clippy error from rust_lint.sh.
## Problem
The PR to fix struct casting (issue apache#14396) introduced regressions where struct
casting with field additions/removals was failing, and name-based field matching
wasn't working correctly in all scenarios.

## Root Causes Identified
1. **Field index mismatch**: cast_struct_array_by_name was using field indices from
   the DataType instead of the actual StructArray, causing wrong column access when
   field names didn't match physical layout.

2. **Missing fallback logic**: When source and target had no overlapping field names
   (e.g. {c0, c1} → {a, b}), name-based matching failed silently, returning NULLs.
   Added fallback to positional casting for non-overlapping fields.

3. **Optimizer const-folding issue**: ScalarValue::cast_to_with_options was calling
   Arrow's cast_with_options directly, which doesn't support struct field count
   changes. The optimizer's simplify_expressions rule would fail when trying to
   fold struct casts at compile time.

4. **Validation rejection**: The logical planner's can_cast_types check rejected
   struct-to-struct casts with mismatched field counts before execution. Added
   special handling to allow all struct-to-struct casts (validation at runtime).

## Solution
- Created datafusion/common/src/struct_cast.rs with shared name-based struct
  casting logic for both runtime (ColumnarValue) and optimization-time (ScalarValue)
- Updated ScalarValue::cast_to_with_options to use the name-based struct casting
- Updated ColumnarValue::cast_to to use the shared logic
- Updated Expr::cast_to validation to allow struct-to-struct casts
- Added fallback to positional casting when field names don't overlap
- Fixed struct array field access to use actual StructArray fields, not DataType fields
- Updated tests to reflect new behavior and correct syntax issues

## Behavior Changes
Struct casts now work correctly with:
- Field reordering: {b: 3, a: 4} → STRUCT(a INT, b INT) → {a: 4, b: 3}
- Field additions: {a: 1} → STRUCT(a INT, b INT) → {a: 1, b: NULL}
- Field removals: {a: 1, b: 2, c: 3} → STRUCT(a INT, b INT) → {a: 1, b: 2}
- Fallback to positional casting when no field names overlap

## Files Modified
- datafusion/common/src/lib.rs: Added struct_cast module
- datafusion/common/src/struct_cast.rs: New shared struct casting logic
- datafusion/common/src/scalar/mod.rs: Use name-based struct casting
- datafusion/expr-common/src/columnar_value.rs: Delegate to shared casting logic
- datafusion/expr/src/expr_schema.rs: Allow struct-to-struct casts through validation
- datafusion/sqllogictest/test_files/struct.slt: Fixed tests and added new ones
…ype comparison

This aligns the type coercion logic with the new name-based struct casting
semantics introduced for explicit CAST operations in issue apache#14396.

Changes:
- struct_coercion in binary.rs now attempts name-based field matching first
- Falls back to positional matching when no field names overlap
- Requires matching field counts for successful coercion
- Preserves left-side field names and order when using name-based matching

This fixes cases where CASE expressions with structs having the same fields
in different orders now correctly match fields by name rather than position.

Note: Several CASE expression tests with struct field reordering are disabled
due to const-folding optimizer hang when evaluating struct literals with
different field orders. This requires separate investigation and fix.
…-folding

This fixes the TODO tests in struct.slt that were causing optimizer hangs
when attempting to const-fold struct literal casts with field count mismatches.

Changes:
1. Modified expr_simplifier.rs can_evaluate() to skip const-folding for
   struct CAST/TryCAST expressions where source and target have different
   field counts. This prevents the optimizer from attempting to evaluate
   these at plan time, deferring to execution time instead.

2. Modified cast.rs cast_with_options() to allow all struct-to-struct casts
   at physical planning time, even when Arrow's can_cast_types rejects them.
   These casts are handled by name-based casting at execution time via
   ColumnarValue::cast_to.

3. Uncommented and fixed TODO tests in struct.slt:
   - CAST({a: 1} AS STRUCT(a INT, b INT)) - adds NULL for missing field b
   - CAST({a: 1, b: 2, extra: 3} AS STRUCT(a INT, b INT)) - ignores extra field

The fix ensures that:
- Optimizer doesn't hang trying to const-fold unsupported struct casts
- Physical planner accepts struct-to-struct casts with field count changes
- Execution uses name-based casting to handle field reordering and NULLs
This uncomments and fixes the TODO tests for struct coercion in CASE expressions
that were previously disabled due to concerns about optimizer hangs.

Changes:
1. Uncommented the TODO test section for struct coercion with different field orders.
   Tests now verify that name-based struct coercion works correctly in CASE expressions.

2. Updated test expectations to match actual behavior:
   - When THEN branch executes, result uses THEN branch's field order
   - When ELSE branch executes, result uses ELSE branch's field order
   - Struct coercion requires equal field counts - mismatch causes planning error

3. Added explicit test for field count mismatch case:
   - Verifies that coercing structs with different field counts (2 fields vs 1 field)
     correctly fails during type coercion with appropriate error message

The tests now pass because:
- The optimizer fix from the previous commit prevents const-folding hangs
- Name-based struct coercion in struct_coercion function handles field reordering
- Type coercion correctly rejects field count mismatches during planning
Remove duplicate struct_cast.rs module and use the existing
nested_struct::cast_struct_column implementation instead. This
eliminates code duplication and provides a single source of truth
for struct field-by-name casting logic.

Changes:
- Add public cast_struct_array_by_name wrapper in nested_struct.rs
- Update columnar_value.rs to use nested_struct::cast_struct_array_by_name
- Update scalar/mod.rs to use nested_struct::cast_struct_array_by_name
- Remove struct_cast module from lib.rs
- Delete datafusion/common/src/struct_cast.rs

Benefits:
- Single implementation to maintain and test
- Consistent behavior across all struct casting operations
- Reduced maintenance burden for future bug fixes
- Better code cohesion in nested_struct module
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Jan 7, 2026
kosiew added 14 commits January 13, 2026 19:48
Implement name-overlap detection with positional fallback and clearer
ambiguity errors for non-overlapping cases. Enhance unit tests and
SQLLogicTest coverage to include tests for positional struct casting
and scenarios lacking name overlap.
Implement null fast paths for struct casting and compatibility
checks to return null struct arrays for NULL-only inputs.
Allow NULL source fields during validation. Add a unit test
covering the scenario of casting a NULL struct field into a
nested struct target.
Return plan errors directly from optimizer rule failures and update
the related test expectations. Relax the SQLogicTest expectation for
struct cast mismatches to allow for either plan-error prefix
variant.
Consolidate duplicated logic in cast_struct_column by using a single
loop for both name-overlap and positional mapping cases. This change
selects the source child by either name or position and centralizes
the cast-and-error handling, improving code clarity and maintainability.
@@ -492,8 +492,7 @@ Struct("r": Utf8, "c": Float64)
statement ok
drop table t;

query error DataFusion error: Optimizer rule 'simplify_expressions' failed[\s\S]*Arrow error: Cast error: Cannot cast string 'a' to value of Float64 type
create table t as values({r: 'a', c: 1}), ({c: 2.3, r: 'b'});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant test removed: covered by the 'Struct Casting with Field Reordering' suite

Comment on lines 562 to 564
# out of order struct literal
# TODO: This query should not fail
statement error DataFusion error: Optimizer rule 'simplify_expressions' failed[\s\S]*Arrow error: Cast error: Cannot cast string 'b' to value of Int32 type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant test removed: out-of-order struct literal is covered by casting tests below

Comment on lines 576 to 578
# Can't create a list of struct with different field types
query error
select [{r: 'a', c: 1}, {c: 2, r: 'b'}];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant array literal test removed: covered by the 'Struct Casting with Field Reordering' suite

Comment on lines 591 to 621
# create table with different struct type is fine
statement ok
create table t(a struct(r varchar, c int), b struct(c float, r varchar)) as values (row('a', 1), row(2.3, 'b'));

# create array with different struct type is not valid
query error
select arrow_typeof([a, b]) from t;

statement ok
drop table t;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant create/select array-with-different-struct-types test removed: functionality covered by later 'Struct Casting with Field Reordering' table tests

@kosiew kosiew marked this pull request as ready for review January 14, 2026 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wrong cast between structs with different field order

1 participant