fix: generated column expr with SchemaMode::Merge handles missing columns #4223

Merged
rtyler merged 2 commits into delta-io:main from veeceey:fix/issue-4169-generated-columns-merge on Feb 27, 2026

Conversation

@veeceey (Contributor) commented Feb 23, 2026

Fixes #4169

When appending with SchemaMode::Merge, generated column expressions can reference nullable columns that aren't in the input batch. The problem was that with_generated_columns() resolves expressions against the plan schema before schema evolution runs -- so any column that schema evolution would add as NULL doesn't exist yet, and parse_predicate_expression fails.

The fix catches expression resolution errors in with_generated_columns() and falls back to a typed NULL placeholder. This lets the pipeline continue through schema evolution, which adds the missing columns. The DataValidationExec then sees NULL IS NOT DISTINCT FROM NULL = true, correctly passing validation.
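The flow described above can be sketched with toy types. This is only an illustration under stated assumptions: `Row`, `GeneratedColumn`, `resolve`, `with_generated_column`, `evolve_schema`, and `validate` are hypothetical stand-ins written for this example, not the delta-rs or DataFusion APIs, and the generated expression is hard-coded as `source + 1`.

```rust
use std::collections::HashMap;

/// Toy row: column name -> nullable integer value (hypothetical, not a RecordBatch).
type Row = HashMap<String, Option<i64>>;

/// Toy generated column whose expression is `source + 1` (hypothetical).
struct GeneratedColumn {
    target: &'static str,
    source: &'static str,
}

/// Resolve the expression against the row as it exists right now.
/// Fails when the referenced column is absent, which is the situation in #4169.
fn resolve(gen_col: &GeneratedColumn, row: &Row) -> Result<Option<i64>, String> {
    match row.get(gen_col.source) {
        Some(&value) => Ok(value.map(|v| v + 1)),
        None => Err(format!("No field named {}", gen_col.source)),
    }
}

/// Add the generated column, falling back to a NULL placeholder when the
/// expression cannot be resolved yet. This sketch swallows every error;
/// a real implementation would want to narrow the fallback.
fn with_generated_column(gen_col: &GeneratedColumn, mut row: Row) -> Row {
    if !row.contains_key(gen_col.target) {
        let value = resolve(gen_col, &row).unwrap_or(None);
        row.insert(gen_col.target.to_string(), value);
    }
    row
}

/// Stand-in for schema evolution: add the missing base column as NULL.
fn evolve_schema(gen_col: &GeneratedColumn, mut row: Row) -> Row {
    row.entry(gen_col.source.to_string()).or_insert(None);
    row
}

/// Null-safe equality, i.e. SQL `IS NOT DISTINCT FROM`.
/// `Option` equality already treats `None == None` as true.
fn is_not_distinct_from(a: Option<i64>, b: Option<i64>) -> bool {
    a == b
}

/// Check that the stored generated value matches the recomputed expression.
fn validate(gen_col: &GeneratedColumn, row: &Row) -> bool {
    let stored = row.get(gen_col.target).copied().flatten();
    match resolve(gen_col, row) {
        Ok(expected) => is_not_distinct_from(stored, expected),
        Err(_) => false,
    }
}
```

Running a row that lacks the source column through `with_generated_column`, then `evolve_schema`, then `validate` passes in this sketch because both the placeholder and the recomputed value are NULL.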

This mirrors the approach already used in add_missing_generated_columns() (the merge path), which also inserts NULL placeholders for missing generated columns.

Test plan

  • Added test_generated_column_referencing_missing_column_uses_null_placeholder test that reproduces the exact failure described in the issue
  • All 7 generated_columns tests pass
  • cargo test -p deltalake-core --lib --features datafusion -- "generated_columns::tests" passes cleanly

@veeceey (Contributor, Author) commented Feb 23, 2026

Test Results

$ cargo test -p deltalake-core --lib --features datafusion -- "generated_columns::tests"

running 7 tests
test operations::write::generated_columns::tests::test_empty_generated_columns ... ok
test operations::write::generated_columns::tests::test_existing_columns_pass_through ... ok
test operations::write::generated_columns::tests::test_multiple_generated_columns ... ok
test operations::write::generated_columns::tests::test_generated_column_referencing_missing_column_uses_null_placeholder ... ok
test operations::write::generated_columns::tests::test_missing_non_generated_nullable_column_does_not_error ... ok
test operations::write::generated_columns::tests::test_add_missing_generated_column ... ok
test operations::write::generated_columns::tests::test_mixed_existing_and_generated_columns ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 630 filtered out; finished in 0.09s

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Feb 23, 2026
@veeceey veeceey force-pushed the fix/issue-4169-generated-columns-merge branch from 68ce8fc to d742a83 Compare February 23, 2026 04:43
@ethan-tyler (Collaborator) left a comment

Thanks for the fix here @veeceey. The direction looks right. Requesting changes because the Err(_) catch-all in with_generated_columns swallows all parse/resolve failures.

The e2e test is a nice-to-have.

}
e
}
Err(_) => {

This is too broad. We should only catch errors during the schema merge and everything else should surface.

e
}
Err(_) => {
debug!(

Bind the error and include it in the log

/// does not fail, but instead produces a NULL placeholder.
/// This is the core fix for #4169.
#[test]
fn test_generated_column_referencing_missing_column_uses_null_placeholder() {

Nice test. It would be nice to have an e2e test through WriteBuilder &
SchemaMode::Merge that asserts actual write time values and not just
schema shape.

@ethan-tyler (Collaborator) commented Feb 24, 2026

Thanks for the changes, lgtm. Just fix the DCO sign off and we should be good to go. Thanks again @veeceey

Looks like you need to fmt as well

@codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 88.67925% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.82%. Comparing base (ce6709b) to head (c82fb8a).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...tes/core/src/operations/write/generated_columns.rs | 88.67% | 4 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4223   +/-   ##
=======================================
  Coverage   76.81%   76.82%           
=======================================
  Files         167      167           
  Lines       48849    48874   +25     
  Branches    48849    48874   +25     
=======================================
+ Hits        37524    37546   +22     
- Misses       9421     9425    +4     
+ Partials     1904     1903    -1     

@rtyler rtyler self-assigned this Feb 27, 2026
…aMode::Merge

When using SchemaMode::Merge, the input batch may omit nullable columns
that a generated column expression references. Previously,
with_generated_columns() called parse_predicate_expression() against
the pre-evolution plan schema, which failed because the column didn't
exist yet -- schema evolution hadn't run to add it as NULL.

Now when expression resolution fails, we fall back to a typed NULL
placeholder. Schema evolution will later add the missing base columns,
and DataValidationExec will see NULL IS NOT DISTINCT FROM NULL = true,
which correctly passes validation.

Closes delta-io#4169

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
@rtyler rtyler force-pushed the fix/issue-4169-generated-columns-merge branch from d568488 to 70b49bd Compare February 27, 2026 18:47
… resolution errors only

Address review feedback:
- Replace broad Err(_) catch-all with a guard that only matches column
  resolution errors (No field named / Schema error), letting parse and
  type errors propagate normally.
- Bind the error and include it in the debug log message.
- Add e2e test through WriteBuilder & SchemaMode::Merge that writes
  with and without the referenced column, then asserts actual values.
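A guard of the kind described in the first two bullets might look roughly like the sketch below. The names `Resolved`, `is_column_resolution_error`, and `resolve_or_placeholder` are hypothetical, errors are modeled as plain strings, and matching on message text is an assumption made for illustration; only the two message fragments ("No field named" and "Schema error") come from the commit message.

```rust
/// Outcome of resolving a generated column expression (hypothetical type).
#[derive(Debug, PartialEq)]
enum Resolved {
    Expr(String),
    NullPlaceholder,
}

/// True only for messages that look like a missing-column failure.
/// String matching here is an assumption of this sketch.
fn is_column_resolution_error(message: &str) -> bool {
    message.contains("No field named") || message.contains("Schema error")
}

/// Fall back to a NULL placeholder for column resolution errors only.
fn resolve_or_placeholder(result: Result<String, String>) -> Result<Resolved, String> {
    match result {
        Ok(expr) => Ok(Resolved::Expr(expr)),
        // Bind the error so it can be logged, and guard on its kind.
        Err(e) if is_column_resolution_error(&e) => {
            eprintln!("falling back to NULL placeholder: {e}");
            Ok(Resolved::NullPlaceholder)
        }
        // Parse and type errors propagate to the caller unchanged.
        Err(e) => Err(e),
    }
}
```

With this shape, a missing-column error yields `Resolved::NullPlaceholder`, while something like a parser error is still returned as `Err`.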
@rtyler rtyler force-pushed the fix/issue-4169-generated-columns-merge branch from 70b49bd to c82fb8a Compare February 27, 2026 18:53
@rtyler rtyler merged commit 91a8abb into delta-io:main Feb 27, 2026
27 checks passed
Labels

binding/rust Issues for the Rust crate

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

bug: generated column expr fails when SchemaMode::Merge would add referenced column

4 participants