@murali-db murali-db commented Nov 14, 2025

⚠️ REVIEW ORDER: This PR depends on #1477 and should be reviewed AFTER it.

This PR is stacked on top of #1477. The diff currently shows both PRs' changes, but once #1477 merges, GitHub will automatically update to show only this PR's changes.


This is part 2 of 5 PRs that implement schema diffing for Delta Kernel Rust.

What's in this PR

  • Complete diffing algorithm for flat schemas (top-level fields only)
  • Field collection and ID-based matching
  • Add/remove/rename/nullability/metadata change detection
  • Physical name validation for column mapping
  • Breaking change classification
  • Type classification (including arrays and maps - ready for PR 4/5)
  • Ancestor filtering (LCA reporting)
  • Tests for flat schema diffing
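As a rough illustration of the ID-based matching and change detection listed above, here is a minimal sketch for flat schemas. `Field`, `Change`, `diff_flat`, and `is_breaking` are hypothetical stand-ins, not the kernel's actual types or API. The only breaking-change rule modeled is the one implied by the test names in this PR (adding a required field is breaking, adding a nullable one is not). Duplicate field IDs, which the PR reports as an error, are not handled here.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a top-level schema field.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub id: u64,
    pub name: String,
    pub nullable: bool,
}

/// Hypothetical change categories; the PR's real types carry more detail.
#[derive(Debug, PartialEq)]
pub enum Change {
    Added(Field),
    Removed(Field),
    Renamed { id: u64, from: String, to: String },
    NullabilityChanged { id: u64, now_nullable: bool },
}

/// Match fields across two flat schemas by field ID and report the differences.
/// Assumes IDs are unique within each schema (duplicates would silently collapse).
pub fn diff_flat(before: &[Field], after: &[Field]) -> Vec<Change> {
    let before_by_id: HashMap<u64, &Field> = before.iter().map(|f| (f.id, f)).collect();
    let after_by_id: HashMap<u64, &Field> = after.iter().map(|f| (f.id, f)).collect();

    let mut changes = Vec::new();

    // Fields present before: either removed, or matched by ID and compared.
    for old in before {
        match after_by_id.get(&old.id) {
            None => changes.push(Change::Removed(old.clone())),
            Some(new) => {
                if old.name != new.name {
                    changes.push(Change::Renamed {
                        id: old.id,
                        from: old.name.clone(),
                        to: new.name.clone(),
                    });
                }
                if old.nullable != new.nullable {
                    changes.push(Change::NullabilityChanged {
                        id: old.id,
                        now_nullable: new.nullable,
                    });
                }
            }
        }
    }

    // Fields whose ID only exists in the new schema are additions.
    for new in after {
        if !before_by_id.contains_key(&new.id) {
            changes.push(Change::Added(new.clone()));
        }
    }

    changes
}

/// Only one rule is modeled: adding a required (non-nullable) field is breaking.
pub fn is_breaking(change: &Change) -> bool {
    matches!(change, Change::Added(f) if !f.nullable)
}
```

Matching on IDs rather than names is what lets a rename show up as a single `Renamed` change instead of an unrelated remove/add pair.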

What's NOT in this PR

  • Nested struct support (coming in PR 3)
  • Array tests (coming in PR 4)
  • Map tests (coming in PR 5)

Feature Gating

Inherits feature gate from PR #1477. Gate will be removed in PR 5.

Part of #1346

Introduces core data structures for schema diffing:
- SchemaDiff, FieldChange, FieldUpdate types
- FieldChangeType enum for classifying changes
- SchemaDiffError for validation errors
- ColumnName::parent() helper method

This is part 1/5 of the schema diffing feature implementation.
The actual diffing algorithm will be added in PR 2.

Note: This PR includes a temporary stub for compute_schema_diff()
to allow basic tests to compile. The full implementation from the
original PR delta-io#1346 will be copied exactly in PR 2.

Related to delta-io#1346

codecov bot commented Nov 14, 2025

Codecov Report

❌ Patch coverage is 88.19876% with 76 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.92%. Comparing base (fe01172) to head (c7071f8).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| kernel/src/schema/diff.rs | 88.26% | 66 Missing and 9 partials ⚠️ |
| kernel/src/expressions/column_names.rs | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1478      +/-   ##
==========================================
+ Coverage   84.84%   84.92%   +0.08%     
==========================================
  Files         120      121       +1     
  Lines       32103    32747     +644     
  Branches    32103    32747     +644     
==========================================
+ Hits        27238    27811     +573     
- Misses       3542     3602      +60     
- Partials     1323     1334      +11     

Adds a comprehensive unit test that exercises the filtering logic by
manually constructing a SchemaDiff with both top-level and nested
field changes. This test verifies that the methods correctly filter
fields by path depth (length 1 vs length > 1).

The test improves code coverage for these methods from 0% to full
coverage of the filtering logic, addressing CI coverage requirements.
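
The depth-based filtering that this test exercises can be pictured roughly as follows. This is a sketch under assumptions: `FieldChange`, `SchemaDiff`, and the method names `top_level_added` and `nested_added` are hypothetical here, and the real types in `kernel/src/schema/diff.rs` may differ. The only behavior taken from the commit message is the split on path length (1 versus greater than 1).

```rust
/// Hypothetical, simplified change record: only the field path is kept.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldChange {
    pub path: Vec<String>,
}

/// Hypothetical, simplified diff holding only added fields.
#[derive(Debug, Default)]
pub struct SchemaDiff {
    pub added_fields: Vec<FieldChange>,
}

impl SchemaDiff {
    /// Changes whose path has exactly one component (top-level fields).
    pub fn top_level_added(&self) -> Vec<&FieldChange> {
        self.added_fields
            .iter()
            .filter(|change| change.path.len() == 1)
            .collect()
    }

    /// Changes whose path has more than one component (nested fields).
    pub fn nested_added(&self) -> Vec<&FieldChange> {
        self.added_fields
            .iter()
            .filter(|change| change.path.len() > 1)
            .collect()
    }
}
```

Constructing the diff by hand, as the test does, makes it possible to cover the nested branch even though the flat-only algorithm in this PR never produces nested paths.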

Adds complete diffing functionality for non-nested schemas:
- Field collection and ID-based matching
- Detection of adds, removes, renames, nullability changes
- Physical name validation for column mapping
- Breaking change classification
- Full type classification including arrays and maps
- Ancestor filtering for LCA reporting

Currently supports flat schemas (top-level fields only). The
collect_all_fields_with_paths() function has a commented-out
recursive call that will be enabled in PR 3 to support nested fields.
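
To make the flat-only restriction concrete, here is a hedged sketch of what a field collector with a disabled recursive call might look like. The `Field` type and the function signature are hypothetical simplifications. The real `collect_all_fields_with_paths()` is not shown in this conversation and may differ in both shape and behavior.

```rust
/// Hypothetical, simplified field: a struct-like field may carry child fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub children: Vec<Field>,
}

/// Collect every field together with its path from the schema root.
/// In this flat-only sketch, children are deliberately not visited.
pub fn collect_fields_with_paths(
    fields: &[Field],
    parent: &[String],
) -> Vec<(Vec<String>, Field)> {
    let mut out = Vec::new();
    for field in fields {
        // Path of this field = parent path + this field's name.
        let mut path = parent.to_vec();
        path.push(field.name.clone());
        out.push((path.clone(), field.clone()));

        // Nested support would re-enable a recursive call along these lines:
        // out.extend(collect_fields_with_paths(&field.children, &path));
    }
    out
}
```

With the recursive line commented out, a schema containing a struct field yields only that struct's own top-level entry, never its children.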

All other functions are copied exactly from the original PR delta-io#1346
(murali-db/schema-evol) with no logic changes.

This is part 2/5 of the schema diffing feature implementation.

Tests included (9 tests):
- test_identical_schemas
- test_change_count
- test_top_level_added_field
- test_added_required_field_is_breaking
- test_added_nullable_field_is_not_breaking
- test_physical_name_validation
- test_multiple_change_types
- test_multiple_with_breaking_change
- test_duplicate_field_id_error

Related to delta-io#1346
}

// Construct parent path by removing the last component
let parent_path = ColumnName::new(&path_parts[..path_parts.len() - 1]);

use path.parent()?
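
For context on this suggestion: slicing with `path_parts.len() - 1` panics when `path_parts` is empty, whereas a `parent()` helper can express the "no parent" case in its return type. The sketch below uses a hypothetical `Path` stand-in rather than the kernel's `ColumnName`. The `Option` return type, and the choice to return `None` for top-level paths, are assumptions about how `ColumnName::parent()` from the first PR in this stack behaves.

```rust
/// Hypothetical stand-in for a column path; not the kernel's `ColumnName`.
#[derive(Debug, Clone, PartialEq)]
pub struct Path(Vec<String>);

impl Path {
    /// Build a path from its components.
    pub fn new<I, S>(parts: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Path(parts.into_iter().map(Into::into).collect())
    }

    /// Parent path, or `None` for an empty or single-component path.
    /// (Assumption: the real helper may instead return an empty path at top level.)
    pub fn parent(&self) -> Option<Path> {
        match self.0.split_last() {
            Some((_last, rest)) if !rest.is_empty() => Some(Path(rest.to_vec())),
            _ => None,
        }
    }
}
```

Using `split_last` avoids the manual index arithmetic entirely, so the empty-path case cannot underflow.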
