Task Description
Why this task is needed:
Currently, Hudi performs Hive Metastore (HMS) schema sync as a post-commit operation. This creates a critical failure scenario: if a writer successfully commits data with an evolved schema but the subsequent HMS sync fails, the Hudi table schema and the HMS schema diverge. This divergence causes query failures for downstream consumers (Spark, Presto) that rely on HMS for schema metadata, and reconciling the schemas requires manual intervention (i.e., rolling back the commits that introduced the schema changes).
What needs to be done:
To prevent this issue, Deltastreamer and Datasource writers should perform HMS schema sync before creating a commit when schema changes are detected and hoodie.datasource.hive_sync.enable=true. If the pre-commit HMS sync fails, the write operation should fail without creating a commit, ensuring that the Hudi table schema and HMS schema always remain consistent. This approach provides fail-fast behavior and eliminates the schema divergence window entirely.
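A minimal sketch of the proposed ordering, in Java, follows. `PreCommitSchemaSync`, `MetastoreSync`, `Timeline`, `Writer` and their methods are hypothetical stand-ins, not Hudi's actual classes or APIs. Schemas are modeled as plain strings and "schema change detected" as a simple inequality check. The only point it illustrates is the control flow: push the evolved schema to HMS first, and create the commit only if that sync succeeds.

```java
import java.util.Objects;

// Hypothetical sketch: none of these types are Hudi's real API.
final class PreCommitSchemaSync {

    /** Stand-in for a Hive Metastore (HMS) client that can update a table's schema. */
    interface MetastoreSync {
        void syncSchema(String tableName, String schema) throws Exception;
    }

    /** Stand-in for the Hudi timeline: completing an instant makes the write visible. */
    interface Timeline {
        void commit(String instantTime, String schema);
    }

    static final class Writer {
        private final MetastoreSync metastore;
        private final Timeline timeline;
        // Models hoodie.datasource.hive_sync.enable
        private final boolean hiveSyncEnabled;

        Writer(MetastoreSync metastore, Timeline timeline, boolean hiveSyncEnabled) {
            this.metastore = metastore;
            this.timeline = timeline;
            this.hiveSyncEnabled = hiveSyncEnabled;
        }

        /**
         * Commits a write. If hive sync is enabled and the schema evolved, the new schema
         * is pushed to HMS first; a sync failure aborts the write before any commit exists.
         * The existing post-commit sync (partitions, other metadata) is not modeled here.
         */
        void commit(String tableName, String instantTime, String tableSchema, String writerSchema) {
            boolean schemaChanged = !Objects.equals(tableSchema, writerSchema);
            if (hiveSyncEnabled && schemaChanged) {
                try {
                    metastore.syncSchema(tableName, writerSchema);
                } catch (Exception e) {
                    // Fail fast: no commit is created, so the Hudi and HMS schemas never diverge.
                    throw new IllegalStateException(
                        "Pre-commit HMS schema sync failed; aborting commit " + instantTime, e);
                }
            }
            timeline.commit(instantTime, writerSchema);
        }
    }
}
```

With a metastore client that throws, a write that adds a column fails with an `IllegalStateException` and the timeline records nothing; a write whose schema is unchanged skips the pre-commit sync and commits as before. Wrapping the failure in an unchecked exception is one way to let it propagate out of the write path without changing existing method signatures.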
Task Type
Code improvement/refactoring
Related Issues
Parent feature issue: (if applicable)
Related issues: