[Safe Schema Evolution] Register schema in HMS pre-write to prevent Hudi/HMS schema divergence #18008

@nada-attia

Task Description

Why this task is needed:
Currently, Hudi syncs the table schema to the Hive Metastore (HMS) as a post-commit operation. This creates a critical failure scenario: if a writer successfully commits data with an evolved schema but the subsequent HMS sync fails, the Hudi table schema and the HMS schema diverge. The divergence causes query failures for downstream consumers (Spark, Presto) that rely on HMS for schema metadata, and reconciling the schemas requires manual intervention (i.e., rolling back the commits that introduced the schema changes).

What needs to be done:
To prevent this issue, Deltastreamer and Datasource writers should perform the HMS schema sync before creating a commit whenever a schema change is detected and hoodie.datasource.hive_sync.enable=true. If the pre-commit HMS sync fails, the write operation should fail without creating a commit, so that a failed sync can no longer leave the Hudi table schema ahead of the HMS schema. This approach provides fail-fast behavior and eliminates the schema divergence window described above.
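
To make the ordering change concrete, here is a minimal, self-contained Python sketch of the two control flows. It is a toy model only, not Hudi code: `FakeMetastore`, `FakeTable`, `SyncError`, and both `write_*` functions are hypothetical names introduced for illustration, and schemas are modeled as plain tuples of column names.

```python
from dataclasses import dataclass, field


class SyncError(Exception):
    """Raised when the simulated metastore rejects a schema update."""


@dataclass
class FakeMetastore:
    # Hypothetical stand-in for HMS; `available=False` simulates an outage.
    schema: tuple = ()
    available: bool = True

    def sync_schema(self, new_schema: tuple) -> None:
        if not self.available:
            raise SyncError("metastore unavailable")
        self.schema = new_schema


@dataclass
class FakeTable:
    # Hypothetical stand-in for a Hudi table: current schema plus a commit log.
    schema: tuple = ()
    commits: list = field(default_factory=list)

    def commit(self, batch_schema: tuple) -> None:
        self.schema = batch_schema
        self.commits.append(batch_schema)


def write_post_commit_sync(table, metastore, batch_schema, hive_sync_enabled=True):
    """Current ordering: commit first, then sync to the metastore."""
    table.commit(batch_schema)
    if hive_sync_enabled:
        # If this raises, the commit already exists and the schemas diverge.
        metastore.sync_schema(batch_schema)


def write_pre_commit_sync(table, metastore, batch_schema, hive_sync_enabled=True):
    """Proposed ordering: sync an evolved schema first, commit only on success."""
    schema_changed = batch_schema != table.schema
    if hive_sync_enabled and schema_changed:
        # If this raises, the write fails fast and no commit is created.
        metastore.sync_schema(batch_schema)
    table.commit(batch_schema)
```

With the metastore marked unavailable, `write_post_commit_sync` leaves the table on the new schema while the metastore keeps the old one, whereas `write_pre_commit_sync` raises before any commit is recorded, so both sides stay on the old schema.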

Task Type

Code improvement/refactoring


Labels

type:devtask (Development tasks and maintenance work)
