Add schema evolution support for tables with changing column definitions #7

@shefeek-jinnah

Description

Enable querying DuckLake tables whose Parquet files have evolved schemas across snapshots.

Problem

The current implementation assumes all Parquet files for a table share an identical schema, which breaks when columns are added, removed, or renamed across snapshots.

Solution

This change unifies file schemas at query time:

  • Uses the resolved snapshot schema as the canonical schema
  • Pads missing columns in older files with typed NULLs
  • Maps renamed columns using DuckLake column identifiers
  • Projects every file onto the same set and order of columns
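
The steps above can be illustrated with a minimal sketch. This is not the project's implementation: `Column`, `unify_file`, and the dict-of-lists file representation are hypothetical names chosen for illustration, and types are carried as plain strings rather than real Parquet or Arrow types. It assumes each column has a stable identifier that survives renames, as the DuckLake column identifiers mentioned above do.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    column_id: int  # stable identifier, unchanged by renames
    name: str       # column name as of a given snapshot or file
    dtype: str      # logical type, kept as a plain string in this sketch


def unify_file(
    canonical: List[Column],
    file_columns: List[Column],
    file_data: Dict[str, List[Any]],
    num_rows: int,
) -> Dict[str, List[Any]]:
    """Project one file's data onto the canonical (snapshot) schema."""
    by_id = {c.column_id: c for c in file_columns}
    out: Dict[str, List[Any]] = {}
    for target in canonical:
        source = by_id.get(target.column_id)
        if source is None:
            # The file predates this column: pad with NULLs. A real engine
            # would build a null array of type target.dtype here.
            out[target.name] = [None] * num_rows
        else:
            # Matched by id, so a renamed column is read under its old
            # name and emitted under the canonical name.
            out[target.name] = file_data[source.name]
    # Columns present in the file but absent from the canonical schema
    # (removed columns) are never emitted.
    return out


# Snapshot schema: "name" was renamed to "full_name", "legacy" was
# dropped, and "email" was added after the old file was written.
canonical = [
    Column(1, "id", "int64"),
    Column(2, "full_name", "varchar"),
    Column(4, "email", "varchar"),
]
old_file_columns = [
    Column(1, "id", "int64"),
    Column(2, "name", "varchar"),
    Column(3, "legacy", "int32"),
]
old_file_data = {"id": [1, 2], "name": ["a", "b"], "legacy": [0, 0]}

result = unify_file(canonical, old_file_columns, old_file_data, num_rows=2)
```

Because every file is projected onto the same canonical column list, the outputs for different files line up column by column and can be concatenated. Type changes between snapshots are not handled in this sketch.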

Result

Queries can safely span multiple snapshots with schema drift.

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
