[Bug]: Table merging fails with merge_schema=True #3943

@MachSilva

What happened?

First, I created an empty Delta table with an initial schema.
Then I tried to merge a source DataFrame that has an additional column named "date" into that table, using merge_schema=True.
The merge failed with the error: External error: Schema error: Duplicate field name: date.

Expected behavior

I expected the schema of the table "sometable" to evolve to include the new column "date".
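
To make the expected outcome concrete, here is a minimal pure-Python sketch of the resulting schema. It is not how deltalake implements merge_schema: the helper expected_merged_schema is hypothetical, and the type names are plain illustrative labels taken from the polars dtypes in the reproduction below.

```python
def expected_merged_schema(target: dict[str, str], source: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: keep the target columns in order and append
    each source-only column exactly once."""
    merged = dict(target)  # dicts preserve insertion order
    for name, dtype in source.items():
        if name not in merged:
            merged[name] = dtype  # new column, added a single time
        elif merged[name] != dtype:
            raise TypeError(f"conflicting types for column {name!r}")
    return merged


# Column names from the report; type labels are illustrative only.
target = {"code": "string", "index": "uint32"}
source = {"index": "uint32", "code": "string", "date": "string"}

merged = expected_merged_schema(target, source)
print(list(merged))  # ['code', 'index', 'date']
```

In other words, "date" should appear once in the evolved schema, which is what the "Duplicate field name: date" error suggests is not happening.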

Operating System

Linux

Binding

Python

Bindings Version

1.2.1

Steps to reproduce

  1. Create a Python 3.12 virtual environment with the packages deltalake==1.2.1, polars==1.35.2, and pyarrow.

  2. Make sure the table "sometable" does not exist

  3. Run the code:

import traceback
import polars as pl
from deltalake import DeltaTable

try:
    initial = pl.DataFrame(schema={"code": pl.String, "index": pl.UInt32}).to_arrow()
    print("[*] Initial Schema:\n", initial.schema)

    DeltaTable.create("sometable", initial.schema)
    print(pl.read_delta("sometable"))


    df = pl.DataFrame({
        "code": ["12", "23", "42", "43"],
        "date": ["2018", "2018", "2019", "2020"],
    }).with_row_index("index")

    DeltaTable("sometable").merge(
        df,
        source_alias="s",
        target_alias="t",
        predicate="t.code = s.code AND t.index = s.index",
        merge_schema=True,
    ).when_matched_update_all().when_not_matched_insert_all().execute()

    print(pl.read_delta("sometable"))

except Exception:
    traceback.print_exc()

Relevant logs

Traceback (most recent call last):
  File "/tmp/delta/report/table_merge.py", line 24, in <module>
    ).when_matched_update_all().when_not_matched_insert_all().execute()
                                                              ^^^^^^^^^
  File "/tmp/delta/report/.venv/lib/python3.12/site-packages/deltalake/table.py", line 1685, in execute
    metrics = self._table.merge_execute(self._builder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: External error: Schema error: Duplicate field name: date
