
Iceberg extension quietly ignores delete markers resulting in incorrect data #18858

@SamWheating

Description


If an Iceberg table using merge-on-read updates or deletes is ingested into Druid, the deleted rows will be ingested as well, because the extension reads the data files without applying the accompanying delete files.

As a simple example, we can create a quick Iceberg table using Spark:

import org.apache.spark.sql.functions.{current_timestamp, hours}
import spark.implicits._

val df = Seq(
    ("store_a", 1, 100),
    ("store_a", 2, 200),
    ("store_b", 3, 300),
    ("store_b", 4, 400),
).toDF("store_id", "item_count", "price_total")

// Create an hourly-partitioned table with merge-on-read updates enabled.
df.withColumn("ts", current_timestamp()).
    writeTo("demo.test_database.checkouts").
    using("iceberg").
    partitionedBy(hours($"ts")).
    tableProperty("write.update.mode", "merge-on-read").
    create()

Then update the table:

UPDATE demo.test_database.checkouts SET price_total = 0 WHERE store_id = 'store_a'
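
Because the table is in merge-on-read mode, this UPDATE writes delete files rather than rewriting the data files. As a quick sanity check (assuming an Iceberg version recent enough to expose the delete_files metadata table), you can see them from Spark:

spark.sql("""
  SELECT content, file_path, record_count
  FROM demo.test_database.checkouts.delete_files
""").show(truncate = false)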

Ingesting the table into Druid then picks up 6 rows instead of the expected 4, because both the pre-update and post-update versions of the store_a records are ingested. With rollup enabled, the double-counting is visible in the count and sum_price_total columns:

SELECT * FROM "checkouts"

{"__time":"2025-12-19T00:00:00.000Z","store_id":"store_a","count":4,"sum_item_count":6,"sum_price_total":300}
{"__time":"2025-12-19T00:00:00.000Z","store_id":"store_b","count":2,"sum_item_count":7,"sum_price_total":700}

This feels like a potential hazard which isn't explicitly called out in the documentation.

Ideally we would handle the delete markers and properly materialize the data, but that's a pretty big overhaul. As a simpler interim fix, should we just fail the ingestion if delete markers are present in the target partitions? A sketch of that check is below.
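
For illustration, a minimal guard using the Iceberg Java API from Scala; failIfDeletesPresent is a hypothetical helper name, and where exactly it would hook into the extension's scan planning is an open question:

import org.apache.iceberg.Table
import scala.jdk.CollectionConverters._

// Hypothetical guard: refuse to ingest a table whose current scan
// includes delete files, since they would otherwise be silently ignored.
def failIfDeletesPresent(table: Table): Unit = {
  val tasks = table.newScan().planFiles()
  try {
    val hasDeletes = tasks.asScala.exists(task => !task.deletes().isEmpty)
    if (hasDeletes) {
      throw new IllegalStateException(
        s"Table ${table.name()} has delete files that this ingestion " +
          "would ignore; compact or rewrite the table first.")
    }
  } finally {
    tasks.close()
  }
}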

Happy to help with the implementation here, or at least with updating the documentation to make this clearer; let me know what you think is the best path forward.
