Iceberg's parquets do not have field_id (schema evolution is broken) #10394

@4ertus2

Description

Backend

VL (Velox)

Bug description

The Iceberg spec requires field IDs to be set:

Column IDs are required to be stored as field IDs on the parquet schema.

As far as I understand, the actual column IDs from the Iceberg schema are not passed here, so they cannot be written by Velox later.

return std::make_shared<IcebergWriter>(

It looks like it would be possible to pass the IDs on the Velox side through IcebergColumnHandle after this PR.

Am I right that there's no info about actual Iceberg column_ids in Java_org_apache_gluten_execution_IcebergWriteJniWrapper_init right now?
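
To illustrate why missing field IDs break schema evolution, here is a minimal, self-contained sketch. It is not Gluten or Velox code: `FileColumn`, `TableColumn`, and `resolveById` are hypothetical names made up for this example. It assumes a reader that resolves table columns to file columns purely by field ID, as the spec describes.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a column in a Parquet file schema.
struct FileColumn {
  std::string name;
  std::optional<int> fieldId;  // empty when the writer did not set field_id
};

// Hypothetical stand-in for a column in the current Iceberg table schema.
struct TableColumn {
  int id;
  std::string name;
};

// Resolve each table column to a file column index by field ID only.
// Returns -1 for a table column whose ID is not present in the file.
std::vector<int> resolveById(
    const std::vector<TableColumn>& table,
    const std::vector<FileColumn>& file) {
  std::map<int, int> idToIndex;
  for (std::size_t i = 0; i < file.size(); ++i) {
    if (file[i].fieldId.has_value()) {
      idToIndex[*file[i].fieldId] = static_cast<int>(i);
    }
  }
  std::vector<int> result;
  result.reserve(table.size());
  for (const auto& column : table) {
    auto it = idToIndex.find(column.id);
    result.push_back(it == idToIndex.end() ? -1 : it->second);
  }
  return result;
}
```

With a table schema where column 2 was renamed from `name` to `full_name`, a file written with field IDs still resolves both columns, while a file written without them resolves none, because the names no longer match and there are no IDs to fall back on.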

Gluten version

main branch

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

Metadata

Assignees

No one assigned

Labels

bug, triage
