Inconsistencies between RecordBatch and DataFrame schemas cause to_arrow_table to fail #1314

@nuno-faria

Description

Describe the bug

When the nullability of a RecordBatch column does not match the nullability declared in the DataFrame's schema, converting the result to a pyarrow table fails.

To Reproduce

from datafusion import SessionContext

ctx = SessionContext()
ctx.sql("create table t_(a int not null)").collect()
ctx.sql("insert into t_ values (1), (2), (3)").collect()
ctx.sql("copy (select * from t_) to 't.parquet'").collect()
ctx.register_parquet("t", "t.parquet")
pyarrow_table = ctx.sql("select max(a) as m from t").to_arrow_table()
...
pyarrow.lib.ArrowInvalid: Schema at index 0 was different: 
m: int32
vs
m: int32 not null

Expected behavior
The conversion executes without crashing and returns a pyarrow table.

Additional context


Labels: bug
