Python API DuckDBPyRelation.arrow function #5959

@gratus907

While using the DuckDB Python API, I noticed that the documentation for .arrow and .to_arrow_table is inconsistent.

The v1.4 and v1.5 (dev) documentation for arrow says:

arrow(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader

Description
Execute and return an Arrow Record Batch Reader that yields all rows

Aliases: fetch_arrow_table, to_arrow_table

However, the documentation for to_arrow_table says:

to_arrow_table
Signature
to_arrow_table(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.Table

Description
Execute and fetch all rows as an Arrow Table

Aliases: fetch_arrow_table, arrow

The two functions have different return types (pyarrow.lib.RecordBatchReader vs. pyarrow.lib.Table), so they should not be listed as aliases of each other. The documentation was correct up to v1.3, where .arrow also returned pyarrow.lib.Table.
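For anyone who needs code that works on both sides of this change, here is a minimal sketch. `as_arrow_table` is a hypothetical helper, not part of DuckDB. It assumes only that a pyarrow RecordBatchReader exposes `read_all()` and a pyarrow Table does not. The live check at the bottom runs only if duckdb and pyarrow are installed, and the types it prints depend on the installed version.

```python
def as_arrow_table(result):
    """Return a materialized table whether `result` is a Table or a reader.

    Hypothetical helper (not a DuckDB API). A RecordBatchReader has
    read_all(); a Table does not, so we duck-type on that method.
    """
    read_all = getattr(result, "read_all", None)
    return read_all() if callable(read_all) else result


if __name__ == "__main__":
    try:
        import duckdb
        import pyarrow  # noqa: F401  (needed by DuckDB's Arrow export)
    except ImportError:
        print("duckdb/pyarrow not installed; skipping the live check")
    else:
        for name in ("arrow", "to_arrow_table"):
            # Build a fresh relation for each call so the runs are independent.
            rel = duckdb.sql("SELECT 42 AS answer")
            result = getattr(rel, name)()
            table = as_arrow_table(result)
            print(name, "->", type(result).__name__, "| rows:", table.num_rows)
```

If the two printed type names differ, that matches the mismatch described above. Either way, the helper gives callers a Table.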

It seems returning RecordBatchReader was discussed and intended:

If so, I think the documentation should be updated accordingly.
