Open
Description
While using the DuckDB Python API, I noticed that the documentation for .arrow and .to_arrow_table is inaccurate.
The v1.4 and v1.5 (dev) documentation for .arrow says:
arrow(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader
Description
Execute and return an Arrow Record Batch Reader that yields all rows
Aliases: fetch_arrow_table, to_arrow_table
However, the documentation for to_arrow_table says:
to_arrow_table
Signature
to_arrow_table(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.Table
Description
Execute and fetch all rows as an Arrow Table
Aliases: fetch_arrow_table, arrow
The two functions have different return types (pyarrow.lib.RecordBatchReader vs. pyarrow.lib.Table), so they should not be listed as aliases of each other. The documentation was correct up to v1.3, where .arrow also returned pyarrow.lib.Table.
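A minimal sketch for checking this locally. The helper names `arrow_export_types` and `demo` are hypothetical, not part of DuckDB; the sketch assumes both duckdb and pyarrow are installed, and it builds a fresh relation for each call so that one export cannot affect the next. The printed types depend on the installed DuckDB version, so no expected output is shown.

```python
def arrow_export_types(make_rel, methods=("arrow", "to_arrow_table", "fetch_arrow_table")):
    """Map each export method name to the fully qualified type it returns.

    make_rel is a zero-argument callable that returns a fresh relation-like
    object exposing the listed methods (hypothetical helper, not a DuckDB API).
    """
    result = {}
    for name in methods:
        rel = make_rel()  # fresh relation per method call
        value = getattr(rel, name)()
        cls = type(value)
        result[name] = f"{cls.__module__}.{cls.__qualname__}"
    return result


def demo():
    # Imported here so the helper above can be used without duckdb installed.
    import duckdb

    def make_rel():
        return duckdb.sql("SELECT 42 AS answer")

    for name, type_name in arrow_export_types(make_rel).items():
        print(f"{name}() -> {type_name}")


# Call demo() in an environment with duckdb and pyarrow installed.
```

If `arrow()` reports a different type than `to_arrow_table()` and `fetch_arrow_table()`, the three names are not interchangeable in that version.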
Returning a RecordBatchReader from .arrow seems to have been discussed and intended:
- Make .arrow() from relation return record batch reader duckdb-python#32
- Change arrow() to export record batch reader duckdb#18642
If so, I think the documentation should be updated accordingly, so that arrow is no longer listed as an alias of to_arrow_table and fetch_arrow_table.