Open
Labels
enhancement: New feature or request
Description
Some APIs feel a bit unintuitive, and I think Polars has really excelled in this area. My suggestion is that we reuse some of those APIs or take inspiration from them. These are the changes I am proposing (I am happy to work on these areas, especially with datafusion-ray becoming a thing):
- `DataFrame.cache() -> DataFrame` ===> `DataFrame.collect() -> DataFrame`
- `DataFrame.collect() -> list[pyarrow.RecordBatch]` ===> `DataFrame.to_batches() -> list[pyarrow.RecordBatch]`
- `DataFrame.join` ===> `DataFrame.join(right: DataFrame, on: str | sequence[str] | None, left_on: str | sequence[str] | None, right_on: str | sequence[str] | None)`
- `DataFrame.schema -> pyarrow.Schema` ===> `DataFrame.schema -> datafusion.Schema`: map Rust arrow types to datafusion-py types
- `DataFrame.with_column` ===> `DataFrame.with_columns`: allow multiple inputs as exprs or key-value pairs
- `DataFrame.with_column_renamed` ===> `DataFrame.rename()`: a simple rename is clear enough and should allow a dict as input
- `DataFrame.aggregate` ===> `DataFrame.group_by().agg()`: this feels more natural coming from PySpark/Polars/Pandas
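
To give a rough feel for how the proposed `with_columns`, `rename()`, and `group_by().agg()` calls might read when chained, here is a minimal sketch. `ToyFrame` and `ToyGroupBy` are hypothetical in-memory stand-ins written only for this illustration. They are not the datafusion-python `DataFrame`, and the exact argument shapes (keyword arguments holding row functions, `(column, function)` tuples for aggregates) are assumptions, not a proposed final signature.

```python
from collections import defaultdict
from typing import Any, Callable


class ToyFrame:
    """Hypothetical stand-in: a dict of equally long column lists."""

    def __init__(self, columns: dict[str, list[Any]]) -> None:
        self.columns = {name: list(values) for name, values in columns.items()}

    def rows(self) -> list[dict[str, Any]]:
        names = list(self.columns)
        return [dict(zip(names, vals)) for vals in zip(*self.columns.values())]

    def with_columns(self, **named: Callable[[dict[str, Any]], Any]) -> "ToyFrame":
        # Several columns can be added in one call; each is computed
        # from the rows of the original frame.
        rows = self.rows()
        out = dict(self.columns)
        for name, fn in named.items():
            out[name] = [fn(row) for row in rows]
        return ToyFrame(out)

    def rename(self, mapping: dict[str, str]) -> "ToyFrame":
        # A dict of old -> new names; unmapped columns keep their name.
        return ToyFrame({mapping.get(n, n): v for n, v in self.columns.items()})

    def group_by(self, *keys: str) -> "ToyGroupBy":
        return ToyGroupBy(self, keys)


class ToyGroupBy:
    """Hypothetical intermediate object returned by group_by()."""

    def __init__(self, frame: ToyFrame, keys: tuple[str, ...]) -> None:
        self.frame = frame
        self.keys = keys

    def agg(self, **named: tuple[str, Callable[[list[Any]], Any]]) -> ToyFrame:
        # Bucket rows by their key values, keeping first-seen order.
        groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
        for row in self.frame.rows():
            groups[tuple(row[k] for k in self.keys)].append(row)

        out: dict[str, list[Any]] = {k: [] for k in self.keys}
        out.update({name: [] for name in named})
        for key, rows in groups.items():
            for k, v in zip(self.keys, key):
                out[k].append(v)
            for name, (col, fn) in named.items():
                out[name].append(fn([r[col] for r in rows]))
        return ToyFrame(out)


df = ToyFrame({"city": ["a", "b", "a"], "sales": [1, 2, 3]})
result = (
    df.with_columns(double=lambda r: r["sales"] * 2)
    .rename({"city": "town"})
    .group_by("town")
    .agg(total=("double", sum))
)
print(result.columns)
```

The point of the sketch is only the call shape: one method per intent, with multiple inputs accepted where that is natural.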
Can remove these:
- `DataFrame.select_columns`: already covered by `DataFrame.select`
Missing APIs:
- `DataFrame.cast`: to cast a single column or multiple columns at the top level
- `DataFrame.drop`: to drop columns, instead of writing a very verbose select
- `DataFrame.fill_null`/`fill_nan`: to fill null or NaN values
- `DataFrame.interpolate`: interpolate values per column
- Asof join missing in df api?
- Join on (inequality join)
- `DataFrame.head`/`tail`
- `DataFrame.pivot`
- `DataFrame.unpivot`
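
For a sense of the behaviour I would expect from a few of these missing methods, here is a minimal sketch over a plain dict of column lists. The `Columns` alias and every function below are hypothetical illustrations, not existing datafusion-python calls. The `variable`/`value` output names in `unpivot` are borrowed from Polars as an assumption.

```python
from typing import Any

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Columns = dict[str, list[Any]]


def drop(cols: Columns, *names: str) -> Columns:
    """Keep every column except the named ones."""
    return {n: v for n, v in cols.items() if n not in names}


def fill_null(cols: Columns, value: Any) -> Columns:
    """Replace None in every column with a single fill value."""
    return {n: [value if x is None else x for x in v] for n, v in cols.items()}


def head(cols: Columns, n: int = 5) -> Columns:
    """First n rows."""
    return {name: v[:n] for name, v in cols.items()}


def tail(cols: Columns, n: int = 5) -> Columns:
    """Last n rows (an explicit start index keeps n=0 from returning everything)."""
    return {name: v[max(len(v) - n, 0):] for name, v in cols.items()}


def unpivot(cols: Columns, index: str, on: list[str]) -> Columns:
    """Turn the `on` columns into (variable, value) rows, repeating `index`."""
    out: Columns = {index: [], "variable": [], "value": []}
    for name in on:
        for key, val in zip(cols[index], cols[name]):
            out[index].append(key)
            out["variable"].append(name)
            out["value"].append(val)
    return out


data: Columns = {"id": [1, 2, 3], "x": [10, None, 30], "y": [1.5, 2.5, None]}

print(drop(data, "y"))
print(fill_null(data, 0))
print(tail(data, 1))
print(unpivot(head(data, 2), index="id", on=["x", "y"]))
```

Each of these replaces what is currently a longer `select` or expression chain, which is the motivation for having them as top-level methods.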
Optional but useful:
- `DataFrame.with_row_idx`
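
A minimal sketch of what `with_row_idx` could do, again over a plain dict of column lists rather than a real DataFrame. The function, its `name` and `offset` parameters, and the default column name are assumptions made for illustration.

```python
from typing import Any


def with_row_idx(
    cols: dict[str, list[Any]], name: str = "index", offset: int = 0
) -> dict[str, list[Any]]:
    """Prepend a column of consecutive row numbers starting at `offset`."""
    n_rows = len(next(iter(cols.values()), []))
    return {name: list(range(offset, offset + n_rows)), **cols}


indexed = with_row_idx({"city": ["a", "b", "c"]}, offset=1)
print(indexed)
```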
emgeee