Open
Labels
feature (Features or general enhancements)
Description
Is your feature request related to a problem?
I'm trying to use read_csv with multiple backends at the same time, but unfortunately the options do not match across backends. Here is the compatibility table for flights.csv:
Backend | Separator option | Header option | Schema option | Example |
duckdb | sep | header | columns |
# uv run --with ibis-framework[duckdb] python
import ibis
con = ibis.connect('duckdb://')
con.read_csv('flights.csv', sep='|', header=True, columns={
    'FlightDate': 'DATE',
    'UniqueCarrier': 'VARCHAR',
    'OriginCityName': 'VARCHAR',
    'DestCityName': 'VARCHAR',
})
polars | separator | has_header | schema |
# uv run --with ibis-framework[polars] python
import ibis
import polars as pl
con = ibis.connect('polars://')
con.read_csv('flights.csv', separator='|', has_header=True, schema={
    'FlightDate': pl.Date,
    'UniqueCarrier': pl.String,
    'OriginCityName': pl.String,
    'DestCityName': pl.String,
})
datafusion | delimiter | has_header | schema |
# uv run --with ibis-framework[datafusion] python
import ibis
import pyarrow as pa
con = ibis.connect('datafusion://')
# pyarrow schemas are built with pa.schema and pa.field
con.read_csv('flights.csv', delimiter='|', has_header=True, schema=pa.schema([
    pa.field('FlightDate', pa.date32()),
    pa.field('UniqueCarrier', pa.string()),
    pa.field('OriginCityName', pa.string()),
    pa.field('DestCityName', pa.string()),
]))
pyspark | sep | header | schema |
# uv run --with ibis-framework[pyspark] python
import ibis
from pyspark.sql.types import DateType, StringType, StructType, StructField
con = ibis.connect('pyspark://')
con.read_csv('flights.csv', sep='|', header=True, schema=StructType([
    StructField('FlightDate', DateType()),
    StructField('UniqueCarrier', StringType()),
    StructField('OriginCityName', StringType()),
    StructField('DestCityName', StringType()),
]))
What is the motivation behind your request?
No response
Describe the solution you'd like
Since Ibis claims to provide a unified API for all backends, I suggest improving compatibility by properly handling the following options for ibis.expr.api.read_csv:
name | type | description | naming |
separator | string | Single byte character to use as separator in the file. | Short names like |
has_header | bool | Indicate if the first row of the dataset is a header or not. | |
schema | ibis.Struct | An optional schema representing the CSV files. If None, the CSV reader will try to infer it based on data in file. | |
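As a rough illustration of the requested behaviour, the keyword translation could be sketched as below. This is not part of Ibis: the helper name translate_read_csv_options and the _KEYWORDS mapping are hypothetical, and the backend keyword names are simply those used in the examples earlier in this issue. It only renames keywords; converting the schema value itself into each backend's type system is left out.

```python
# Hypothetical helper, not an Ibis API: renames the unified options proposed
# above to the keyword each backend currently expects (names taken from the
# examples in this issue).
_KEYWORDS = {
    "duckdb": {"separator": "sep", "has_header": "header", "schema": "columns"},
    "polars": {"separator": "separator", "has_header": "has_header", "schema": "schema"},
    "datafusion": {"separator": "delimiter", "has_header": "has_header", "schema": "schema"},
    "pyspark": {"separator": "sep", "has_header": "header", "schema": "schema"},
}


def translate_read_csv_options(backend, **options):
    """Return backend-specific keyword arguments for the unified options."""
    try:
        names = _KEYWORDS[backend]
    except KeyError:
        raise ValueError(f"unsupported backend: {backend!r}") from None
    unknown = set(options) - set(names)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    # Options left as None are dropped so the backend default applies.
    return {names[key]: value for key, value in options.items() if value is not None}


# Possible usage, assuming con.name holds the backend name (e.g. 'duckdb'):
# kwargs = translate_read_csv_options(con.name, separator='|', has_header=True)
# con.read_csv('flights.csv', **kwargs)
```

A real implementation inside Ibis would presumably also translate an Ibis schema into the backend's own types, which this sketch deliberately leaves to the caller.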
What version of ibis are you running?
10.6.0
What backend(s) are you using, if any?
DuckDB, Polars, DataFusion, PySpark
Code of Conduct
- I agree to follow this project's Code of Conduct
Projects
Status
backlog