feat: Improve `read_csv` compatibility across backends

### Is your feature request related to a problem?

I'm trying to use `read_csv` with multiple backends at the same time, but the options do not match across backends. Here is the compatibility table for [flights.csv](https://duckdb.org/data/flights.csv):

<table>
<thead>
<tr>
<td>Backend</td>
<td>Separator option</td>
<td>Header option</td>
<td>Schema option</td>
<td>Example</td>
</tr>
</thead>
<tbody>

<tr>
<td><a href="https://duckdb.org/docs/stable/data/csv/overview.html#parameters">duckdb</a></td>
<td>

`sep`

</td>
<td>

`header`

</td>
<td>

`columns`

</td>
<td>

```bash
uv run --with ibis-framework[duckdb] python
```
```python
import ibis
con = ibis.connect('duckdb://')
con.read_csv('flights.csv', sep='|', header=True, columns={
    'FlightDate': 'DATE',
    'UniqueCarrier': 'VARCHAR',
    'OriginCityName': 'VARCHAR',
    'DestCityName': 'VARCHAR'
})
```
```
DatabaseTable: ibis_read_csv_6zrnj6cuujhoxmdw5odszvpwxe
  FlightDate     date
  UniqueCarrier  string
  OriginCityName string
  DestCityName   string
```

</td>
</tr>

<tr>
<td><a href="https://docs.pola.rs/api/python/stable/reference/api/polars.read_csv.html">polars</a></td>
<td>

`separator`

</td>
<td>

`has_header`

</td>
<td>

`schema`

</td>
<td>

```bash
uv run --with ibis-framework[polars] python
```
```python
import ibis
import polars as pl
con = ibis.connect('polars://')
con.read_csv('flights.csv', separator='|', has_header=True, schema={
    'FlightDate': pl.Date,
    'UniqueCarrier': pl.String,
    'OriginCityName': pl.String,
    'DestCityName': pl.String,
})
```
```
DatabaseTable: ibis_read_csv_dnt5itr3nremxdn5hr6zsr55xa
  FlightDate     date
  UniqueCarrier  string
  OriginCityName string
  DestCityName   string
```

</td>
</tr>

<tr>
<td><a href="https://datafusion.apache.org/python/autoapi/datafusion/context/index.html#datafusion.context.SessionContext.read_csv">datafusion</a></td>
<td>

`delimiter`

</td>
<td>

`has_header`

</td>
<td>

`schema`

</td>
<td>

```bash
uv run --with ibis-framework[datafusion] python
```
```python
import ibis
import pyarrow as pa
con = ibis.connect('datafusion://')
# pyarrow builds schemas with pa.schema / pa.field and type factories such as pa.date32()
con.read_csv('flights.csv', delimiter='|', has_header=True, schema=pa.schema([
    pa.field('FlightDate', pa.date32()),
    pa.field('UniqueCarrier', pa.string()),
    pa.field('OriginCityName', pa.string()),
    pa.field('DestCityName', pa.string()),
]))
```
```
DatabaseTable: ibis_read_csv_tepbqv667jainbegezjvgutycy
  FlightDate     date
  UniqueCarrier  string
  OriginCityName string
  DestCityName   string
```

</td>
</tr>

<tr>
<td><a href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv">pyspark</a></td>
<td>

`sep`

</td>
<td>

`header`

</td>
<td>

`schema`

</td>
<td>

```bash
uv run --with ibis-framework[pyspark] python
```
```python
import ibis
from pyspark.sql.types import DateType, StringType, StructType, StructField
con = ibis.connect('pyspark://')
con.read_csv('flights.csv', sep='|', header=True, schema=StructType([
    StructField('FlightDate', DateType()),
    StructField('UniqueCarrier', StringType()),
    StructField('OriginCityName', StringType()),
    StructField('DestCityName', StringType()),
]))
```
```
DatabaseTable: ibis_read_csv_tepbqv667jainbegezjvgutycy
  FlightDate     date
  UniqueCarrier  string
  OriginCityName string
  DestCityName   string
```

</td>
</tr>
</tbody>
</table>
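
Until the names are unified, the differences in the table above can be bridged with a small translation layer. The following is only a minimal sketch: `OPTION_NAMES` and `translate_csv_kwargs` are hypothetical names that are not part of Ibis, and the mapping is copied from the table rather than from the backends' source. It only renames keywords, so the schema value itself must still be built in each backend's own format.

```python
# Backend-specific keyword names, copied from the compatibility table above.
OPTION_NAMES = {
    "duckdb": {"separator": "sep", "has_header": "header", "schema": "columns"},
    "polars": {"separator": "separator", "has_header": "has_header", "schema": "schema"},
    "datafusion": {"separator": "delimiter", "has_header": "has_header", "schema": "schema"},
    "pyspark": {"separator": "sep", "has_header": "header", "schema": "schema"},
}


def translate_csv_kwargs(backend, **options):
    """Rename unified option names to the ones `backend` expects (hypothetical helper)."""
    try:
        names = OPTION_NAMES[backend]
    except KeyError:
        raise ValueError(f"unsupported backend: {backend!r}") from None
    unknown = set(options) - set(names)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    # Only the keyword names change; the values are passed through untouched.
    return {names[key]: value for key, value in options.items()}


print(translate_csv_kwargs("duckdb", separator="|", has_header=True))
# {'sep': '|', 'header': True}
```

Assuming the backend name is available on the connection (for example as `con.name`), the result could be splatted into the existing call: `con.read_csv('flights.csv', **translate_csv_kwargs(con.name, separator='|', has_header=True))`.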

### What is the motivation behind your request?

_No response_

### Describe the solution you'd like

Since Ibis claims to provide a unified API for all backends, I suggest improving
compatibility by properly handling the following options in
`ibis.expr.api.read_csv`:

<table>
<thead>
    <tr>
        <td>name</td>
        <td>type</td>
        <td>description</td>
        <td>naming</td>
    </tr>
</thead>
<tbody>
    <tr>
        <td>separator</td>
        <td>string</td>
        <td>Single-byte character to use as the separator in the file.</td>
        <td>

Short names like `sep` can be confusing, and there is no need to save letters. `separator` seems to be more common than `delimiter`.

</td>
    </tr>
    <tr>
        <td>has_header</td>
        <td>bool</td>
        <td>Indicates whether the first row of the dataset is a header.</td>
        <td>

`has_header` is a better name than `header` because it makes clear that the value is a boolean.

</td>
    </tr>
    <tr>
        <td>schema</td>
        <td>ibis.Struct</td>
        <td>An optional schema describing the CSV files. If <code>None</code>, the CSV reader tries to infer it from the data in the file.</td>
        <td>

`schema` seems to be clearer and more common than `columns`.

</td>
    </tr>
</tbody>
</table>
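
To make the proposed options concrete, here is a rough sketch of how they could be validated before being handed to a backend. `CsvOptions` and `as_kwargs` are hypothetical names, the defaults (`","` and `True`) are assumptions, and the schema is simplified to a plain mapping of column name to type name instead of an Ibis type.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CsvOptions:
    """Unified CSV options from the table above (hypothetical container)."""

    separator: str = ","  # assumed default
    has_header: bool = True  # assumed default
    # Simplification: column name -> type name; None means "infer from the file".
    schema: Optional[Mapping[str, str]] = None

    def __post_init__(self):
        # The proposal describes the separator as a single-byte character.
        if len(self.separator.encode("utf-8")) != 1:
            raise ValueError("separator must be a single-byte character")
        if not isinstance(self.has_header, bool):
            raise TypeError("has_header must be a bool")

    def as_kwargs(self):
        """Return the options as keyword arguments, omitting an unset schema."""
        kwargs = {"separator": self.separator, "has_header": self.has_header}
        if self.schema is not None:
            kwargs["schema"] = dict(self.schema)
        return kwargs


options = CsvOptions(separator="|", schema={"FlightDate": "date"})
print(options.as_kwargs())
```

Rejecting a multi-character separator up front would give the same error on every backend instead of four different backend-specific failures.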


### What version of ibis are you running?

10.6.0

### What backend(s) are you using, if any?

DuckDB, Polars, DataFusion, PySpark

### Code of Conduct

- [x] I agree to follow this project's Code of Conduct
