Conversation

sanggon6107
Contributor

Description of the change

read_csv() currently passes a description of an invalid row (expected_columns, actual_columns, number, text) to the on_bad_lines callable when the row has too many elements and engine="pyarrow", but with engine="python" the callable receives only the contents of the row.

(For more details on pyarrow.csv.InvalidRow, see pyarrow documentation)
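For reference, a minimal sketch of the current engine="python" behaviour described above. The sample data and the names `handle_bad_line` and `seen` are made up for illustration; only `read_csv`, `engine` and `on_bad_lines` come from pandas.

```python
import io

import pandas as pd

# The header declares 3 columns; the second data row has 4 fields.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

seen = []  # illustrative: collects whatever the callable is given


def handle_bad_line(bad_line):
    # With engine="python" the callable receives only the split fields of
    # the offending row (a list of strings): no line number, no column counts.
    seen.append(bad_line)
    return None  # returning None drops the row


df = pd.read_csv(
    io.StringIO(data),
    engine="python",
    on_bad_lines=handle_bad_line,
)
```

Here the callable only ever sees the fields of the over-long row, which is the gap this PR is trying to close.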

This PR proposes to additionally pass expected_columns, actual_columns and row when on_bad_lines is a callable and engine="python", so that users can describe the invalid row in more detail.

The order of the arguments has been aligned with pyarrow.

@mroeschke
Member

Thanks for the PR, but this enhancement needs more discussion before moving forward with a PR. Additionally, this approach:

  1. Is an API-breaking change for users who pass the older form of the callable
  2. Your callable description doesn't seem to match PyArrow, judging from the example in https://arrow.apache.org/docs/python/generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions

so closing

@mroeschke mroeschke closed this Jul 28, 2025
@sanggon6107
Contributor Author

Many thanks @mroeschke,

  1. Is an API-breaking change for users who pass the older form of the callable

Understood. Maybe this could be discussed further in the near future, considering there are some suggestions at #61978.

  2. Your callable description doesn't seem to match PyArrow, judging from the example in https://arrow.apache.org/docs/python/generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions

I meant that the callable was aligned with pyarrow.csv.InvalidRow, but as you mentioned, this also needs to be considered in terms of backwards compatibility.
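To make the backwards-compatibility concern concrete, here is a rough sketch of why the older one-argument callable breaks under a three-argument call, together with one possible arity-based dispatch. Nothing here is pandas code: `call_bad_line_handler`, `old_style` and `new_style` are hypothetical names, and `row` is assumed to mean the list of fields of the bad line.

```python
import inspect


def call_bad_line_handler(handler, expected_columns, actual_columns, row):
    """Hypothetical dispatcher (not part of pandas).

    Calls three-argument handlers with the extra context and falls back to
    the existing one-argument form otherwise.
    """
    params = list(inspect.signature(handler).parameters.values())
    positional = [
        p for p in params
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    has_varargs = any(p.kind is p.VAR_POSITIONAL for p in params)
    if has_varargs or len(positional) >= 3:
        return handler(expected_columns, actual_columns, row)
    return handler(row)


def old_style(bad_line):
    # Existing form: only the fields of the bad row.
    return bad_line[:3]


def new_style(expected_columns, actual_columns, bad_line):
    # Assumed proposed form: column counts first, then the row.
    return bad_line[:expected_columns]


row = ["4", "5", "6", "7"]
old_result = call_bad_line_handler(old_style, 3, 4, row)
new_result = call_bad_line_handler(new_style, 3, 4, row)

# Without such a dispatch, an existing one-argument callable fails outright.
try:
    old_style(3, 4, row)
    old_style_breaks = False
except TypeError:
    old_style_breaks = True
```

Signature inspection like this is only one option and has its own edge cases (for example callables with defaulted extra parameters), so it would still need the discussion mentioned above.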

Development

Successfully merging this pull request may close these issues.

ENH: Include line number and number of fields when read_csv() callable raises ParserWarning

2 participants