
ENH: Include line number and number of fields when read_csv() callable with engine="python" raises ParserWarning #61974


Closed
wants to merge 3 commits into from


sanggon6107
Contributor

Description of the change

read_csv() currently provides a description of an invalid row (expected_columns, actual_columns, number, text) when the row has too many elements and engine="pyarrow", but when engine="python" the callable receives only the contents of the row.

(For more details on pyarrow.csv.InvalidRow, see the pyarrow documentation.)

This PR proposes to additionally pass expected_columns, actual_columns and row when on_bad_lines is a callable and engine="python", so that users can describe the invalid row in more detail.

The order of the arguments has been aligned with pyarrow.
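
For context, here is a minimal sketch of the current behaviour with engine="python", where the callable receives only the split fields of the offending row. The sample data and the helper name record_bad_line are made up for illustration; the field count is derived inside the callable because pandas does not pass it.

```python
import io

import pandas as pd

# Illustrative data: the second data row has one field too many.
CSV_TEXT = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

bad_rows = []


def record_bad_line(bad_line):
    """Hypothetical handler: record what can be known about the bad row."""
    # With engine="python" the callable gets a list of strings (the fields).
    bad_rows.append(
        {"actual_columns": len(bad_line), "text": ",".join(bad_line)}
    )
    # Returning None tells pandas to skip the row.
    return None


df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    engine="python",
    on_bad_lines=record_bad_line,
)
```

The expected column count and the row number are not available inside the callable here, which is the gap this PR aims to close.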

@mroeschke
Member

Thanks for the PR, but this enhancement needs more discussion before moving forward with a PR. Additionally, this approach:

  1. Is an API-breaking change for users who pass the older form of the callable
  2. Has a callable description that doesn't seem to match PyArrow's, going by the example in https://arrow.apache.org/docs/python/generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions

So closing.

mroeschke closed this Jul 28, 2025
@sanggon6107
Contributor Author

Many thanks, @mroeschke.

  1. Is an API-breaking change for users who pass the older form of the callable

Understood. There could perhaps be further discussion of this in the near future, given the suggestions at #61978.

  2. Your callable description doesn't seem to match PyArrow's, going by the example in https://arrow.apache.org/docs/python/generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions

I meant that the callable was aligned with pyarrow.csv.InvalidRow, but as you mentioned, this also needs to be considered in terms of backward compatibility.
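
To make the backward-compatibility concern concrete, here is a rough sketch of one way a parser could support both callable forms by inspecting the signature. This is purely hypothetical and not part of this PR or of pandas: the function name call_bad_line_handler and the four-argument order (expected_columns, actual_columns, row, bad_line) are assumptions for illustration.

```python
import inspect


def call_bad_line_handler(handler, bad_line, expected_columns, actual_columns, row):
    """Hypothetical dispatcher (not pandas code).

    Calls old-style handlers with just the row contents and new-style
    handlers with the extra metadata, based on the handler's signature.
    """
    params = inspect.signature(handler).parameters.values()
    accepts_varargs = any(p.kind is p.VAR_POSITIONAL for p in params)
    positional = [
        p
        for p in params
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    if accepts_varargs or len(positional) >= 4:
        # Assumed new form: metadata first, row contents last.
        return handler(expected_columns, actual_columns, row, bad_line)
    # Older form: only the list of fields is passed.
    return handler(bad_line)
```

Signature sniffing like this keeps existing one-argument callables working, but it is brittle (for example with callables that have optional parameters), so it would need the further discussion mentioned above.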


Successfully merging this pull request may close these issues.

ENH: Include line number and number of fields when read_csv() callable raises ParserWarning