Commit 26dbbee

Enhance on_bad_lines
1 parent e72c8a1 commit 26dbbee

2 files changed: 5 additions, 2 deletions

pandas/io/parsers/python_parser.py

Lines changed: 1 addition & 1 deletion
@@ -1201,7 +1201,7 @@ def _rows_to_cols(self, content: list[list[Scalar]]) -> list[np.ndarray]:
 
                 if actual_len > col_len:
                     if callable(self.on_bad_lines):
-                        new_l = self.on_bad_lines(_content)
+                        new_l = self.on_bad_lines(_content, col_len, actual_len, i + 2)
                         if new_l is not None:
                             content.append(new_l)  # pyright: ignore[reportArgumentType]
                     elif self.on_bad_lines in (
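
The changed call site can be sketched in isolation. The following is a simplified, pandas-free illustration of how the handler would receive the three extra arguments; `rows_with_handler` and `record_and_drop` are hypothetical names, and the real `_rows_to_cols` does more (warnings, truncation, error handling) than is shown here. The `i + 2` offset is copied from the diff; reading it as "zero-based index converted to a one-based line number after one header line" is an assumption, not something the commit states.

```python
from __future__ import annotations


def rows_with_handler(rows, col_len, handler):
    """Simplified stand-in for the loop around the changed line (not pandas code)."""
    kept = []
    for i, fields in enumerate(rows):
        actual_len = len(fields)
        if actual_len > col_len:
            # Mirrors the new call in the diff: four positional arguments.
            new_l = handler(fields, col_len, actual_len, i + 2)
            if new_l is not None:
                kept.append(new_l)
        else:
            kept.append(fields)
    return kept


seen = []


def record_and_drop(bad_line, expected_columns, actual_columns, row):
    """Hypothetical handler: remember what was reported, then drop the line."""
    seen.append((expected_columns, actual_columns, row))
    return None


rows = [["1", "2"], ["3", "4", "5"], ["6", "7"]]
result = rows_with_handler(rows, 2, record_and_drop)
```

In this sketch the second row is the only one with too many fields, so the handler is called once and the row is left out of `result`.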

pandas/io/parsers/readers.py

Lines changed: 4 additions & 1 deletion
@@ -414,8 +414,11 @@ class _read_shared(TypedDict, Generic[HashableT], total=False):
     - ``'skip'``, skip bad lines without raising or warning when they are encountered.
     - Callable, function that will process a single bad line.
         - With ``engine='python'``, function with signature
-          ``(bad_line: list[str]) -> list[str] | None``.
+          ``(bad_line: list[str], expected_columns: int, actual_columns: int, row: int) -> list[str] | None``.
           ``bad_line`` is a list of strings split by the ``sep``.
+          ``expected_columns`` is the expected number of columns.
+          ``actual_columns`` is the actual number of columns.
+          ``row`` is the row number of the bad line.
           If the function returns ``None``, the bad line will be ignored.
           If the function returns a new ``list`` of strings with more elements than
           expected, a ``ParserWarning`` will be emitted while dropping extra elements.
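
A handler written against the documented four-parameter signature might look like the sketch below. `merge_overflow` is a hypothetical name, and joining the surplus fields with a comma assumes the file uses `,` as its separator; neither comes from the commit. The function is called directly here so the example runs without pandas.

```python
from __future__ import annotations


def merge_overflow(
    bad_line: list[str],
    expected_columns: int,
    actual_columns: int,
    row: int,
) -> list[str] | None:
    """Hypothetical handler: fold surplus fields into the last expected column."""
    if expected_columns < 1:
        # Nothing sensible to merge into; returning None tells the parser to skip the line.
        return None
    head = bad_line[: expected_columns - 1]
    # Assumption: the original separator was a comma.
    tail = ",".join(bad_line[expected_columns - 1 :])
    return head + [tail]


fixed = merge_overflow(["1", "a", "b", "c"], 3, 4, 5)
```

With this commit applied, such a function would presumably be passed as `on_bad_lines=merge_overflow` together with `engine='python'`. The pre-change call site shown in `python_parser.py` passes only `bad_line`, so a function with four required parameters would not work there.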
