-
-
Notifications
You must be signed in to change notification settings - Fork 18.8k
Avoid empty lines with spaces to be transformed to empty string #59155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 7 commits
619fced
6eeb058
0982da9
e5032f3
f2dee95
75e342a
461ebf0
ece2f7b
e9b8530
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1242,6 +1242,50 @@ def test_preserve_empty_rows(self, flavor_read_html): | |
|
||
tm.assert_frame_equal(result, expected) | ||
|
||
def test_preserve_rows_with_spaces(self, flavor_read_html): | ||
result = flavor_read_html( | ||
StringIO( | ||
""" | ||
<table> | ||
<tr> | ||
<th>A</th> | ||
<th>B</th> | ||
</tr> | ||
<tr> | ||
<td>a</td> | ||
<td>b</td> | ||
</tr> | ||
<tr> | ||
<td> </td> | ||
<td> </td> | ||
</tr> | ||
</table> | ||
""" | ||
), | ||
skip_blank_lines=False, | ||
)[0] | ||
expected = DataFrame(data=[["a", "b"], [" ", " "]], columns=["A", "B"]) | ||
|
||
tm.assert_frame_equal(result, expected) | ||
|
||
def test_preserve_table_with_just_spaces(self, flavor_read_html): | ||
result = flavor_read_html( | ||
StringIO( | ||
""" | ||
<table> | ||
<tr> | ||
<td> </td> | ||
</tr> | ||
</table> | ||
""" | ||
), | ||
skip_blank_lines=False, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this necessary? If it's left as the default then what will the result be? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Leaving it as default which is true would give the same behaviour as the bug, because if this is true, python_parser calls _remove_empty_lines which gets rid of lines with only spaces. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So running There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see, so removing the new argument to parse_html, I can instead pass the skip_blank_lines to be False by default to _parse function to always get space only lines as a row instead of avoiding them |
||
)[0] | ||
|
||
expected = DataFrame(data=[" "]) | ||
assert len(result) != 0 | ||
tm.assert_frame_equal(result, expected) | ||
|
||
def test_ignore_empty_rows_when_inferring_header(self, flavor_read_html): | ||
result = flavor_read_html( | ||
StringIO( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this relate to fixing the bug? Generally to introduce new parameters to a function you'd have to open an enhancement issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument is needed because this function might need to specify the option whether they want to consider columns with just spaces as a valid column or not. If this is true, any column which is blank would be skipped as before. Hereis the place where this is checked to remove whitespaced lines