-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
bpo-30825: guess lineterminator in csv.Sniffer #2529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@vmax, thanks for your PR! By analyzing the history of the files in this pull request, we identified @rhettinger, @benjaminp and @birkenfeld to be potential reviewers. |
|
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. Thanks again to your contribution and we look forward to looking at it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patch needs tests.
Lib/csv.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every '\r\n' will be counted as '\r' and '\n' as well as '\r\n'. So I believe the '\r\n' count should be subtracted from the other 2 before comparing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@terryjreedy I changed detection method; most probably line terminators would be consistent so if one of them is present, it would be the one
|
To try and help move older pull requests forward, we are going through and backfilling 'awaiting' labels on pull requests that are lacking the label. Based on the current reviews, the best we can tell in an automated fashion is that a core developer requested changes to be made to this pull request. If/when the requested changes have been made, please leave a comment that says, |
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @terryjreedy: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revise new test as suggested. Subject to more review, the code change looks good to me.
|
|
||
| for terminator in terminators: | ||
| if terminator in sample: | ||
| return terminator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core of the patch. The sequence looks right.
| if terminator in sample: | ||
| return terminator | ||
|
|
||
| return os.linesep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case user submits a sample with an unusual or no line terminator.
Lib/test/test_csv.py
Outdated
| 'Tommy''s Place'+ Blue Island'+ 'IL'+ '12/28/02'+ 'Blue Sunday/White Crow' | ||
| 'Stonecutters ''Seafood'' and Chop House'+ 'Lemont'+ 'IL'+ '12/19/02'+ 'Week Back' | ||
| """ | ||
| sample10 = "Date;Value\r2010-01-01;10\r2010-01-01;20" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either move this line into the test function or put the literal into the sniff call. See comment below.
| dialect = sniffer.sniff(self.sample5) | ||
| self.assertEqual(dialect.lineterminator, '\r\n') | ||
| dialect = sniffer.sniff(self.sample8) | ||
| self.assertEqual(dialect.lineterminator, '\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that the line terminator for multiline triple-quoted string literals is '\n'. We require that '\r\n' in code files be converted to '\n' before merging, but the test should not assume '\n' for all developer and user systems where the test might be run. So sample8 should be replaced by a literal.
Since the sample strings need not be long, they can all be literals defined here. Make all 3 the same except for the terminator. This will make the test both crystal clear and independent of system and optional local settings.
| self.assertEqual(dialect.lineterminator, '\n') | ||
| dialect = sniffer.sniff(self.sample10) | ||
| self.assertEqual(dialect.lineterminator, '\r') | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add tests to cover the last line of _guess_lineterminator: return os.linesep. Add a test with the same literal, but with an oddball terminator, such as '\v' (vertical tab), and another with a single line with no control chars.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
If you have installed the blurb module (and you should if you think you might submit other PRs), please add a news file with the next revision. |
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @terryjreedy: please review the changes made to this pull request. |
|
Closing and reopening to retrigger tests. |
|
Oops, I was not able to reopen after closing as this was on an 'unknown repository'. This pull request will need to be recreated as a new PR. |
https://bugs.python.org/issue30825