Always read python source using UTF-8 #59

MrMino · 2021-03-18T01:56:44Z

By default, Python 3 reads source in UTF-8, and so should pip-check-reqs.

This is an issue on one of my systems, where sys.getfilesystemencoding() returns "ascii". I'm not sure why it does (it's a debian-slim docker image), but reading Python source in an encoding other than UTF-8 doesn't make much sense. It's a very niche use case, and if someone marks the source with a non-standard encoding, it wouldn't work anyway. Let's just hardcode it.
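A minimal sketch of what hardcoding the encoding could look like. The helper name `read_python_source` and the sample module are made up for illustration and are not taken from the pip-check-reqs codebase; the only point is that `open()` receives an explicit `encoding="utf-8"` before the text is handed to `ast.parse`.

```python
import ast
import os
import tempfile


def read_python_source(path):
    """Read a Python source file as UTF-8, independent of the locale."""
    # Hypothetical helper; the real function in pip-check-reqs may differ.
    with open(path, encoding="utf-8") as source_file:
        return source_file.read()


# Write a small module containing non-ASCII characters, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = os.path.join(tmp_dir, "example.py")
    with open(module_path, "w", encoding="utf-8") as module_file:
        module_file.write('import os\nGREETING = "zażółć"\n')

    source_text = read_python_source(module_path)
    tree = ast.parse(source_text, filename=module_path)

    # Collect the names of plainly imported modules from the parsed tree.
    imported_names = [
        alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.Import)
        for alias in node.names
    ]
```

With the explicit argument, the read no longer depends on what the locale reports.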

adamtheturtle · 2021-03-20T22:50:45Z

This is not my area of expertise and I'm wary of breaking existing setups.

Could we have a test which would fail if this change did not exist?
Does this have any interaction / overlap with #20 ?

MrMino · 2021-04-02T21:53:12Z

Hi @adamtheturtle, sorry for the late response.

Yes, this could use a testcase, I just didn't have the time for it when I initially created this PR :(.

This is similar to #20, but the angle of attack is different. #20 tries to provide a configuration option. This PR forces utf-8 as a default. IMO there should be logic to autodetect it, but it's something rather complex to implement.

Python source is read as UTF-8 by default. How it is read can be changed with the encoding markers specified in PEP 263.
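For reference, the standard library can already report which encoding a source file declares. The sketch below uses `tokenize.detect_encoding`, which looks for a UTF-8 BOM or a PEP 263 marker and otherwise falls back to UTF-8. The helper name and the sample sources are illustrative only.

```python
import io
import tokenize


def detect_source_encoding(source_bytes):
    """Return the encoding that a PEP 263 aware reader would use."""
    # detect_encoding inspects at most the first two lines for a BOM or an
    # encoding marker and returns "utf-8" when neither is present.
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# An unmarked file (UTF-8 by default) and a file marked as latin-1.
unmarked = 'name = "zażółć"\n'.encode("utf-8")
marked = '# -*- coding: latin-1 -*-\nname = "café"\n'.encode("latin-1")

unmarked_encoding = detect_source_encoding(unmarked)
marked_encoding = detect_source_encoding(marked)

# Decoding with the detected encoding recovers the original text.
marked_text = marked.decode(marked_encoding)
```

Note that `detect_encoding` returns a normalized name, so a `latin-1` marker is reported as `iso-8859-1`.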

This is different from how text files are read. In text mode, the default encoding of open() is whatever locale.getpreferredencoding() returns (source).
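A quick way to see the values involved on a given system is sketched below. The file name and the `report` dictionary are illustrative; the reported values depend on the environment (and on whether Python's UTF-8 mode is enabled), so no particular output is assumed.

```python
import locale
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "module.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("x = 1\n")

    # No encoding argument: the text layer picks an environment-dependent default.
    with open(path) as handle:
        implicit_encoding = handle.encoding

    # Explicit encoding: the same on every system.
    with open(path, encoding="utf-8") as handle:
        explicit_encoding = handle.encoding

# Collect the values side by side for inspection, e.g. with print(report).
report = {
    "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
    "open(path).encoding": implicit_encoding,
    "open(path, encoding='utf-8').encoding": explicit_encoding,
}
```

Printing `report` inside an affected container should show whether the implicit encoding is something other than UTF-8.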

If the encoding specified by the locale is UTF-8 and the source is not marked as using a non-UTF-8 encoding, everything is OK.

Reading the code, it looks like the following situations can potentially make pip-check-reqs fail:

❌ source is not marked (defaults to UTF-8) and contains non-ASCII characters, but the preferred encoding is not UTF-8 (in the case that prompted this PR, ASCII)
❌ source uses a non-UTF-8 encoding that differs from the preferred encoding
❌ source is marked with a non-UTF-8 encoding and contains bytes that are not valid UTF-8, while the preferred encoding is UTF-8

This PR solves the first case, which should be the most common source of errors:

✔️ source is not marked (defaults to UTF-8) and contains non-ASCII characters, but the preferred encoding is not UTF-8
❌ source uses a non-UTF-8 encoding that differs from the preferred encoding
❌ source is marked with a non-UTF-8 encoding and contains bytes that are not valid UTF-8, while the preferred encoding is UTF-8
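The cases above can be reproduced without touching the locale by decoding bytes directly. This is only a sketch: decoding with `"ascii"` stands in for a system whose preferred encoding is ASCII, and the sample sources and the `can_decode` helper are invented for illustration.

```python
def can_decode(source_bytes, encoding):
    """Return True if the bytes decode without error in the given encoding."""
    try:
        source_bytes.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Unmarked source, stored as UTF-8, containing a non-ASCII character.
utf8_source = 'x = "é"\n'.encode("utf-8")
# Source marked as latin-1 and stored as latin-1.
latin1_source = '# -*- coding: latin-1 -*-\nx = "é"\n'.encode("latin-1")

results = {
    # First case before this PR: the preferred encoding (ASCII) is used.
    "unmarked UTF-8 source read as ASCII": can_decode(utf8_source, "ascii"),
    # First case after this PR: UTF-8 is always used.
    "unmarked UTF-8 source read as UTF-8": can_decode(utf8_source, "utf-8"),
    # Remaining cases: a non-UTF-8 file still fails when read as UTF-8.
    "latin-1 source read as UTF-8": can_decode(latin1_source, "utf-8"),
}
```

Only the middle entry succeeds, which matches the single check mark in the list.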

#20 makes it possible to work around these situations, but I don't think it's the way to go. A codebase can contain files with different encodings, and in that case a single config option will make pip-check-reqs fail on different files depending on the value chosen.
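To illustrate the mixed-encoding problem, here is a sketch that compares one fixed encoding for the whole codebase against per-file detection with `tokenize.open`, which honours PEP 263 markers. The two file names and their contents are hypothetical, and this is not what the PR implements.

```python
import os
import tempfile
import tokenize

# A hypothetical codebase: file name -> (text, encoding it is stored in).
files = {
    "utf8_module.py": ('NAME = "zażółć"\n', "utf-8"),
    "latin1_module.py": ('# -*- coding: latin-1 -*-\nNAME = "café"\n', "latin-1"),
}


def read_with(path, encoding):
    """Read a file with one fixed encoding; return None if decoding fails."""
    try:
        with open(path, encoding=encoding) as handle:
            return handle.read()
    except UnicodeDecodeError:
        return None


global_results = {}
per_file_results = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, (text, encoding) in files.items():
        path = os.path.join(tmp_dir, name)
        with open(path, "w", encoding=encoding, newline="") as handle:
            handle.write(text)

    # One configured encoding for everything: each choice mishandles a file,
    # either by raising or by silently producing the wrong text.
    for configured in ("utf-8", "latin-1"):
        global_results[configured] = [
            read_with(os.path.join(tmp_dir, name), configured) == text
            for name, (text, _encoding) in files.items()
        ]

    # Per-file detection reads both files correctly.
    for name, (text, _encoding) in files.items():
        with tokenize.open(os.path.join(tmp_dir, name)) as handle:
            per_file_results[name] = handle.read() == text
```

Whichever value the single option takes, one of the two files is read incorrectly, while detection per file handles both.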

MrMino · 2021-04-04T04:04:50Z

@adamtheturtle I added a test, albeit a shallow one. I can't change the default encoding without running Python in a subprocess, and even then I'm not sure how to do it properly. It's a very low-level setting that is difficult to change without making all sorts of other things go haywire.
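A shallow test of this kind might look roughly like the sketch below. It is not the test added in this PR: `read_source` is a hypothetical stand-in for the code under test, and the check only verifies that `open()` is called with `encoding="utf-8"`, without exercising any actual decoding.

```python
import builtins
from unittest import mock


def read_source(path):
    """Hypothetical stand-in for the function under test."""
    with open(path, encoding="utf-8") as source_file:
        return source_file.read()


def check_source_is_opened_as_utf8():
    """Patch open() and verify the encoding argument it receives."""
    fake_open = mock.mock_open(read_data="import os\n")
    with mock.patch.object(builtins, "open", fake_open):
        content = read_source("some_module.py")
    # This inspects the call itself, not how the bytes would be decoded.
    fake_open.assert_called_once_with("some_module.py", encoding="utf-8")
    return content


content = check_source_is_opened_as_utf8()
```

Such a test fails if the `encoding=` argument is dropped, which is the regression the PR guards against, but it says nothing about behaviour under a non-UTF-8 locale.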

MrMino added 3 commits March 18, 2021 02:23

Always read python source using UTF-8

2fb5ac2

Fix FakeFile mock not having encoding= arg

b45a8b5

Bump version (2.2.1)

31bce1d

Add a test for encoding parameter

24bcc32

MrMino force-pushed the fix-encoding branch from 77f2d7b to 24bcc32 Compare April 4, 2021 04:02

Reword the CHANGLOG note

53bac9b

adamtheturtle merged commit d17aa5e into adamtheturtle:master Apr 4, 2021
