-
Notifications
You must be signed in to change notification settings - Fork 137
Unicode encoding error #627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Why is codecov failing for us following this change in pluggy/src/pluggy/_tracing.py? Thanks! |
| if self._writer is not None and args: | ||
| self._writer(self._format_message(tags, args)) | ||
| msg = self._format_message(tags, args) | ||
| self._writer(msg.encode("utf-8", "replace").decode("utf-8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering about the perf impact of this. In pylint we added error handling instead of encoding / decoding each string (so we encode/decode only in case of unicoderror) because surrogate are rather rare (at least in an occidental context).
Maybe our intuition was wrong and this is the better approach, I'm ready to be surprised by a benchmark. Would you consider 1% of surrogate a proper approximation of reality for a benchmark ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this would impact performance, and honestly I would say that 1% percent is possibly too high of an approximation. We can benchmark this and see to what degree performance is affected.
|
Thank you for the PR @Ghanchu ! |
|
Before going forward with this https://github.com/pytest-dev/pytest/blob/main/src%2F_pytest%2Fhelpconfig.py#L129 the debigfile should perhsps open in surrogate escape mode |
… from test_tracer.py
We tried implementing the surrogate escape mode for the debug file, however it did not fix the original problem in the issue, nor the unicode surrogate tests in test_tracer.py. It only affects the debug file writer, not other writers (like lists, stdout, or internal pytest hooks). So if any other part of pytest writes a message containing a surrogate, you still get a UnicodeEncodeError. |
|
@RonnyPfannschmidt I don't understand, why surrogate escapes? Aren't those intended where you want to "mangle" invalid characters to put them somewhere, where later the same application (or a different one) can "unmangle" them again to get the originals back? This is a debug log, so the last thing that should be happening IMHO is "weird unicode things" (which, say, break a hook) being represented in an even more confusing way, no? That being said, I also don't think That all being said, why does |
|
Its using str because its the log message |
|
As for dealing with invalid surogates in general i prefer ignoring them everywhere but output encoding for files |
Huh? I mean here, where pluggy formats hook debug output (Python values!) into the string that should be written to the debug log: Line 38 in 5b24a4d
This results in a log file like this: pytest_plugin_registered [hook]
plugin: <_pytest.cacheprovider.LFPlugin object at 0x7f0740af6510>
plugin_name: lfplugin # !!
manager: <_pytest.config.PytestPluginManager object at 0x7f0741ac52b0>
finish pytest_plugin_registered --> [] [hook]or pytest_report_header [hook]
config: <_pytest.config.Config object at 0x7f0741ac7770>
start_path: /home/.../proj/pytest # !!
startdir: /home/.../proj/pytest # !!I'm arguing that this is the wrong thing to do for a debug log, and this should be pytest_plugin_registered [hook]
plugin: <_pytest.cacheprovider.LFPlugin object at 0x7f1864b90980>
plugin_name: 'lfplugin' # !!
manager: <_pytest.config.PytestPluginManager object at 0x7f18657370e0>or pytest_report_header [hook]
config: <_pytest.config.Config object at 0x7f1864efa270>
start_path: PosixPath('/home/.../proj/pytest') # !!
startdir: local('/home/.../proj/pytest') # !!That seems more appropriate for a debug output (now you can actually see that |
|
Ah, I spoke to soon: That alone doesn't fix this, but also using Line 504 in 5b24a4d
That means it gets logged as |
|
Good find Those repr placements are way more effective |
Fixing an issue with Pluggy where surrogate escape characters are not encoded correctly, causing a crash of Pluggy hooks. From issue 13750 in Pytest.