Skip to content

Conversation

@Ghanchu
Copy link

@Ghanchu Ghanchu commented Nov 19, 2025

Fixing an issue with Pluggy where surrogate escape characters are not encoded correctly, causing a crash of Pluggy hooks. From issue 13750 in Pytest.

@Ghanchu
Copy link
Author

Ghanchu commented Nov 19, 2025

Why is codecov failing for us following this change in pluggy/src/pluggy/_tracing.py? Thanks!

if self._writer is not None and args:
self._writer(self._format_message(tags, args))
msg = self._format_message(tags, args)
self._writer(msg.encode("utf-8", "replace").decode("utf-8"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about the perf impact of this. In pylint we added error handling instead of encoding / decoding each string (so we encode/decode only in case of unicoderror) because surrogate are rather rare (at least in an occidental context).

Maybe our intuition was wrong and this is the better approach, I'm ready to be surprised by a benchmark. Would you consider 1% of surrogate a proper approximation of reality for a benchmark ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this would impact performance, and honestly I would say that 1% percent is possibly too high of an approximation. We can benchmark this and see to what degree performance is affected.

@Pierre-Sassoulas
Copy link
Member

Thank you for the PR @Ghanchu !

@RonnyPfannschmidt
Copy link
Member

Before going forward with this

https://github.com/pytest-dev/pytest/blob/main/src%2F_pytest%2Fhelpconfig.py#L129 the debigfile should perhsps open in surrogate escape mode

@Ghanchu
Copy link
Author

Ghanchu commented Nov 19, 2025

Before going forward with this

https://github.com/pytest-dev/pytest/blob/main/src%2F_pytest%2Fhelpconfig.py#L129 the debigfile should perhsps open in surrogate escape mode

We tried implementing the surrogate escape mode for the debug file, however it did not fix the original problem in the issue, nor the unicode surrogate tests in test_tracer.py. It only affects the debug file writer, not other writers (like lists, stdout, or internal pytest hooks). So if any other part of pytest writes a message containing a surrogate, you still get a UnicodeEncodeError.

@The-Compiler
Copy link
Member

@RonnyPfannschmidt I don't understand, why surrogate escapes? Aren't those intended where you want to "mangle" invalid characters to put them somewhere, where later the same application (or a different one) can "unmangle" them again to get the originals back?

This is a debug log, so the last thing that should be happening IMHO is "weird unicode things" (which, say, break a hook) being represented in an even more confusing way, no?

That being said, I also don't think replace is a good idea here, I think the most (only?) sensible thing for a debug things is using backslashescape.

That all being said, why does _format_message use str for formatting the hook (?) arguments in the first place? Wouldn't be using repr() way more appropriate for a debug log, and fix that problem too?

@RonnyPfannschmidt
Copy link
Member

Its using str because its the log message
Indeed backslash escape is a better choice

@RonnyPfannschmidt
Copy link
Member

As for dealing with invalid surogates in general i prefer ignoring them everywhere but output encoding for files

@The-Compiler
Copy link
Member

The-Compiler commented Nov 21, 2025

Its using str because its the log message

Huh? I mean here, where pluggy formats hook debug output (Python values!) into the string that should be written to the debug log:

lines.append(f"{indent} {name}: {value}\n")

This results in a log file like this:

      pytest_plugin_registered [hook]
          plugin: <_pytest.cacheprovider.LFPlugin object at 0x7f0740af6510>
          plugin_name: lfplugin  # !!
          manager: <_pytest.config.PytestPluginManager object at 0x7f0741ac52b0>
      finish pytest_plugin_registered --> [] [hook]

or

      pytest_report_header [hook]
          config: <_pytest.config.Config object at 0x7f0741ac7770>
          start_path: /home/.../proj/pytest  # !!
          startdir: /home/.../proj/pytest  # !!

I'm arguing that this is the wrong thing to do for a debug log, and this should be {value!r} instead:

      pytest_plugin_registered [hook]
          plugin: <_pytest.cacheprovider.LFPlugin object at 0x7f1864b90980>
          plugin_name: 'lfplugin'  # !!
          manager: <_pytest.config.PytestPluginManager object at 0x7f18657370e0>

or

      pytest_report_header [hook]
          config: <_pytest.config.Config object at 0x7f1864efa270>
          start_path: PosixPath('/home/.../proj/pytest')  # !!
          startdir: local('/home/.../proj/pytest')  # !!

That seems more appropriate for a debug output (now you can actually see that start_path and startdir aren't the same type as plugin_name in the other message!), and would fix this issue as well.

@The-Compiler
Copy link
Member

Ah, I spoke to soon: That alone doesn't fix this, but also using repr(...) here for the return value does:

hooktrace("finish", hook_name, "-->", outcome.get_result())

That means it gets logged as finish pytest_fixture_setup --> '\ud800' [hook] rather than trying to put a literal unicode character (mangled or not) into the debug log file.

@RonnyPfannschmidt
Copy link
Member

Good find

Those repr placements are way more effective

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants