Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .coveragerc
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
include =
pluggy/*
src/pluggy/*
pluggy/testing/*
testing/*
*/lib/python*/site-packages/pluggy/*
*/pypy*/site-packages/pluggy/*
Expand Down
3 changes: 3 additions & 0 deletions changelog/13750(pytest).bugfix.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
This change addresses an issue in pluggy that occured when running pytest with any pluggy tracing enabled when parametrized values contained surrogate escape characters.
Before, pluggy attempted to write trace messages using UTF-8 enconding, which fails for lone surrogates. Tracing now encodes lone surrogates with errors="replace" in order
to ensure that trace logging will not crash hook execution in the future.
3 changes: 2 additions & 1 deletion src/pluggy/_tracing.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,8 @@ def _format_message(self, tags: Sequence[str], args: Sequence[object]) -> str:

def _processmessage(self, tags: tuple[str, ...], args: tuple[object, ...]) -> None:
if self._writer is not None and args:
self._writer(self._format_message(tags, args))
msg = self._format_message(tags, args)
self._writer(msg.encode("utf-8", "replace").decode("utf-8"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about the perf impact of this. In pylint we added error handling instead of encoding / decoding each string (so we encode/decode only in case of unicoderror) because surrogate are rather rare (at least in an occidental context).

Maybe our intuition was wrong and this is the better approach, I'm ready to be surprised by a benchmark. Would you consider 1% of surrogate a proper approximation of reality for a benchmark ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this would impact performance, and honestly I would say that 1% percent is possibly too high of an approximation. We can benchmark this and see to what degree performance is affected.

try:
processor = self._tags2proc[tags]
except KeyError:
Expand Down
35 changes: 35 additions & 0 deletions testing/test_tracer.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,3 +75,38 @@ def test_setprocessor(rootlogger: TagTracer) -> None:
log2("seen")
tags, args = l2[0]
assert args == ("seen",)


def test_unicode_surrogate_handling(rootlogger: TagTracer) -> None:
out: list[str] = []
rootlogger.setwriter(out.append)
log = rootlogger.get("pytest")
s = "hello \ud800 world"
log(s)
assert len(out) == 1
assert "\ud800" not in out
assert "hello ? world" in out[0]


def test_unicode_surrogate_handling_2(rootlogger: TagTracer) -> None:
out: list[str] = []
rootlogger.setwriter(out.append)
log = rootlogger.get("pytest")

bad = b"\xed\xa0\x80".decode("utf-8", "surrogatepass")

log(bad)

assert len(out) == 1
assert "\ud800" not in out[0]
assert "?" in out[0]


def test_unicode_surrogate_handling_normal(rootlogger: TagTracer) -> None:
out: list[str] = []
rootlogger.setwriter(out.append)
log = rootlogger.get("pytest")
s = "hello world"
log(s)
assert len(out) == 1
assert "hello world" in out[0]