[perf] Make PrettyPrinter format lazily so output can be budget-capped #14588
Pierre-Sassoulas wants to merge 3 commits into
35ed5b2 to d4b901c
133da41 to f4bd109
bluetech left a comment
Thanks, seems like a nice optimization to me, and I'd say makes the code nicer as well.
| ) -> list[str]:
| """Pretty-print ``object`` and return its lines.
|
| ``_format`` yields the output as a stream of chunks, so this can
Probably a "public" method shouldn't reference a private method in its docstring. I would describe the behavior rather than the implementation here.
| unbounded. With both ``None`` the whole object is formatted. The
| budget is a stopping condition, not a precise cut: formatting
| stops on the first chunk that reaches it, so the result may
| slightly overshoot (the caller truncates to the exact limit).
Would remove the detail of what the caller does, or replace "truncates" with "can truncate" or "should truncate".
| def pformat_lines(
|     self,
|     object: Any,
I suggest enforcing that max_lines and max_chars are kw-only. Otherwise they can easily be confused.
Yes, definitely.
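To illustrate the keyword-only suggestion, here is a minimal sketch. The function `pformat_lines_sketch` is a hypothetical stand-in, not the PR's method: it truncates `repr(obj)` rather than formatting lazily, and only the bare `*` in the signature is the point.

```python
from typing import Any, Optional


def pformat_lines_sketch(
    obj: Any,
    *,  # everything after the bare star must be passed by keyword
    max_lines: Optional[int] = None,
    max_chars: Optional[int] = None,
) -> list[str]:
    """Hypothetical stand-in: truncate ``repr(obj)`` to the given budgets."""
    text = repr(obj)
    if max_chars is not None:
        text = text[:max_chars]
    lines = text.splitlines()
    if max_lines is not None:
        lines = lines[:max_lines]
    return lines


# The call site now has to name each budget, so the two cannot be swapped.
print(pformat_lines_sketch([1, 2, 3], max_lines=11, max_chars=640))

# A positional call is rejected instead of silently mixing up the limits.
try:
    pformat_lines_sketch([1, 2, 3], 11, 640)
except TypeError as exc:
    print("rejected:", exc)
```

With positional parameters, `f(obj, 640, 11)` and `f(obj, 11, 640)` would both be accepted, which is the confusion the review comment points at.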
| self._format_items(object, stream, indent, allowance, context, level)
| stream.write(endchar)
| try:
|     object = sorted(object)
Is this an unrelated optimization, or is it somehow related to the iterator change?
If it's an optimization, a comment would be helpful, something like: "Try direct sort first, faster than the fallback.".
Yes, sorry, I didn't realize this was in this commit. It is indeed an optimization (a rather big one, imho). Because it uses the C sort directly it's a lot faster, and I suppose we rarely have heterogeneous data structures. (In that case it takes a little longer, of course.) Here's a micro-benchmark:
"""Interleaved benchmark of the set-sort fast path in ``_pprint_set``.

Compares ``sorted(s)`` against ``sorted(s, key=_safe_key)`` (the prior
behaviour), and the heterogeneous fallback ``try sorted / except retry``
against always using ``_safe_key``.

Interleaving: A and B are timed back-to-back every iteration, so slow
drift (CPU frequency scaling, thermal, GC) hits both equally and cancels
in the per-iteration difference. We report the median per-call time of
each and the median of the *paired* differences (B - A), which is the
robust estimate of the real gap.
"""

import statistics
import time

from _pytest._io.pprint import _safe_key


def bench_pair(a, b, inner, outer):
    """Interleave-time ``a`` and ``b``; return per-call medians + paired diff."""
    ta_samples, tb_samples, diffs = [], [], []
    for _ in range(outer):
        t = time.perf_counter()
        for _ in range(inner):
            a()
        ta = (time.perf_counter() - t) / inner
        t = time.perf_counter()
        for _ in range(inner):
            b()
        tb = (time.perf_counter() - t) / inner
        ta_samples.append(ta)
        tb_samples.append(tb)
        diffs.append(tb - ta)
    return (
        statistics.median(ta_samples) * 1000,  # ms
        statistics.median(tb_samples) * 1000,  # ms
        statistics.median(diffs) * 1e6,  # us, paired
    )


print("plain sorted (A) vs sorted(key=_safe_key) (B), homogeneous:")
for label, data, inner, outer in [
    ("int set 1k", set(range(1000)), 100, 300),
    ("int set 100k", set(range(100_000)), 3, 80),
    ("str set 100k", {f"item-{i}" for i in range(100_000)}, 1, 80),
]:
    a = lambda d=data: sorted(d)
    b = lambda d=data: sorted(d, key=_safe_key)
    ma, mb, diff = bench_pair(a, b, inner, outer)
    print(
        f" {label:13} A={ma:9.4f} ms B={mb:9.4f} ms B/A={mb / ma:5.1f}x"
        f" paired B-A={diff:+10.1f} us"
    )

print("\nheterogeneous (unorderable mix), fallback path:")
het = {1, "a", 2, "b", 3.5, None} | {f"x{i}" for i in range(500)} | set(range(500))
try:
    sorted(het)
    raise SystemExit("expected TypeError - set is not heterogeneous")
except TypeError:
    pass


def safe_sort(d=het):
    return sorted(d, key=_safe_key)


def failed_sort(d=het):
    # the *only* extra work the try/except adds: a plain sort that raises
    try:
        sorted(d)
    except TypeError:
        pass


# Measure the overhead directly: the failed sort is the whole cost of the
# fallback's try/except. Comparing new(=failed+safe) vs old(=safe) is
# useless here: the ~us signal is far below the jitter of the ~ms safe
# sort, so the paired diff is noise (can even come out negative).
overhead_ms, safe_ms, _ = bench_pair(failed_sort, safe_sort, 20, 300)
print(
    f" {'hetero 1k':13} _safe_key sort={safe_ms:8.4f} ms"
    f" try/except overhead (failed sort)={overhead_ms * 1000:6.1f} us"
    f" = {overhead_ms / safe_ms * 100:.2f}% of the sort"
)

Result on my machine:
plain sorted (A) vs sorted(key=_safe_key) (B), homogeneous:
int set 1k A= 0.0153 ms B= 0.4661 ms B/A= 30.4x paired B-A= +451.4 us
int set 100k A= 1.8300 ms B= 63.6966 ms B/A= 34.8x paired B-A= +61787.4 us
str set 100k A= 54.3415 ms B= 356.0647 ms B/A= 6.6x paired B-A= +302171.5 us
heterogeneous (unorderable mix), fallback path:
hetero 1k _safe_key sort= 4.1314 ms try/except overhead (failed sort)= 12.2 us = 0.30% of the sort
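The fast path being measured can be sketched in isolation. This is a minimal illustration, not the PR's code: `SafeKey` is a hypothetical stand-in modelled on the `_safe_key` wrapper in CPython's `pprint`, and `sort_for_display` is an invented name.

```python
class SafeKey:
    """Hypothetical stand-in for ``_safe_key``: never raises on comparison."""

    __slots__ = ("obj",)

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        try:
            return self.obj < other.obj
        except TypeError:
            # Unorderable pair: fall back to an arbitrary but consistent order.
            return (str(type(self.obj)), id(self.obj)) < (
                str(type(other.obj)),
                id(other.obj),
            )


def sort_for_display(items):
    """Try a direct sort first; wrap items only if they are unorderable."""
    try:
        return sorted(items)  # homogeneous case: no per-item wrapper objects
    except TypeError:
        return sorted(items, key=SafeKey)  # heterogeneous fallback


print(sort_for_display({3, 1, 2}))
print(sort_for_display({1, "a", 2, "b"}))
```

In the homogeneous case no wrapper object is created and no Python-level `__lt__` is called, which is where the B/A ratios above would come from. The heterogeneous case pays for one failed plain sort before the wrapped one.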
| pp = PrettyPrinter()
| assert pp.pformat({3, 1, 2}) == "{\n 1,\n 2,\n 3,\n}"
| # Mixed unorderable types must not raise.
| pp.pformat({1, "a", 2, "b"})
Might as well assert the output too.
dbd78b3 to 1786e34
…apped
``_format`` and the per-type helpers now ``yield`` their output as a
stream of string chunks instead of writing to a file-like object, and
``pformat`` joins them. On top of that, ``pformat_lines`` pulls from the
formatter only until a budget is reached:
    pformat_lines(obj, max_lines=None, max_chars=None)
It stops on the first chunk that reaches *either* budget, so a huge
collection costs O(budget) rather than O(N). Either dimension may be
``None`` (unbounded); with both ``None`` the whole object is formatted.
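The "stop pulling at the budget" idea can be sketched with a toy formatter. The names `format_chunks` and `collect_lines` are hypothetical and handle only flat iterables; the real formatter covers many more types.

```python
from collections.abc import Iterable, Iterator
from typing import Optional


def format_chunks(items: Iterable) -> Iterator[str]:
    """Toy lazy formatter: yield the output as a stream of string chunks."""
    yield "[\n"
    for item in items:
        yield "    "
        yield repr(item)
        yield ",\n"
    yield "]"


def collect_lines(
    chunks: Iterable[str],
    *,
    max_lines: Optional[int] = None,
    max_chars: Optional[int] = None,
) -> list[str]:
    """Pull chunks until either budget is reached, then return the lines."""
    parts: list[str] = []
    n_lines = n_chars = 0
    for chunk in chunks:
        parts.append(chunk)
        n_chars += len(chunk)
        n_lines += chunk.count("\n")
        # Stopping condition, not a precise cut: the last chunk may overshoot.
        if max_lines is not None and n_lines >= max_lines:
            break
        if max_chars is not None and n_chars >= max_chars:
            break
    return "".join(parts).splitlines()


consumed = []


def noisy_items():
    """Record how many items the formatter actually asked for."""
    for i in range(1_000_000):
        consumed.append(i)
        yield i


print(collect_lines(format_chunks(noisy_items()), max_lines=3))
print(len(consumed))  # far fewer than 1_000_000 items were touched
```

Because the consumer stops iterating, the generator is never resumed past the budget, so the cost depends on the budget rather than on the size of the input.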
Motivation
----------
Assertion diffs are truncated to a handful of lines/chars before being
shown. Formatting the whole of a large ``==`` comparison and then
throwing almost all of it away is pure waste. With a lazy formatter the
truncating caller simply stops pulling once it has enough.
Benchmark (``PrettyPrinter`` alone, width 80)::

    list(range(500_000)):
        pformat().splitlines()         ~805 ms
        pformat_lines(max_lines=11)    ~0.027 ms   (~30000x)

    [8 small ints] (common small diff):
        pformat().splitlines()         ~0.0133 ms
        pformat_lines(max_lines=11)    ~0.0185 ms  (+~5 us)

    ["x"*100_000] * 3 (flat, few huge elements):
        pformat_lines(max_chars=640)   stops after ~100_000 chars
                                       (one element) instead of 300_000
Why a lazy generator rather than a fast path + budget stream
------------------------------------------------------------
An earlier approach kept a cheap ``pformat().splitlines()`` fast path
guarded by ``len(obj) <= max_lines`` plus a flatness check, falling back
to a write-intercepting budget-stream class for the rest. Two problems:
* ``len(obj)`` is only a *lower* bound on the line count — one nested
element (``[{...50 keys...}]``) expands to many lines — so the guard
needed the flatness scan to stay correct, and even then it bounded
only *lines*, never *chars*: a flat container of a few enormous
strings has almost no lines but blows the char budget.
* it was two code paths plus a stream class plus an exception used for
control flow.
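Both failure modes of the ``len(obj)`` guard can be reproduced with a short check. This sketch uses the standard-library `pprint` as a stand-in; the vendored pytest printer lays output out differently, but the mismatch between element count and output size is the same in kind.

```python
import pprint

# One element, many lines: len(obj) underestimates the line count.
nested = [{f"key{i}": i for i in range(50)}]
nested_lines = pprint.pformat(nested, width=80).splitlines()
print(len(nested), "element ->", len(nested_lines), "lines")

# Few lines, huge output: a line budget alone never bounds the characters.
flat = ["x" * 100_000] * 3
flat_text = pprint.pformat(flat, width=80)
flat_lines = flat_text.splitlines()
print(len(flat_lines), "lines ->", len(flat_text), "chars")
```

A guard based on ``len(obj)`` would pass both objects as small, yet the first exceeds a line budget and the second exceeds a character budget.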
Because the formatter is lazy, "stop pulling at the budget" is the whole
optimisation: correct regardless of how lines/chars are distributed
across elements, bounding both dimensions, with no ``len()`` proxy to
get wrong and no fast/slow branch. The common small-diff case costs only
~5 us more than the unbounded path (it is never the bottleneck — a
failing assertion isn't hot), while large comparisons drop by orders of
magnitude.
``_pprint_set``/``_pprint_dict`` also try a plain ``sorted`` first and
fall back to the ``_safe_key`` wrapper only for unorderable mixes.
This diverges structurally from the upstream cpython ``pprint`` it was
vendored from; the module header notes it is no longer kept in sync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In ``pformat_lines``'s budget loop, ``chunk.count("\n")`` ran on every
chunk, but most chunks (brackets, indentation, item reprs) contain no
newline. Guarding the call with ``"\n" in chunk`` skips it on those and
recovers part of the per-chunk budget-tracking overhead: formatting an
8-element list under a budget drops from ~0.0185 ms to ~0.0163 ms
(versus ~0.0132 ms for an uncapped ``pformat().splitlines()``, so the
budget overhead roughly halves, from ~+5 us to ~+3 us).
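The guard described here can be sketched as two equivalent counters. The function names and the sample chunk list are hypothetical, and the example makes no timing claim of its own; any speedup depends on the interpreter and on the mix of chunks.

```python
def count_newlines_plain(chunks):
    """Call ``str.count`` on every chunk, as before the change."""
    return sum(chunk.count("\n") for chunk in chunks)


def count_newlines_guarded(chunks):
    """Skip ``str.count`` for chunks that contain no newline at all."""
    total = 0
    for chunk in chunks:
        if "\n" in chunk:  # cheap membership test first
            total += chunk.count("\n")
    return total


# Mostly newline-free chunks: brackets, indentation, item reprs.
# The repr 'a\\nb' holds a backslash and an "n", not a real newline.
sample = ["[", "\n", "    ", "1", ",\n", "    ", "'a\\nb'", ",\n", "]"]

print(count_newlines_plain(sample), count_newlines_guarded(sample))
```

The two functions always agree, so the guard changes only how much work is done per chunk and never the budget accounting.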
The win is small and only matters on the ``-v`` truncating path of a
failing assertion (the default path doesn't format the diff at all), so
this is kept as a separate commit — easy to drop if the extra branch
isn't judged worth it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses review on pytest-dev#14588:

* make ``max_lines`` / ``max_chars`` keyword-only so they can't be
  confused at the call site.
* drop the implementation detail (``_format``) and the "what the caller
  does" note from the docstring; describe the behaviour instead.
* comment the set-sort fast path ("try a direct sort first, faster than
  the fallback").
* assert the heterogeneous-set output in the test rather than only
  checking it does not raise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1786e34 to abf4962
Refactor required prior to #14523.
``_format`` and the per-type helpers now ``yield`` their output as a stream of string chunks instead of writing to a file-like object, and ``pformat`` joins them. On top of that, ``pformat_lines`` pulls from the formatter only until a budget is reached:

    pformat_lines(obj, max_lines=None, max_chars=None)

It stops on the first chunk that reaches either budget, so a huge collection costs O(budget) rather than O(N). Either dimension may be ``None`` (unbounded); with both ``None`` the whole object is formatted.

Benchmark (``PrettyPrinter`` alone, width 80)::