[perf] Make PrettyPrinter format lazily so output can be budget-capped #14588

Open
Pierre-Sassoulas wants to merge 3 commits into pytest-dev:main from Pierre-Sassoulas:pprint-lazy-budget

Pierre-Sassoulas (Member) commented Jun 13, 2026

Refactor required prior to #14523.

_format and the per-type helpers now yield their output as a stream of string chunks instead of writing to a file-like object, and pformat joins them. On top of that, pformat_lines pulls from the formatter only until a budget is reached:

pformat_lines(obj, max_lines=None, max_chars=None)

It stops on the first chunk that reaches either budget, so a huge collection costs O(budget) rather than O(N). Either dimension may be None (unbounded); with both None the whole object is formatted.
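
The idea described above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: `format_chunks` and `take_until_budget` are hypothetical names, and the formatter is deliberately simplified to a flat, one-item-per-line layout. It only shows how a generator of chunks lets the consumer stop as soon as a budget is reached.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def format_chunks(items: Iterable[object]) -> Iterator[str]:
    """Hypothetical lazy formatter: yield a one-item-per-line layout in chunks."""
    yield "[\n"
    for item in items:
        yield " "
        yield repr(item)
        yield ",\n"
    yield "]"


def take_until_budget(
    chunks: Iterable[str],
    *,
    max_lines: int | None = None,
    max_chars: int | None = None,
) -> list[str]:
    """Pull chunks until either budget is reached, then return the lines."""
    taken: list[str] = []
    lines = chars = 0
    for chunk in chunks:
        taken.append(chunk)
        lines += chunk.count("\n")
        chars += len(chunk)
        # Stop on the first chunk that reaches either budget; the result may
        # slightly overshoot, so it is a stopping condition, not a precise cut.
        if max_lines is not None and lines >= max_lines:
            break
        if max_chars is not None and chars >= max_chars:
            break
    return "".join(taken).splitlines()


# Only the first few items are ever formatted, regardless of the input size.
preview = take_until_budget(format_chunks(range(500_000)), max_lines=11)
```

Because the consumer drives the generator, the cost in this sketch depends on the budget rather than on the size of the collection.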

Benchmark (PrettyPrinter alone, width 80)::

list(range(500_000)):
    pformat().splitlines()        ~805 ms
    pformat_lines(max_lines=11)   ~0.027 ms      (~30000x)

[8 small ints] (common small diff):
    pformat().splitlines()        ~0.0133 ms
    pformat_lines(max_lines=11)   ~0.0163 ms (+3µs)

["x"*100_000] * 3 (flat, few huge elements):
    pformat_lines(max_chars=640)  stops after ~100_000 chars
                                  (one element) instead of 300_000

Pierre-Sassoulas added the "skip news" label (used on PRs to opt out of the changelog requirement) Jun 13, 2026
Pierre-Sassoulas marked this pull request as draft June 13, 2026 16:30
Pierre-Sassoulas force-pushed the pprint-lazy-budget branch 2 times, most recently from 133da41 to f4bd109, June 13, 2026 17:12
Pierre-Sassoulas marked this pull request as ready for review June 14, 2026 05:30

bluetech (Member) left a comment

Thanks, seems like a nice optimization to me, and I'd say makes the code nicer as well.

Comment thread src/_pytest/_io/pprint.py Outdated
) -> list[str]:
"""Pretty-print ``object`` and return its lines.

``_format`` yields the output as a stream of chunks, so this can

Member

Probably a "public" method shouldn't reference a private method in its docstring. I would describe the behavior rather than the implementation here.

Comment thread src/_pytest/_io/pprint.py Outdated
unbounded. With both ``None`` the whole object is formatted. The
budget is a stopping condition, not a precise cut: formatting
stops on the first chunk that reaches it, so the result may
slightly overshoot (the caller truncates to the exact limit).

Member

I would remove the detail of what the caller does, or replace "truncates" with "can truncate" or "should truncate".

Comment thread src/_pytest/_io/pprint.py

def pformat_lines(
self,
object: Any,

Member

I suggest making max_lines and max_chars keyword-only. Otherwise they can easily be confused.

Member Author

Yes, definitely.
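
For reference, a minimal sketch of what keyword-only budgets look like. `pformat_lines_sketch` is a hypothetical stand-in whose body is a placeholder; only the bare `*` in the signature matters here.

```python
from __future__ import annotations


def pformat_lines_sketch(
    obj: object,
    *,  # everything after the bare star must be passed by keyword
    max_lines: int | None = None,
    max_chars: int | None = None,
) -> list[str]:
    """Hypothetical stand-in: the body is a placeholder, not real formatting."""
    text = repr(obj)
    if max_chars is not None:
        text = text[:max_chars]
    lines = text.splitlines()
    return lines if max_lines is None else lines[:max_lines]


# Passing the budgets positionally is rejected, so they cannot be swapped.
try:
    pformat_lines_sketch([1, 2, 3], 11, 640)  # type: ignore[misc]
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

With this signature a call site has to spell out `max_lines=11, max_chars=640`, which makes the two budgets hard to mix up.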

Comment thread src/_pytest/_io/pprint.py
self._format_items(object, stream, indent, allowance, context, level)
stream.write(endchar)
try:
object = sorted(object)

Member

Is this an unrelated optimization, or is it somehow related to the iterator change?

If it's an optimization, a comment would be helpful, something like: "Try direct sort first, faster than the fallback.".

Member Author

Yes, sorry, I didn't realize this was in this commit. It is indeed an optimization (a rather big one, imho). Because it uses the C sort directly it is a lot faster, and I suppose we rarely have heterogeneous data structures (in which case it takes a little longer, of course). Here's a micro-benchmark:

Details
"""Interleaved benchmark of the set-sort fast path in ``_pprint_set``.

Compares ``sorted(s)`` against ``sorted(s, key=_safe_key)`` (the prior
behaviour), and the heterogeneous fallback ``try sorted / except retry``
against always using ``_safe_key``.

Interleaving: A and B are timed back-to-back every iteration, so slow
drift (CPU frequency scaling, thermal, GC) hits both equally and cancels
in the per-iteration difference. We report the median per-call time of
each and the median of the *paired* differences (B - A), which is the
robust estimate of the real gap.
"""

import statistics
import time

from _pytest._io.pprint import _safe_key


def bench_pair(a, b, inner, outer):
    """Interleave-time ``a`` and ``b``; return per-call medians + paired diff."""
    ta_samples, tb_samples, diffs = [], [], []
    for _ in range(outer):
        t = time.perf_counter()
        for _ in range(inner):
            a()
        ta = (time.perf_counter() - t) / inner
        t = time.perf_counter()
        for _ in range(inner):
            b()
        tb = (time.perf_counter() - t) / inner
        ta_samples.append(ta)
        tb_samples.append(tb)
        diffs.append(tb - ta)
    return (
        statistics.median(ta_samples) * 1000,  # ms
        statistics.median(tb_samples) * 1000,  # ms
        statistics.median(diffs) * 1e6,  # us, paired
    )


print("plain sorted (A) vs sorted(key=_safe_key) (B), homogeneous:")
for label, data, inner, outer in [
    ("int set 1k", set(range(1000)), 100, 300),
    ("int set 100k", set(range(100_000)), 3, 80),
    ("str set 100k", {f"item-{i}" for i in range(100_000)}, 1, 80),
]:
    a = lambda d=data: sorted(d)
    b = lambda d=data: sorted(d, key=_safe_key)
    ma, mb, diff = bench_pair(a, b, inner, outer)
    print(
        f"  {label:13} A={ma:9.4f} ms  B={mb:9.4f} ms  B/A={mb / ma:5.1f}x"
        f"  paired B-A={diff:+10.1f} us"
    )

print("\nheterogeneous (unorderable mix), fallback path:")
het = {1, "a", 2, "b", 3.5, None} | {f"x{i}" for i in range(500)} | set(range(500))
try:
    sorted(het)
    raise SystemExit("expected TypeError - set is not heterogeneous")
except TypeError:
    pass


def safe_sort(d=het):
    return sorted(d, key=_safe_key)


def failed_sort(d=het):
    # the *only* extra work the try/except adds: a plain sort that raises
    try:
        sorted(d)
    except TypeError:
        pass


# Measure the overhead directly: the failed sort is the whole cost of the
# fallback's try/except. Comparing new(=failed+safe) vs old(=safe) is
# useless here — the ~us signal is far below the jitter of the ~ms safe
# sort, so the paired diff is noise (can even come out negative).
overhead_ms, safe_ms, _ = bench_pair(failed_sort, safe_sort, 20, 300)
print(
    f"  {'hetero 1k':13} _safe_key sort={safe_ms:8.4f} ms"
    f"  try/except overhead (failed sort)={overhead_ms * 1000:6.1f} us"
    f"  = {overhead_ms / safe_ms * 100:.2f}% of the sort"
)

Result on my machine:

plain sorted (A) vs sorted(key=_safe_key) (B), homogeneous:
  int set 1k    A=   0.0153 ms  B=   0.4661 ms  B/A= 30.4x  paired B-A=    +451.4 us
  int set 100k  A=   1.8300 ms  B=  63.6966 ms  B/A= 34.8x  paired B-A=  +61787.4 us
  str set 100k  A=  54.3415 ms  B= 356.0647 ms  B/A=  6.6x  paired B-A= +302171.5 us

heterogeneous (unorderable mix), fallback path:
  hetero 1k     _safe_key sort=  4.1314 ms  try/except overhead (failed sort)=  12.2 us  = 0.30% of the sort

Comment thread testing/io/test_pprint.py Outdated
pp = PrettyPrinter()
assert pp.pformat({3, 1, 2}) == "{\n 1,\n 2,\n 3,\n}"
# Mixed unorderable types must not raise.
pp.pformat({1, "a", 2, "b"})

Member

Might as well assert the output here too.

Pierre-Sassoulas and others added 3 commits June 14, 2026 11:00
…apped

``_format`` and the per-type helpers now ``yield`` their output as a
stream of string chunks instead of writing to a file-like object, and
``pformat`` joins them. On top of that, ``pformat_lines`` pulls from the
formatter only until a budget is reached:

    pformat_lines(obj, max_lines=None, max_chars=None)

It stops on the first chunk that reaches *either* budget, so a huge
collection costs O(budget) rather than O(N). Either dimension may be
``None`` (unbounded); with both ``None`` the whole object is formatted.

Motivation
----------
Assertion diffs are truncated to a handful of lines/chars before being
shown. Formatting the whole of a large ``==`` comparison and then
throwing almost all of it away is pure waste. With a lazy formatter the
truncating caller simply stops pulling once it has enough.

Benchmark (``PrettyPrinter`` alone, width 80)::

    list(range(500_000)):
        pformat().splitlines()        ~805 ms
        pformat_lines(max_lines=11)   ~0.027 ms      (~30000x)

    [8 small ints] (common small diff):
        pformat().splitlines()        ~0.0133 ms
        pformat_lines(max_lines=11)   ~0.0185 ms     (+~5 us)

    ["x"*100_000] * 3 (flat, few huge elements):
        pformat_lines(max_chars=640)  stops after ~100_000 chars
                                      (one element) instead of 300_000

Why a lazy generator rather than a fast path + budget stream
------------------------------------------------------------
An earlier approach kept a cheap ``pformat().splitlines()`` fast path
guarded by ``len(obj) <= max_lines`` plus a flatness check, falling back
to a write-intercepting budget-stream class for the rest. Two problems:

* ``len(obj)`` is only a *lower* bound on the line count — one nested
  element (``[{...50 keys...}]``) expands to many lines — so the guard
  needed the flatness scan to stay correct, and even then it bounded
  only *lines*, never *chars*: a flat container of a few enormous
  strings has almost no lines but blows the char budget.
* it was two code paths plus a stream class plus an exception used for
  control flow.
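
Both failure modes of the ``len(obj)`` guard can be reproduced with a short sketch. It uses the standard library ``pprint`` rather than pytest's vendored copy, on the assumption that the two lay out these inputs in a similar way.

```python
import pprint

# One nested element: len() is 1, but the dict expands to many output lines.
nested = [{i: i for i in range(50)}]
nested_lines = pprint.pformat(nested, width=80).splitlines()

# A flat container of a few enormous strings: very few lines, huge char count.
flat = ["x" * 100_000] * 3
flat_text = pprint.pformat(flat, width=80)
flat_lines = flat_text.splitlines()

print(f"nested: len={len(nested)} lines={len(nested_lines)}")
print(f"flat:   len={len(flat)} lines={len(flat_lines)} chars={len(flat_text)}")
```

In the first case the element count badly underestimates the line count; in the second a line budget alone would never trigger even though the character count is enormous.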

Because the formatter is lazy, "stop pulling at the budget" is the whole
optimisation: correct regardless of how lines/chars are distributed
across elements, bounding both dimensions, with no ``len()`` proxy to
get wrong and no fast/slow branch. The common small-diff case costs only
~5 us more than the unbounded path (it is never the bottleneck — a
failing assertion isn't hot), while large comparisons drop by orders of
magnitude.

``_pprint_set``/``_pprint_dict`` also try a plain ``sorted`` first and
fall back to the ``_safe_key`` wrapper only for unorderable mixes.
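
A minimal sketch of this "plain sort first, safe key as fallback" pattern follows. ``SafeKey`` and ``sorted_for_display`` are hypothetical names; ``SafeKey`` is modelled loosely on the kind of wrapper ``_safe_key`` provides and is not copied from the module.

```python
from typing import Any, Iterable


class SafeKey:
    """Hypothetical sort key that tolerates unorderable mixes of types."""

    __slots__ = ("obj",)

    def __init__(self, obj: Any) -> None:
        self.obj = obj

    def __lt__(self, other: "SafeKey") -> bool:
        try:
            return self.obj < other.obj
        except TypeError:
            # Fall back to an arbitrary but consistent order: by type name,
            # then by identity.
            return (str(type(self.obj)), id(self.obj)) < (
                str(type(other.obj)),
                id(other.obj),
            )


def sorted_for_display(items: Iterable[Any]) -> list[Any]:
    """Try a direct sort first; it is faster than the key-wrapped fallback."""
    try:
        return sorted(items)
    except TypeError:
        return sorted(items, key=SafeKey)


homogeneous = sorted_for_display({3, 1, 2})
mixed = sorted_for_display({1, "a", 2, "b"})
```

The fast path pays nothing for homogeneous data, and the failed first attempt is the only extra cost when the data turns out to be mixed.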

This diverges structurally from the upstream cpython ``pprint`` it was
vendored from; the module header notes it is no longer kept in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

In ``pformat_lines``'s budget loop, ``chunk.count("\n")`` ran on every
chunk, but most chunks (brackets, indentation, item reprs) contain no
newline. Guarding the call with ``"\n" in chunk`` skips it on those and
recovers part of the per-chunk budget-tracking overhead: formatting an
8-element list under a budget drops from ~0.0185 ms to ~0.0163 ms
(versus ~0.0132 ms for an uncapped ``pformat().splitlines()``, so the
budget overhead roughly halves, from ~+5 us to ~+3 us).
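
The guard can be sketched in isolation as follows. ``count_lines_guarded`` is a hypothetical helper, not the loop from ``pformat_lines``; it only shows the membership test in front of ``str.count``.

```python
from collections.abc import Iterable


def count_lines_guarded(chunks: Iterable[str]) -> int:
    """Count newlines across chunks, skipping count() on newline-free chunks."""
    lines = 0
    for chunk in chunks:
        # Most chunks (brackets, indentation, item reprs) have no newline,
        # so the cheaper membership test avoids the count() call for them.
        if "\n" in chunk:
            lines += chunk.count("\n")
    return lines


total = count_lines_guarded(["[", "\n", " ", "1", ",\n", "a\nb\n"])
```

The result is identical to calling ``count`` unconditionally; whether the extra branch pays off depends on how many chunks actually contain a newline.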

The win is small and only matters on the ``-v`` truncating path of a
failing assertion (the default path doesn't format the diff at all), so
this is kept as a separate commit — easy to drop if the extra branch
isn't judged worth it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Addresses review on pytest-dev#14588:

* make ``max_lines`` / ``max_chars`` keyword-only so they can't be
  confused at the call site.
* drop the implementation detail (``_format``) and the "what the caller
  does" note from the docstring; describe the behaviour instead.
* comment the set-sort fast path ("try a direct sort first, faster than
  the fallback").
* assert the heterogeneous-set output in the test rather than only
  checking it does not raise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

skip news used on prs to opt out of the changelog requirement

2 participants