
Conversation

@vstinner (Member) commented Sep 13, 2025

Replace the private _PyBytesWriter API with the new public PyBytesWriter API in utf8_encoder() and unicode_encode_ucs1().

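For orientation, both functions sit behind ordinary str.encode() calls. The mapping below is only my reading of the function names (it is not stated in the PR), and CPython fast paths may bypass these functions for trivial inputs:

# Illustrative Python-level calls that exercise the two C functions (my reading
# of the function names; CPython fast paths may bypass them for trivial inputs):
"abc".encode()                                    # UTF-8 encoder: utf8_encoder()
"a\u0100".encode("latin1", "backslashreplace")    # Latin-1/ASCII: unicode_encode_ucs1()
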
@vstinner (Member, Author) commented Sep 13, 2025

Microbenchmark on the UTF-8 encoder:

import pyperf
runner = pyperf.Runner()
runner.timeit('abc',
    setup='s="abc"',
    stmt='s.encode()')
runner.timeit('a x 1000',
    setup='s="a" * 1000',
    stmt='s.encode()')
runner.timeit('ab<surrogate> [namereplace]',
    setup=r's="ab\udc80"',
    stmt='s.encode(errors="namereplace")')
runner.timeit('ab<surrogate> [ignore]',
    setup=r's="ab\udc80"',
    stmt='s.encode(errors="ignore")')
runner.timeit('(a<surrogate>) x 1000 [namereplace]',
    setup=r's="a\udc80" * 1000',
    stmt='s.encode(errors="namereplace")')
runner.timeit('(a<surrogate>) x 1000 [ignore]',
    setup=r's="a\udc80" * 1000',
    stmt='s.encode(errors="ignore")')

Results:

Benchmark bench1_ref bench1_pep782
abc 34.5 ns 35.8 ns: 1.04x slower
a x 1000 98.5 ns 103 ns: 1.05x slower
ab&lt;surrogate&gt; [namereplace] 643 ns 694 ns: 1.08x slower
ab&lt;surrogate&gt; [ignore] 108 ns 113 ns: 1.05x slower
(a&lt;surrogate&gt;) x 1000 [namereplace] 225 us 223 us: 1.01x faster
(a&lt;surrogate&gt;) x 1000 [ignore] 3.72 us 3.57 us: 1.04x faster
Geometric mean (ref) 1.03x slower
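
For context, a quick sketch of the error-handler behaviour being measured (the outputs are what the codecs documentation leads me to expect, not taken from the PR): namereplace must build an escape sequence for every unencodable character, while ignore simply drops it, which is consistent with namereplace being the slowest case above.

s = "ab\udc80"                          # a lone surrogate cannot be encoded to UTF-8
print(s.encode(errors="ignore"))        # b'ab' -- the surrogate is dropped
print(s.encode(errors="namereplace"))   # b'ab\\udc80' -- lone surrogates have no
                                        # Unicode name, so a \uXXXX escape is used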

@vstinner (Member, Author) commented:

cc @serhiy-storchaka

@serhiy-storchaka (Member) commented:

Please make benchmarks for non-ASCII strings. Consider different ranges (which represent different internal representations and different lengths of the UTF-8 representation):

  • 0-0x7F
  • 0x80-0xFF
  • 0x100-0x3FF
  • 0x400-0xFFFF
  • 0x10000-0x10FFFF

Also consider strings that contain one character from a higher range (for example, a single U+10000 character with all other characters ASCII, etc.).
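
For reference, a small illustration (mine, not part of the review) of why those ranges matter: each boundary maps to a different UTF-8 sequence length, and the string's internal storage width also changes at U+00FF and U+FFFF (PEP 393).

# UTF-8 length at each range boundary; internally (PEP 393) strings store
# 1 byte per character up to U+00FF, 2 up to U+FFFF, and 4 above that.
for cp in (0x7F, 0xFF, 0x3FF, 0xFFFF, 0x10FFFF):
    print(f"U+{cp:06X} -> {len(chr(cp).encode('utf-8'))} UTF-8 byte(s)")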

@vstinner marked this pull request as ready for review on September 15, 2025 at 10:50
@vstinner (Member, Author) commented Sep 15, 2025

More benchmark:

import pyperf
runner = pyperf.Runner()
ranges = (
    (r'\0',
     r'\x7f'),
    (r'\x80',
     r'\xff'),
    (r'\u0400',
     r'\u0fff'),
    (r'\U00010000',
     r'\u0010ffff'),
)
for first, last in ranges:
    runner.timeit(f'"{first}{last}"',
        setup=f"first='{first}'; last='{last}'; s=first+last",
        stmt='s.encode()')
for length in (5, 50, 500):
    for first, last in ranges:
        runner.timeit(f'"{first}{last}" * {length}',
            setup=f"first='{first}'; last='{last}'; s=(first+last) * {length}",
            stmt='s.encode()')

Results:

Benchmark bench2_ref bench2_pep782
"\0\x7f" 34.9 ns 33.6 ns: 1.04x faster
"\x80\xff" 46.5 ns 43.2 ns: 1.08x faster
"\u0400\u0fff" 51.1 ns 53.9 ns: 1.05x slower
"\0\x7f" * 5 38.0 ns 39.0 ns: 1.03x slower
"\x80\xff" * 5 51.2 ns 53.8 ns: 1.05x slower
"\U00010000\u0010ffff" * 5 74.6 ns 75.7 ns: 1.02x slower
"\0\x7f" * 50 39.6 ns 39.9 ns: 1.01x slower
"\u0400\u0fff" * 50 191 ns 193 ns: 1.01x slower
"\U00010000\u0010ffff" * 50 386 ns 392 ns: 1.02x slower
"\x80\xff" * 500 959 ns 982 ns: 1.02x slower
"\u0400\u0fff" * 500 1.43 us 1.48 us: 1.03x slower
Geometric mean (ref) 1.01x slower

Benchmark hidden because not significant (5): "\U00010000\u0010ffff", "\u0400\u0fff" * 5, "\x80\xff" * 50, "\0\x7f" * 500, "\U00010000\u0010ffff" * 500

@vstinner (Member, Author) commented Sep 15, 2025

Benchmark:

import pyperf
runner = pyperf.Runner()
for length in (5, 50, 500):
    runner.timeit(f'"x" * {length} + chr(0x10000)',
        setup=f's="x" * {length} + chr(0x10000)',
        stmt='s.encode()')

Results:

Benchmark bench3_ref bench3_pep782
"x" * 5 + chr(0x10000) 51.6 ns 49.5 ns: 1.04x faster
"x" * 50 + chr(0x10000) 78.9 ns 77.4 ns: 1.02x faster
Geometric mean (ref) 1.02x faster

Benchmark hidden because not significant (1): "x" * 500 + chr(0x10000)

@vstinner (Member, Author) commented:

@serhiy-storchaka: The difference is a few nanoseconds: around +10 ns in the worst case and 1.08x faster in the best case. Do you think that it's acceptable?

@serhiy-storchaka (Member) left a comment

Some overhead is caused by the dynamic memory allocation for PyBytesWriter -- it is unavoidable. But there may be a loss due to losing fine control over overallocation -- I need to look at it more closely. Maybe it can be avoided by adding a new C API.

@vstinner (Member, Author) replied:

> Some overhead is caused by the dynamic memory allocation for PyBytesWriter -- it is unavoidable.

PyBytesWriter_Create() uses a free list to avoid PyMem_Malloc() cost in the common case.

> But there may be a loss due to losing fine control over overallocation -- I need to look at it more closely. Maybe it can be avoided by adding a new C API.

If this loss can be measured, I would suggest adding a private API to disable overallocation.
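
As a conceptual sketch of the free-list idea (in Python, purely illustrative; the real mechanism lives in C inside PyBytesWriter_Create(), and all names and the capacity below are made up):

_free_list = []
_FREE_LIST_CAPACITY = 2        # hypothetical capacity

class _Writer:
    def __init__(self):
        self.buffer = bytearray()

def create_writer():
    # Reuse a cached writer when possible instead of allocating a new one.
    writer = _free_list.pop() if _free_list else _Writer()
    writer.buffer.clear()
    return writer

def discard_writer(writer):
    # Keep the writer around for the next call instead of freeing it.
    if len(_free_list) < _FREE_LIST_CAPACITY:
        _free_list.append(writer)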

@vstinner (Member, Author) commented Sep 18, 2025

Updated benchmark on the worst case:

import pyperf
runner = pyperf.Runner()
runner.timeit('utf8',
    setup=r's="\uFFFF"*(256//3)+"\uDC80"',
    stmt='s.encode(errors="backslashreplace")')
runner.timeit('latin1',
    setup=r"s=('a'*255+'\u0100')",
    stmt="s.encode('latin1', 'backslashreplace')")

Result:

Benchmark bench4_ref bench4_pep782
utf8 256 ns 263 ns: 1.03x slower
latin1 278 ns 291 ns: 1.05x slower
Geometric mean (ref) 1.04x slower
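
My reading of why these inputs are worst cases (this interpretation is not stated in the thread): a long, cheaply encodable prefix is followed by a single unencodable character at the very end, so the error handler's longer replacement text lands right at the buffer boundary and forces extra writer work. A quick size check:

s_utf8 = "\uFFFF" * (256 // 3) + "\uDC80"
s_latin1 = "a" * 255 + "\u0100"
print(len(s_utf8), len(s_utf8.encode(errors="backslashreplace")))         # 86 261
print(len(s_latin1), len(s_latin1.encode("latin1", "backslashreplace")))  # 256 261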

@vstinner (Member, Author) commented:

@serhiy-storchaka: utf8_encoder() and unicode_encode_ucs1() are the last two functions using the private API. I would like to merge this change so that the private API can be removed, even if there is some performance overhead. In the common cases, there is no significant performance impact.

@vstinner (Member, Author) commented:

I updated the PR to keep the overallocate=0 optimization.
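
For readers unfamiliar with the term, a conceptual sketch of overallocation (Python, illustrative only; the actual policy is implemented in C): the writer grows its buffer geometrically so repeated appends stay amortized, but when the caller already knows the exact output size, allocating exactly that much avoids wasted memory and a final resize, which is my understanding of the overallocate=0 case.

# Made-up growth policy for illustration; not the numbers CPython uses.
def next_capacity(current, needed, overallocate=True):
    if not overallocate:          # exact-size mode: allocate only what is needed
        return needed
    return max(needed, current + current // 2)    # geometric growth (~1.5x)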

@vstinner (Member, Author) commented Sep 18, 2025

More benchmarks on the corner cases.

Benchmark: python -m pyperf timeit -s "s=('a'*100+'\u0100'*100)" "s.encode('latin1', 'backslashreplace')"

Result: Mean +- std dev: [timeit1_ref] 503 ns +- 37 ns -> [timeit1_pep782] 483 ns +- 25 ns: 1.04x faster


Benchmark: python -m pyperf timeit -s "s=(('a'*10+'\u0100')*10)" "s.encode('latin1', 'backslashreplace')"

Result: Mean +- std dev: [timeit2_ref] 243 ns +- 13 ns -> [timeit2_pep782] 248 ns +- 4 ns: 1.02x slower

@vstinner (Member, Author) commented:

I updated the PR to reimplement the min_size micro-optimization. There is no longer a 1.3x slowdown.

I also recomputed all benchmark results on the latest PR version. Results are now between 1.08x slower and 1.08x faster. Most benchmarks are in the [-5%, +5%] range, which can be attributed to benchmark noise (and can be ignored).

@serhiy-storchaka: I plan to merge this change next week.

@serhiy-storchaka (Member) left a comment

LGTM. 👍

Remove useless PyBytesWriter_Discard() call
@vstinner merged commit 8cfd7b4 into python:main on Sep 23, 2025
43 checks passed
@vstinner deleted the pybyteswriter_encode_utf8 branch on September 23, 2025 at 09:47
@vstinner (Member, Author) commented:

Merged, thanks for the review @serhiy-storchaka.

@bedevere-bot commented:

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 Ubuntu Shared 3.x (tier-1) has failed when building commit 8cfd7b4.

What you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/506/builds/11478) and take a look at the build logs.
  4. Check if the failure is related to this commit (8cfd7b4) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/506/builds/11478

Failed tests:

  • test_interpreters

Failed subtests:

  • test_keyboard_interrupt_in_thread_running_interp - test.test_interpreters.test_api.InterpreterObjectTests.test_keyboard_interrupt_in_thread_running_interp

Summary of the results of the build (if available):


Traceback logs:
Traceback (most recent call last):
  File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_interpreters/test_api.py", line 462, in test_keyboard_interrupt_in_thread_running_interp
    self.assertEqual(retcode, 0)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^
AssertionError: -2 != 0
