@vstinner
Member

@vstinner vstinner commented Oct 10, 2025

Move the public PyUnicodeWriter API and the private _PyUnicodeWriter API to a new Objects/unicode_writer.c file.

Rename a few helper functions to share them between unicodeobject.c and unicode_writer.c, such as resize_compact() or unicode_result().

@vstinner
Member Author

cc @serhiy-storchaka @malemburg

@malemburg
Member

As mentioned before, I don't think turning these parts into separate object files is a good idea.

Have you benchmarked the effect of putting the writer into a separate object file vs. keeping it in the unicodeobject.c file (via #includes)?

@vstinner
Member Author

Have you benchmarked the effect of putting the writer into a separate object file vs. keeping it in the unicodeobject.c file (via #includes)?

I'm not sure if it's relevant since PyUnicodeWriter is not used by unicodeobject.c.

I ran a benchmark on repr(tuple) (which is implemented with PyUnicodeWriter): there is no impact on performance. The difference is just noise in the benchmark.
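For readers unfamiliar with the writer pattern, here is a rough Python analogue of what a writer-based repr(tuple) does: append pieces to a growing buffer, then produce the final string once. This is only a sketch. It uses io.StringIO as a stand-in for the C writer, the function name tuple_repr_sketch is hypothetical, and it is not CPython's actual C implementation (for example, it does not guard against recursive containers).

```python
import io


def tuple_repr_sketch(items: tuple) -> str:
    """Build repr(items) piece by piece, in the style of a string writer.

    Hypothetical helper for illustration only; io.StringIO stands in for
    the C-level writer discussed in this PR.
    """
    writer = io.StringIO()
    writer.write("(")
    for index, item in enumerate(items):
        if index > 0:
            writer.write(", ")
        writer.write(repr(item))
    # A 1-tuple needs a trailing comma to be distinguishable from parentheses.
    if len(items) == 1:
        writer.write(",")
    writer.write(")")
    # Produce the final string once, after all pieces have been written.
    return writer.getvalue()


if __name__ == "__main__":
    for size in (1, 5, 10):
        t = (1,) * size
        print(tuple_repr_sketch(t))
```

The benchmarked statement, repr(t), spends most of its time in this kind of append loop, which is why it is a reasonable probe for writer performance.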

ref is the main branch and split is this PR:

| Benchmark | ref | split |
|---|---|---|
| repr tuple-1 | 254 ns | 265 ns: 1.04x slower |
| repr tuple-10 | 786 ns | 783 ns: 1.00x faster |
| repr tuple-50 | 3.02 us | 3.03 us: 1.00x slower |
| repr tuple-100 | 5.78 us | 5.84 us: 1.01x slower |
| Geometric mean | (ref) | 1.01x slower |

Benchmark hidden because not significant (1): repr tuple-5
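The geometric mean row can be approximately re-derived from the rounded timings shown above. This is a minimal sketch under two assumptions: the hidden repr tuple-5 benchmark is treated as a ratio of exactly 1.0 (its real timings are not shown), and the rounded table values are used, whereas pyperf works from the unrounded samples. Small differences from pyperf's own figure are therefore expected.

```python
import math

# Rounded (ref, split) timings in nanoseconds, copied from the results above.
timings_ns = {
    "repr tuple-1": (254, 265),
    "repr tuple-10": (786, 783),
    "repr tuple-50": (3020, 3030),
    "repr tuple-100": (5780, 5840),
}


def geometric_mean(values):
    """Return the geometric mean of positive numbers via the mean of logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Ratio > 1.0 means the split build is slower than ref.
ratios = [split / ref for ref, split in timings_ns.values()]

# Assumption: the hidden "repr tuple-5" benchmark contributes a ratio of 1.0.
ratios_with_hidden = ratios + [1.0]

gm_shown = geometric_mean(ratios)
gm_all = geometric_mean(ratios_with_hidden)

print(f"shown benchmarks only: {gm_shown:.2f}x")
print(f"including tuple-5 as 1.0: {gm_all:.2f}x")
```

Under either assumption the result rounds to 1.01x, which agrees with the reported geometric mean.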

Note: I didn't build with LTO+PGO optimizations, which should reduce the noise even more.

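import pyperf

# pyperf.Runner spawns worker processes and collects the timing samples.
runner = pyperf.Runner()
# Benchmark repr() on tuples of increasing length.
for size in (1, 5, 10, 50, 100):
    runner.timeit(f'repr tuple-{size}',
        setup=f't = (1,)*{size}',
        stmt='repr(t)')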

@malemburg
Member

malemburg commented Oct 14, 2025

Thanks for running the benchmark.

I'm more concerned about uses of unicodeobject.c code in this new unicode_writer.c than about use of the writer APIs in unicodeobject.c.

@vstinner
Member Author

I'm more concerned about uses of unicodeobject.c code in this new unicode_writer.c than about use of the writer APIs in unicodeobject.c.

Oh ok. Well, it has been measured by the benchmark as well.

@vstinner
Member Author

@serhiy-storchaka @malemburg: So are you against merging this PR, or can I merge it?

@malemburg
Member

Fine for me.

I would have thought that the benchmarks would show a degradation in performance due to the compiler not being able to inline frequently used functions, but since that's apparently not the case, I don't have objections.

Please do run such benchmarks for future splits out of unicodeobject.c as well, to be on the safe side.

Thanks.

Member

@serhiy-storchaka serhiy-storchaka left a comment

LGTM. 👍

This is only 600 lines, so moving them out will not have a large benefit. But it can be good for source code structuring purposes.

@vstinner vstinner merged commit efc37ba into python:main Oct 30, 2025
49 checks passed
@vstinner vstinner deleted the unicode_writer branch October 30, 2025 13:36
@vstinner
Member Author

Merged, thanks for reviews.
