gh-139353: Add Objects/unicode_writer.c file #139911

vstinner · 2025-10-10T15:36:37Z

Move the public PyUnicodeWriter API and the private _PyUnicodeWriter API to a new Objects/unicode_writer.c file.

Rename a few helper functions to share them between unicodeobject.c and unicode_writer.c, such as resize_compact() or unicode_result().

Issue: Split large Objects/unicodeobject.c file into smaller files #139353

vstinner · 2025-10-14T12:59:21Z

cc @serhiy-storchaka @malemburg

malemburg · 2025-10-14T19:17:17Z

As mentioned before, I don't think turning these parts into separate object files is a good idea.

Have you benchmarked the effect of putting the writer into a separate object file vs. keeping it in the unicodeobject.c file (via #includes)?

vstinner · 2025-10-14T19:34:42Z

Have you benchmarked the effect of putting the writer into a separate object file vs. keeping it in the unicodeobject.c file (via #includes)?

I'm not sure if it's relevant since PyUnicodeWriter is not used by unicodeobject.c.

I ran a benchmark on repr(tuple) (which is implemented with PyUnicodeWriter): there is no impact on performance. The difference is just noise in the benchmark.
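To make clear why repr(tuple) exercises the writer, here is a rough Python analogy of the append-to-a-buffer pattern. It is only a sketch: `tuple_repr_sketch` is a hypothetical name, `io.StringIO` stands in for the writer, and this is not how CPython's C code is actually written.

```python
import io


def tuple_repr_sketch(t):
    """Build repr(t) piece by piece, appending each part to one buffer."""
    buf = io.StringIO()          # stand-in for a writer object (assumption)
    buf.write("(")
    for index, item in enumerate(t):
        if index:
            buf.write(", ")      # separator between items
        buf.write(repr(item))    # append the repr of each item
    if len(t) == 1:
        buf.write(",")           # one-element tuples need a trailing comma
    buf.write(")")
    return buf.getvalue()        # finish: turn the buffer into a str


if __name__ == "__main__":
    print(tuple_repr_sketch((1,) * 5))
```

The benchmark sizes (1, 5, 10, 50, 100) change how many append steps run per call, which is why the per-call time grows with the tuple size.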

ref is the main branch and split is this PR:

| Benchmark | ref | split |
|---|---|---|
| repr tuple-1 | 254 ns | 265 ns: 1.04x slower |
| repr tuple-10 | 786 ns | 783 ns: 1.00x faster |
| repr tuple-50 | 3.02 us | 3.03 us: 1.00x slower |
| repr tuple-100 | 5.78 us | 5.84 us: 1.01x slower |
| Geometric mean | (ref) | 1.01x slower |

Benchmark hidden because not significant (1): repr tuple-5
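As a sanity check on the "Geometric mean" row, the ratios can be recomputed from the rounded values shown above. This is only a sketch: it uses the four displayed rows, while pyperf works from full-precision data and may account for the hidden benchmark differently, so only the rounded result is comparable.

```python
from statistics import geometric_mean

# (ref, split) means as displayed above; each pair uses the same unit.
rows = {
    "repr tuple-1": (254, 265),      # ns
    "repr tuple-10": (786, 783),     # ns
    "repr tuple-50": (3.02, 3.03),   # us
    "repr tuple-100": (5.78, 5.84),  # us
}

# A ratio above 1.0 means the split build is slower than ref.
ratios = {name: split / ref for name, (ref, split) in rows.items()}
gm = geometric_mean(list(ratios.values()))

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x")
    print(f"Geometric mean: {gm:.2f}x")
```

The recomputed mean rounds to the same 1.01x reported in the table.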

Note: I didn't build with LTO+PGO optimizations, which should reduce the noise even further.

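import pyperf

runner = pyperf.Runner()
# Benchmark repr() on tuples of increasing size.
for size in (1, 5, 10, 50, 100):
    runner.timeit(f'repr tuple-{size}',
        setup=f't = (1,)*{size}',  # build the tuple once, outside the timed statement
        stmt='repr(t)')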
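For a quick local check when pyperf is not installed, the same measurement can be approximated with the standard library. This is a minimal sketch, not a replacement for the script above: `time_tuple_repr` and its `number`/`repeat` defaults are assumptions for illustration, and plain `timeit` does none of pyperf's process isolation or calibration, so results will be noisier.

```python
import timeit


def time_tuple_repr(sizes=(1, 5, 10, 50, 100), number=10_000, repeat=5):
    """Return the best observed time per repr(t) call, in seconds, per size."""
    results = {}
    for size in sizes:
        t = (1,) * size
        timer = timeit.Timer("repr(t)", globals={"t": t})
        # Take the minimum over several runs to reduce the effect of noise.
        best = min(timer.repeat(repeat=repeat, number=number))
        results[size] = best / number
    return results


if __name__ == "__main__":
    for size, seconds in time_tuple_repr().items():
        print(f"repr tuple-{size}: {seconds * 1e9:.0f} ns")
```

Run it once on each build (main and the PR branch) and compare the numbers by hand; differences of a few percent are within the noise of this simpler approach.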

malemburg · 2025-10-14T19:44:33Z

Thanks for running the benchmark.

I'm more concerned about uses of unicodeobject.c code in this new unicode_writer.c than about use of the writer APIs in unicodeobject.c.

vstinner · 2025-10-18T20:13:29Z

I'm more concerned about uses of unicodeobject.c code in this new unicode_writer.c than about use of the writer APIs in unicodeobject.c.

Oh ok. Well, it has been measured by the benchmark as well.

vstinner · 2025-10-30T10:03:58Z

@serhiy-storchaka @malemburg: So are you against merging this PR, or can I merge it?

malemburg · 2025-10-30T10:09:42Z

Fine for me.

I would have thought that the benchmarks would show a degradation in performance due to the compiler not being able to inline frequently used functions across object files, but since that's apparently not the case, I don't have objections.

Please do run such benchmarks for future splits out of unicodeobject.c as well, to be on the safe side.

Thanks.

serhiy-storchaka

LGTM. 👍

This is only 600 lines, so moving them out will not have a large benefit. But it can be good for structuring the source code.

vstinner · 2025-10-30T13:36:37Z

Merged, thanks for the reviews.

vstinner requested review from a team, AA-Turner, emmatyping, ericsnowcurrently and erlend-aasland as code owners October 10, 2025 15:36

bedevere-app bot mentioned this pull request Oct 10, 2025

Split large Objects/unicodeobject.c file into smaller files #139353

Open

bedevere-app bot added the awaiting core review label Oct 10, 2025

vstinner force-pushed the unicode_writer branch from a263030 to c2f600e Compare October 10, 2025 15:37

StanFromIreland added the skip news label Oct 10, 2025

serhiy-storchaka approved these changes Oct 30, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Oct 30, 2025

vstinner merged commit efc37ba into python:main Oct 30, 2025
49 checks passed

vstinner deleted the unicode_writer branch October 30, 2025 13:36

bedevere-app bot removed the awaiting merge label Oct 30, 2025
