Commit da52c06

vstinner, AA-Turner, and picnixz authored
PEP 782: Add PyBytesWriter C API (#4325)
Co-authored-by: Adam Turner <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>
1 parent ea629e8 commit da52c06

2 files changed

+373
-0
lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -660,6 +660,7 @@ peps/pep-0777.rst @warsaw
 peps/pep-0779.rst @Yhg1s @colesbury @mpage
 peps/pep-0780.rst @lysnikolaou
 peps/pep-0781.rst @methane
+peps/pep-0782.rst @vstinner
 # ...
 peps/pep-0789.rst @njsmith
 # ...

peps/pep-0782.rst

Lines changed: 372 additions & 0 deletions
@@ -0,0 +1,372 @@
1+
PEP: 782
2+
Title: Add PyBytesWriter C API
3+
Author: Victor Stinner <[email protected]>
4+
Status: Draft
5+
Type: Standards Track
6+
Created: 27-Mar-2025
7+
Python-Version: 3.14
8+
Post-History:
9+
`18-Feb-2025 <https://discuss.python.org/t/81182>`__
10+
11+
12+
.. highlight:: c
13+
14+
15+
Abstract
16+
========
17+
18+
Add a new ``PyBytesWriter`` C API to create ``bytes`` objects.
19+
20+
Soft deprecate ``PyBytes_FromStringAndSize(NULL, size)`` and
21+
``_PyBytes_Resize()`` APIs. These APIs treat an immutable ``bytes``
22+
object as a mutable object. They remain available and maintained, don't
23+
emit a deprecation warning, but are no longer recommended when writing new
24+
code.
25+
26+
27+
Rationale
28+
=========
29+
30+
Disallow creation of incomplete/inconsistent objects
31+
----------------------------------------------------
32+
33+
Creating a Python :class:`bytes` object using
34+
``PyBytes_FromStringAndSize(NULL, size)`` and ``_PyBytes_Resize()``
35+
treats an immutable :class:`bytes` object as mutable. It goes against
36+
the principle that :class:`bytes` objects are immutable. It also creates
37+
an incomplete or "invalid" object since bytes are not initialized. In
38+
Python, a :class:`bytes` object should always have its bytes fully
39+
initialized.
40+
41+
* `Avoid creating incomplete/invalid objects api-evolution#36
42+
<https://github.com/capi-workgroup/api-evolution/issues/36>`_
43+
* `Disallow mutating immutable objects api-evolution#20
44+
<https://github.com/capi-workgroup/api-evolution/issues/20>`_
45+
* `Disallow creation of incomplete/inconsistent objects problems#56
46+
<https://github.com/capi-workgroup/problems/issues/56>`_
47+
48+
Inefficient allocation strategy
49+
-------------------------------
50+
51+
When creating a bytes string and the output size is unknown, one
52+
strategy is to allocate a short buffer and extend it (to the exact size)
53+
each time a larger write is needed.
54+
55+
This strategy is inefficient because it requires enlarging the buffer
56+
multiple times. It's more efficient to overallocate the buffer the
57+
first time that a larger write is needed. It reduces the number of
58+
expensive ``realloc()`` operations, which can involve a memory copy.
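
The difference between the two strategies can be illustrated with a small, self-contained sketch in plain C. It is not the ``PyBytesWriter`` implementation: ``toy_buffer``, ``toy_append()`` and ``toy_count_reallocs()`` are hypothetical names, and the growth rule (reserve half as much again as the needed size) is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy buffer, NOT the PyBytesWriter implementation. */
typedef struct {
    char *data;
    size_t size;       /* bytes in use */
    size_t allocated;  /* bytes reserved */
    int reallocs;      /* number of realloc() calls so far */
} toy_buffer;

/* Append n bytes. If overallocate is non-zero, reserve extra room
   (needed + needed/2, an assumed growth rule) when the buffer is full. */
int toy_append(toy_buffer *b, const void *src, size_t n, int overallocate)
{
    size_t needed = b->size + n;
    if (needed > b->allocated) {
        size_t new_alloc = needed;
        if (overallocate) {
            new_alloc += needed / 2;
        }
        char *p = realloc(b->data, new_alloc);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->allocated = new_alloc;
        b->reallocs++;
    }
    memcpy(b->data + b->size, src, n);
    b->size = needed;
    return 0;
}

/* Append `count` single bytes and return the number of realloc() calls,
   or -1 on memory error. */
int toy_count_reallocs(int count, int overallocate)
{
    toy_buffer b = {NULL, 0, 0, 0};
    for (int i = 0; i < count; i++) {
        if (toy_append(&b, "x", 1, overallocate) < 0) {
            free(b.data);
            return -1;
        }
    }
    int n = b.reallocs;
    free(b.data);
    return n;
}
```

With exact-size growth, ``toy_count_reallocs(1000, 0)`` calls ``realloc()`` once per appended byte. With overallocation enabled, the same 1000 appends need far fewer calls, because each reallocation leaves room for later writes.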
59+
60+
61+
Specification
62+
=============
63+
64+
API
65+
---
66+
67+
.. c:type:: PyBytesWriter
68+
69+
A Python :class:`bytes` writer instance created by
70+
:c:func:`PyBytesWriter_Create`.
71+
72+
The instance must be destroyed by :c:func:`PyBytesWriter_Finish` or
73+
:c:func:`PyBytesWriter_Discard`.
74+
75+
Create, Finish, Discard
76+
^^^^^^^^^^^^^^^^^^^^^^^
77+
78+
.. c:function:: PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size)
79+
80+
Create a :c:type:`PyBytesWriter` to write *size* bytes.
81+
82+
If *size* is greater than zero, allocate *size* bytes for the
83+
returned buffer.
84+
85+
On error, set an exception and return ``NULL``.
86+
87+
*size* must be positive or zero.
88+
89+
.. c:function:: PyObject* PyBytesWriter_Finish(PyBytesWriter *writer)
90+
91+
Finish a :c:type:`PyBytesWriter` created by
92+
:c:func:`PyBytesWriter_Create`.
93+
94+
On success, return a Python :class:`bytes` object.
95+
On error, set an exception and return ``NULL``.
96+
97+
The writer instance is invalid after the call in any case.
98+
99+
.. c:function:: PyObject* PyBytesWriter_FinishWithSize(PyBytesWriter *writer, Py_ssize_t size)
100+
101+
Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer
102+
to *size* bytes before creating the :class:`bytes` object.
103+
104+
.. c:function:: PyObject* PyBytesWriter_FinishWithPointer(PyBytesWriter *writer, void *buf)
105+
106+
Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer
107+
using *buf* pointer before creating the :class:`bytes` object.
108+
109+
Pseudo-code::
110+
111+
    Py_ssize_t size = (char*)buf - (char*)PyBytesWriter_GetData(writer);
    return PyBytesWriter_FinishWithSize(writer, size);
113+
114+
Set an exception and return ``NULL`` if *buf* pointer is outside the
115+
internal buffer bounds.
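
The bounds check described above can be sketched in plain C, independently of the Python C API. ``toy_written_size()`` is a hypothetical helper, not part of the proposed API. It assumes that the buffer is described by a start pointer and a capacity, and it returns ``-1`` where the real function would set an exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return how many bytes lie between `start` and `pos`,
   or -1 if `pos` is outside the buffer [start, start + capacity].
   Both pointers must point into the same underlying array. */
ptrdiff_t toy_written_size(const char *start, size_t capacity, const char *pos)
{
    if (pos < start || pos > start + capacity) {
        return -1;
    }
    return pos - start;
}
```

A pointer equal to ``start + capacity`` is accepted: it means that the whole buffer was written.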
116+
117+
.. c:function:: void PyBytesWriter_Discard(PyBytesWriter *writer)
118+
119+
Discard a :c:type:`PyBytesWriter` created by :c:func:`PyBytesWriter_Create`.
120+
121+
Do nothing if *writer* is ``NULL``.
122+
123+
The writer instance is invalid after the call.
124+
125+
High-level API
126+
^^^^^^^^^^^^^^
127+
128+
.. c:function:: int PyBytesWriter_WriteBytes(PyBytesWriter *writer, const void *bytes, Py_ssize_t size)
129+
130+
Write *size* bytes of *bytes* into the *writer*.
131+
132+
If *size* is equal to ``-1``, call ``strlen(bytes)`` to get the
133+
string length.
134+
135+
On success, return ``0``.
136+
On error, set an exception and return ``-1``.
137+
138+
.. c:function:: int PyBytesWriter_Format(PyBytesWriter *writer, const char *format, ...)
139+
140+
Similar to ``PyBytes_FromFormat()``, but write the output directly
141+
into the writer.
142+
143+
On success, return ``0``.
144+
On error, set an exception and return ``-1``.
145+
146+
Getters
147+
^^^^^^^
148+
149+
.. c:function:: Py_ssize_t PyBytesWriter_GetSize(PyBytesWriter *writer)
150+
151+
Get the writer size.
152+
153+
.. c:function:: void* PyBytesWriter_GetData(PyBytesWriter *writer)
154+
155+
Get the writer data.
156+
157+
The pointer is valid until :c:func:`PyBytesWriter_Finish` or
158+
:c:func:`PyBytesWriter_Discard` is called on *writer*.
159+
160+
Low-level API
161+
^^^^^^^^^^^^^
162+
163+
.. c:function:: int PyBytesWriter_Resize(PyBytesWriter *writer, Py_ssize_t size)
164+
165+
Resize the writer to *size* bytes. It can be used to enlarge or to
166+
shrink the writer.
167+
168+
Newly allocated bytes are left uninitialized.
169+
170+
On success, return ``0``.
171+
On error, set an exception and return ``-1``.
172+
173+
*size* must be positive or zero.
174+
175+
.. c:function:: int PyBytesWriter_Grow(PyBytesWriter *writer, Py_ssize_t grow)
176+
177+
Resize the writer by adding *grow* bytes to the current writer size.
178+
179+
Newly allocated bytes are left uninitialized.
180+
181+
On success, return ``0``.
182+
On error, set an exception and return ``-1``.
183+
184+
*size* must be positive or zero.
185+
186+
.. c:function:: void* PyBytesWriter_GrowAndUpdatePointer(PyBytesWriter *writer, Py_ssize_t size, void *buf)
187+
188+
Similar to :c:func:`PyBytesWriter_Grow`, but also update the *buf*
189+
pointer.
190+
191+
On error, set an exception and return ``NULL``.
192+
193+
Pseudo-code::
194+
195+
    Py_ssize_t pos = (char*)buf - (char*)PyBytesWriter_GetData(writer);
    if (PyBytesWriter_Grow(writer, size) < 0) {
        return NULL;
    }
    return (char*)PyBytesWriter_GetData(writer) + pos;
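
The reason for saving the offset first is that ``realloc()`` may move the buffer, which leaves the old pointer dangling. The following plain C sketch models the same pattern without the Python C API. ``toy_writer``, ``toy_grow_and_update_pointer()`` and ``toy_hello_world()`` are hypothetical names used only for illustration, and the toy writer allocates exactly the requested size instead of overallocating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy writer, NOT the PyBytesWriter implementation. */
typedef struct {
    char *data;
    size_t size;
} toy_writer;

/* Grow the buffer by `grow` bytes and return the pointer located at the
   same offset as `pos` in the (possibly moved) buffer.
   Return NULL on memory error; the writer is left unchanged. */
char *toy_grow_and_update_pointer(toy_writer *w, size_t grow, char *pos)
{
    /* Save the offset before realloc(): `pos` may dangle afterwards. */
    size_t offset = (size_t)(pos - w->data);
    char *p = realloc(w->data, w->size + grow);
    if (p == NULL) {
        return NULL;
    }
    w->data = p;
    w->size += grow;
    return w->data + offset;
}

/* Build "Hello World" in two steps and copy it, NUL-terminated, into `out`.
   Return 0 on success, -1 on error. */
int toy_hello_world(char *out, size_t out_size)
{
    toy_writer w = {malloc(6), 6};
    if (w.data == NULL) {
        return -1;
    }
    char *pos = w.data;
    memcpy(pos, "Hello ", 6);
    pos += 6;

    /* Make room for 5 more bytes and refresh the write pointer. */
    pos = toy_grow_and_update_pointer(&w, 5, pos);
    if (pos == NULL) {
        free(w.data);
        return -1;
    }
    memcpy(pos, "World", 5);
    pos += 5;

    size_t written = (size_t)(pos - w.data);
    if (written + 1 > out_size) {
        free(w.data);
        return -1;
    }
    memcpy(out, w.data, written);
    out[written] = '\0';
    free(w.data);
    return 0;
}
```

The caller never reuses the old ``pos`` value after growing the buffer: it always continues from the returned pointer.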
200+
201+
202+
Overallocation
203+
--------------
204+
205+
:c:func:`PyBytesWriter_Resize` and :c:func:`PyBytesWriter_Grow`
206+
overallocate the internal buffer to reduce the number of ``realloc()``
207+
calls and so reduce memory copies.
208+
209+
210+
Thread safety
211+
-------------
212+
213+
The API is not thread safe: a writer should only be used by a single
214+
thread at a time.
215+
216+
217+
Soft deprecations
218+
-----------------
219+
220+
Soft deprecate ``PyBytes_FromStringAndSize(NULL, size)`` and
221+
``_PyBytes_Resize()`` APIs. These APIs treat an immutable ``bytes``
222+
object as a mutable object. They remain available and maintained, don't
223+
emit a deprecation warning, but are no longer recommended when writing new
224+
code.
225+
226+
``PyBytes_FromStringAndSize(str, size)`` is not soft deprecated. Only
227+
calls with ``NULL`` *str* are soft deprecated.
228+
229+
230+
Examples
231+
========
232+
233+
High-level API
234+
--------------
235+
236+
Create the bytes string ``b"Hello World!"``::
237+
238+
    PyObject* hello_world(void)
    {
        PyBytesWriter *writer = PyBytesWriter_Create(0);
        if (writer == NULL) {
            goto error;
        }
        if (PyBytesWriter_WriteBytes(writer, "Hello", -1) < 0) {
            goto error;
        }
        if (PyBytesWriter_Format(writer, " %s!", "World") < 0) {
            goto error;
        }
        return PyBytesWriter_Finish(writer);

    error:
        PyBytesWriter_Discard(writer);
        return NULL;
    }
256+
257+
258+
Create the bytes string "abc"
259+
-----------------------------
260+
261+
Example creating the bytes string ``b"abc"``, with a fixed size of 3 bytes::
262+
263+
    PyObject* create_abc(void)
    {
        PyBytesWriter *writer = PyBytesWriter_Create(3);
        if (writer == NULL) {
            return NULL;
        }

        char *str = PyBytesWriter_GetData(writer);
        memcpy(str, "abc", 3);
        return PyBytesWriter_Finish(writer);
    }
274+
275+
GrowAndUpdatePointer() example
276+
------------------------------
277+
278+
Example using a pointer to write bytes and to track the written size.
279+
280+
Create the bytes string ``b"Hello World"``::
281+
282+
    PyObject* grow_example(void)
    {
        // Allocate 10 bytes
        PyBytesWriter *writer = PyBytesWriter_Create(10);
        if (writer == NULL) {
            return NULL;
        }

        // Write some bytes
        char *buf = PyBytesWriter_GetData(writer);
        memcpy(buf, "Hello ", strlen("Hello "));
        buf += strlen("Hello ");

        // Allocate 10 more bytes
        buf = PyBytesWriter_GrowAndUpdatePointer(writer, 10, buf);
        if (buf == NULL) {
            PyBytesWriter_Discard(writer);
            return NULL;
        }

        // Write more bytes
        memcpy(buf, "World", strlen("World"));
        buf += strlen("World");

        // Truncate the string at 'buf' position
        // and create a bytes object
        return PyBytesWriter_FinishWithPointer(writer, buf);
    }
310+
311+
312+
Reference Implementation
313+
========================
314+
315+
`Pull request gh-131681 <https://github.com/python/cpython/pull/131681>`__.
316+
317+
The implementation internally allocates a :class:`bytes` object, so
318+
:c:func:`PyBytesWriter_Finish` just returns the object without having
319+
to copy memory.
320+
321+
For strings up to 256 bytes, a small internal raw buffer of bytes is
322+
used. It avoids having to resize a :class:`bytes` object which is
323+
inefficient. At the end, :c:func:`PyBytesWriter_Finish` creates the
324+
:class:`bytes` object from this small buffer.
325+
326+
A free list is used to reduce the cost of allocating a
327+
:c:type:`PyBytesWriter` on the heap.
328+
329+
330+
Backwards Compatibility
331+
=======================
332+
333+
There is no impact on backward compatibility: only new APIs are
334+
added.
335+
336+
337+
Prior Discussions
338+
=================
339+
340+
* March 2025: Third public API attempt, using size rather than pointers:
341+
342+
* `Discussion <https://discuss.python.org/t/81182/56>`_
343+
* `Pull request gh-131681 <https://github.com/python/cpython/pull/131681>`__
344+
345+
* February 2025: Second public API attempt:
346+
347+
* `Issue gh-129813 <https://github.com/python/cpython/issues/129813>`_
348+
and
349+
`pull request gh-129814
350+
<https://github.com/python/cpython/pull/129814>`_
351+
352+
* July 2024: First public API attempt:
353+
354+
* C API Working Group decision:
355+
`Add PyBytes_Writer() API
356+
<https://github.com/capi-workgroup/decisions/issues/39>`_
357+
(August 2024)
358+
* `Pull request gh-121726
359+
<https://github.com/python/cpython/pull/121726>`_:
360+
first public API attempt (July 2024)
361+
362+
* March 2016:
363+
`Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce
364+
strings in CPython <https://vstinner.github.io/pybyteswriter.html>`_:
365+
Article on the original private ``_PyBytesWriter`` C API.
366+
367+
368+
Copyright
369+
=========
370+
371+
This document is placed in the public domain or under the
372+
CC0-1.0-Universal license, whichever is more permissive.
