| 1 | +PEP: 782 |
| 2 | +Title: Add PyBytesWriter C API |
| 3 | +Author: Victor Stinner < [email protected]> |
| 4 | +Status: Draft |
| 5 | +Type: Standards Track |
| 6 | +Created: 27-Mar-2025 |
| 7 | +Python-Version: 3.14 |
| 8 | +Post-History: |
| 9 | + `18-Feb-2025 <https://discuss.python.org/t/81182>`__ |
| 10 | + |
| 11 | + |
| 12 | +.. highlight:: c |
| 13 | + |
| 14 | + |
| 15 | +Abstract |
| 16 | +======== |
| 17 | + |
| 18 | +Add a new ``PyBytesWriter`` C API to create ``bytes`` objects. |
| 19 | + |
| 20 | +Soft deprecate ``PyBytes_FromStringAndSize(NULL, size)`` and |
| 21 | +``_PyBytes_Resize()`` APIs. These APIs treat an immutable ``bytes`` |
| 22 | +object as a mutable object. They remain available and maintained, don't |
| 23 | +emit deprecation warnings, but are no longer recommended when writing new |
| 24 | +code. |
| 25 | + |
| 26 | + |
| 27 | +Rationale |
| 28 | +========= |
| 29 | + |
| 30 | +Disallow creation of incomplete/inconsistent objects |
| 31 | +---------------------------------------------------- |
| 32 | + |
| 33 | +Creating a Python :class:`bytes` object using |
| 34 | +``PyBytes_FromStringAndSize(NULL, size)`` and ``_PyBytes_Resize()`` |
| 35 | +treats an immutable :class:`bytes` object as mutable. It goes against |
| 36 | +the principle that :class:`bytes` objects are immutable. It also creates |
| 37 | +an incomplete or "invalid" object since bytes are not initialized. In |
| 38 | +Python, a :class:`bytes` object should always have its bytes fully |
| 39 | +initialized. |
| 40 | + |
| 41 | +* `Avoid creating incomplete/invalid objects api-evolution#36 |
| 42 | + <https://github.com/capi-workgroup/api-evolution/issues/36>`_ |
| 43 | +* `Disallow mutating immutable objects api-evolution#20 |
| 44 | + <https://github.com/capi-workgroup/api-evolution/issues/20>`_ |
| 45 | +* `Disallow creation of incomplete/inconsistent objects problems#56 |
| 46 | + <https://github.com/capi-workgroup/problems/issues/56>`_ |
| 47 | + |
| 48 | +Inefficient allocation strategy |
| 49 | +------------------------------- |
| 50 | + |
| 51 | +When creating a bytes string and the output size is unknown, one |
| 52 | +strategy is to allocate a short buffer and extend it (to the exact size) |
| 53 | +each time a larger write is needed. |
| 54 | + |
| 55 | +This strategy is inefficient because it requires enlarging the buffer |
| 56 | +multiple times. It's more efficient to overallocate the buffer the |
| 57 | +first time that a larger write is needed. It reduces the number of |
| 58 | +expensive ``realloc()`` operations which can imply a memory copy. |
| 59 | + |
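The cost difference between the two strategies can be illustrated with a small standalone sketch. It does not use the Python C API and only counts how often a buffer would have to be enlarged. The function names are hypothetical, and the doubling factor is an assumption chosen for illustration, not the factor used by the reference implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Count the enlargements needed to append `nwrites` chunks of `chunk`
   bytes when the buffer is always extended to the exact size needed. */
size_t count_exact_growth(size_t nwrites, size_t chunk)
{
    assert(chunk > 0);
    size_t allocated = 0, used = 0, reallocs = 0;
    for (size_t i = 0; i < nwrites; i++) {
        used += chunk;
        if (used > allocated) {
            allocated = used;       /* exact fit: no spare room */
            reallocs++;
        }
    }
    return reallocs;
}

/* Same workload, but reserve extra room on each enlargement.
   Assumption: the buffer is doubled relative to the size needed. */
size_t count_overallocated_growth(size_t nwrites, size_t chunk)
{
    assert(chunk > 0);
    size_t allocated = 0, used = 0, reallocs = 0;
    for (size_t i = 0; i < nwrites; i++) {
        used += chunk;
        if (used > allocated) {
            allocated = used * 2;   /* overallocate */
            reallocs++;
        }
    }
    return reallocs;
}
```

With 1000 writes of 10 bytes each, the exact-fit counter reports 1000 enlargements while the doubling counter reports 9.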
| 60 | + |
| 61 | +Specification |
| 62 | +============= |
| 63 | + |
| 64 | +API |
| 65 | +--- |
| 66 | + |
| 67 | +.. c:type:: PyBytesWriter |
| 68 | +
|
| 69 | + A Python :class:`bytes` writer instance created by |
| 70 | + :c:func:`PyBytesWriter_Create`. |
| 71 | + |
| 72 | + The instance must be destroyed by :c:func:`PyBytesWriter_Finish` or |
| 73 | + :c:func:`PyBytesWriter_Discard`. |
| 74 | + |
| 75 | +Create, Finish, Discard |
| 76 | +^^^^^^^^^^^^^^^^^^^^^^^ |
| 77 | + |
| 78 | +.. c:function:: PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size) |
| 79 | +
|
| 80 | + Create a :c:type:`PyBytesWriter` to write *size* bytes. |
| 81 | +
|
| 82 | + If *size* is greater than zero, allocate *size* bytes for the |
| 83 | + returned buffer. |
| 84 | +
|
| 85 | + On error, set an exception and return NULL. |
| 86 | +
|
| 87 | + *size* must be positive or zero. |
| 88 | +
|
| 89 | +.. c:function:: PyObject* PyBytesWriter_Finish(PyBytesWriter *writer) |
| 90 | +
|
| 91 | + Finish a :c:type:`PyBytesWriter` created by |
| 92 | + :c:func:`PyBytesWriter_Create`. |
| 93 | +
|
| 94 | + On success, return a Python :class:`bytes` object. |
| 95 | + On error, set an exception and return ``NULL``. |
| 96 | +
|
| 97 | + The writer instance is invalid after the call in any case. |
| 98 | +
|
| 99 | +.. c:function:: PyObject* PyBytesWriter_FinishWithSize(PyBytesWriter *writer, Py_ssize_t size) |
| 100 | +
|
| 101 | + Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer |
| 102 | + to *size* bytes before creating the :class:`bytes` object. |
| 103 | +
|
| 104 | +.. c:function:: PyObject* PyBytesWriter_FinishWithPointer(PyBytesWriter *writer, void *buf) |
| 105 | +
|
| 106 | + Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer |
| 107 | + using *buf* pointer before creating the :class:`bytes` object. |
| 108 | +
|
| 109 | + Pseudo-code:: |
| 110 | +
|
| 111 | + Py_ssize_t size = (char*)buf - (char*)PyBytesWriter_GetData(writer); |
| 112 | + return PyBytesWriter_FinishWithSize(writer, size); |
| 113 | +
|
| 114 | + Set an exception and return ``NULL`` if *buf* pointer is outside the |
| 115 | + internal buffer bounds. |
| 116 | +
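The bounds check described for ``PyBytesWriter_FinishWithPointer()`` can be sketched in standalone C as follows. ``size_from_pointer()`` is a hypothetical helper, not part of the proposed API, and the actual check in the reference implementation may differ. Addresses are compared as integers because comparing pointers into different objects is not well defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the number of bytes between `data` and
   `buf`, or -1 if `buf` lies outside the range [data, data + size]. */
ptrdiff_t size_from_pointer(const char *data, size_t size, const char *buf)
{
    assert(data != NULL);
    uintptr_t start = (uintptr_t)data;
    uintptr_t pos = (uintptr_t)buf;
    if (pos < start || pos - start > size) {
        return -1;                  /* pointer outside the buffer */
    }
    return (ptrdiff_t)(pos - start);
}
```

A pointer equal to ``data + size`` is accepted here, since it denotes a fully used buffer.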
|
| 117 | +.. c:function:: void PyBytesWriter_Discard(PyBytesWriter *writer) |
| 118 | +
|
| 119 | + Discard a :c:type:`PyBytesWriter` created by :c:func:`PyBytesWriter_Create`. |
| 120 | +
|
| 121 | + Do nothing if *writer* is ``NULL``. |
| 122 | +
|
| 123 | + The writer instance is invalid after the call. |
| 124 | +
|
| 125 | +High-level API |
| 126 | +^^^^^^^^^^^^^^ |
| 127 | +
|
| 128 | +.. c:function:: int PyBytesWriter_WriteBytes(PyBytesWriter *writer, const void *bytes, Py_ssize_t size) |
| 129 | +
|
| 130 | + Write *size* bytes of *bytes* into the *writer*. |
| 131 | +
|
| 132 | + If *size* is equal to ``-1``, call ``strlen(bytes)`` to get the |
| 133 | + string length. |
| 134 | +
|
| 135 | + On success, return ``0``. |
| 136 | + On error, set an exception and return ``-1``. |
| 137 | +
|
| 138 | +.. c:function:: int PyBytesWriter_Format(PyBytesWriter *writer, const char *format, ...) |
| 139 | +
|
| 140 | + Similar to ``PyBytes_FromFormat()``, but write the output directly |
| 141 | + into the writer. |
| 142 | +
|
| 143 | + On success, return ``0``. |
| 144 | + On error, set an exception and return ``-1``. |
| 145 | +
|
| 146 | +Getters |
| 147 | +^^^^^^^ |
| 148 | +
|
| 149 | +.. c:function:: Py_ssize_t PyBytesWriter_GetSize(PyBytesWriter *writer) |
| 150 | +
|
| 151 | + Get the writer size. |
| 152 | +
|
| 153 | +.. c:function:: void* PyBytesWriter_GetData(PyBytesWriter *writer) |
| 154 | +
|
| 155 | + Get the writer data. |
| 156 | +
|
| 157 | + The pointer is valid until :c:func:`PyBytesWriter_Finish` or |
| 158 | + :c:func:`PyBytesWriter_Discard` is called on *writer*. |
| 159 | +
|
| 160 | +Low-level API |
| 161 | +^^^^^^^^^^^^^ |
| 162 | +
|
| 163 | +.. c:function:: int PyBytesWriter_Resize(PyBytesWriter *writer, Py_ssize_t size) |
| 164 | +
|
| 165 | + Resize the writer to *size* bytes. It can be used to enlarge or to |
| 166 | + shrink the writer. |
| 167 | +
|
| 168 | + Newly allocated bytes are left uninitialized. |
| 169 | +
|
| 170 | + On success, return ``0``. |
| 171 | + On error, set an exception and return ``-1``. |
| 172 | +
|
| 173 | + *size* must be positive or zero. |
| 174 | +
|
| 175 | +.. c:function:: int PyBytesWriter_Grow(PyBytesWriter *writer, Py_ssize_t grow) |
| 176 | +
|
| 177 | + Resize the writer by adding *grow* bytes to the current writer size. |
| 178 | +
|
| 179 | + Newly allocated bytes are left uninitialized. |
| 180 | +
|
| 181 | + On success, return ``0``. |
| 182 | + On error, set an exception and return ``-1``. |
| 183 | +
|
| 184 | + *grow* must be positive or zero. |
| 185 | +
|
| 186 | +.. c:function:: void* PyBytesWriter_GrowAndUpdatePointer(PyBytesWriter *writer, Py_ssize_t size, void *buf) |
| 187 | +
|
| 188 | + Similar to :c:func:`PyBytesWriter_Grow`, but also update the *buf* |
| 189 | + pointer. |
| 190 | +
|
| 191 | + On error, set an exception and return ``NULL``. |
| 192 | +
|
| 193 | + Pseudo-code:: |
| 194 | +
|
| 195 | + Py_ssize_t pos = (char*)buf - (char*)PyBytesWriter_GetData(writer); |
| 196 | + if (PyBytesWriter_Grow(writer, size) < 0) { |
| 197 | + return NULL; |
| 198 | + } |
| 199 | + return (char*)PyBytesWriter_GetData(writer) + pos; |
| 200 | +
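The pointer has to be updated because enlarging a buffer may move it in memory, which leaves any pointer into the old block dangling. The standalone sketch below shows the same save-offset, grow, rebuild-pointer pattern on a plain ``realloc()`` buffer. ``toy_buffer`` and ``toy_grow_and_update_pointer()`` are hypothetical names used for illustration only and do not reflect the ``PyBytesWriter`` layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a writer buffer. */
typedef struct {
    char *data;
    size_t size;
} toy_buffer;

/* Grow the buffer by `grow` bytes and return `pos` translated into the
   (possibly moved) buffer. Return NULL on allocation failure, in which
   case the buffer is left unchanged. */
char *toy_grow_and_update_pointer(toy_buffer *b, size_t grow, char *pos)
{
    assert(grow > 0);
    size_t offset = (size_t)(pos - b->data);    /* save before realloc */
    assert(offset <= b->size);
    char *data = realloc(b->data, b->size + grow);
    if (data == NULL) {
        return NULL;
    }
    b->data = data;
    b->size += grow;
    return b->data + offset;                    /* same offset, new block */
}
```

Using the old ``pos`` value after the call would be undefined behavior whenever ``realloc()`` moved the block, which is why the caller must replace it with the returned pointer.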
|
| 201 | +
|
| 202 | +Overallocation |
| 203 | +-------------- |
| 204 | +
|
| 205 | +:c:func:`PyBytesWriter_Resize` and :c:func:`PyBytesWriter_Grow` |
| 206 | +overallocate the internal buffer to reduce the number of ``realloc()`` |
| 207 | +calls and so reduce memory copies. |
| 208 | +
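One way to picture overallocation is a writer that tracks the size visible to the caller separately from the capacity actually reserved. The following standalone sketch makes that distinction explicit. ``toy_writer``, ``toy_resize()`` and the 25% slack are assumptions made for illustration; the PEP does not specify the growth factor or the internal layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical writer: not the PyBytesWriter structure. */
typedef struct {
    char *data;
    size_t size;        /* bytes visible to the caller */
    size_t allocated;   /* bytes actually reserved */
    unsigned reallocs;  /* number of realloc() calls so far */
} toy_writer;

/* Resize to `size` bytes; call realloc() only when capacity is exceeded.
   Return 0 on success, -1 on overflow or allocation failure. */
int toy_resize(toy_writer *w, size_t size)
{
    assert(w != NULL);
    if (size > w->allocated) {
        if (size > SIZE_MAX - size / 4) {
            return -1;
        }
        size_t allocated = size + size / 4;     /* assumed 25% slack */
        char *data = realloc(w->data, allocated);
        if (data == NULL) {
            return -1;
        }
        w->data = data;
        w->allocated = allocated;
        w->reallocs++;
    }
    w->size = size;     /* shrinking or fitting in slack costs nothing */
    return 0;
}
```

In this sketch, resizing from 100 to 120 bytes fits in the reserved slack and does not call ``realloc()`` again.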
|
| 209 | +
|
| 210 | +Thread safety |
| 211 | +------------- |
| 212 | +
|
| 213 | +The API is not thread safe: a writer should only be used by a single |
| 214 | +thread at a time. |
| 215 | +
|
| 216 | +
|
| 217 | +Soft deprecations |
| 218 | +----------------- |
| 219 | +
|
| 220 | +Soft deprecate ``PyBytes_FromStringAndSize(NULL, size)`` and |
| 221 | +``_PyBytes_Resize()`` APIs. These APIs treat an immutable ``bytes`` |
| 222 | +object as a mutable object. They remain available and maintained, don't |
| 223 | +emit deprecation warnings, but are no longer recommended when writing new |
| 224 | +code. |
| 225 | +
|
| 226 | +``PyBytes_FromStringAndSize(str, size)`` is not soft deprecated. Only |
| 227 | +calls with ``NULL`` *str* are soft deprecated. |
| 228 | +
|
| 229 | +
|
| 230 | +Examples |
| 231 | +======== |
| 232 | +
|
| 233 | +High-level API |
| 234 | +-------------- |
| 235 | +
|
| 236 | +Create the bytes string ``b"Hello World!"``:: |
| 237 | +
|
| 238 | + PyObject* hello_world(void) |
| 239 | + { |
| 240 | + PyBytesWriter *writer = PyBytesWriter_Create(0); |
| 241 | + if (writer == NULL) { |
| 242 | + goto error; |
| 243 | + } |
| 244 | + if (PyBytesWriter_WriteBytes(writer, "Hello", -1) < 0) { |
| 245 | + goto error; |
| 246 | + } |
| 247 | + if (PyBytesWriter_Format(writer, " %s!", "World") < 0) { |
| 248 | + goto error; |
| 249 | + } |
| 250 | + return PyBytesWriter_Finish(writer); |
| 251 | +
|
| 252 | + error: |
| 253 | + PyBytesWriter_Discard(writer); |
| 254 | + return NULL; |
| 255 | + } |
| 256 | +
|
| 257 | +
|
| 258 | +Create the bytes string "abc" |
| 259 | +----------------------------- |
| 260 | +
|
| 261 | +Example creating the bytes string ``b"abc"``, with a fixed size of 3 bytes:: |
| 262 | +
|
| 263 | + PyObject* create_abc(void) |
| 264 | + { |
| 265 | + PyBytesWriter *writer = PyBytesWriter_Create(3); |
| 266 | + if (writer == NULL) { |
| 267 | + return NULL; |
| 268 | + } |
| 269 | +
|
| 270 | + char *str = PyBytesWriter_GetData(writer); |
| 271 | + memcpy(str, "abc", 3); |
| 272 | + return PyBytesWriter_Finish(writer); |
| 273 | + } |
| 274 | +
|
| 275 | +GrowAndUpdatePointer() example |
| 276 | +------------------------------ |
| 277 | +
|
| 278 | +Example using a pointer to write bytes and to track the written size. |
| 279 | +
|
| 280 | +Create the bytes string ``b"Hello World"``:: |
| 281 | +
|
| 282 | + PyObject* grow_example(void) |
| 283 | + { |
| 284 | + // Allocate 10 bytes |
| 285 | + PyBytesWriter *writer = PyBytesWriter_Create(10); |
| 286 | + if (writer == NULL) { |
| 287 | + return NULL; |
| 288 | + } |
| 289 | +
|
| 290 | + // Write some bytes |
| 291 | + char *buf = PyBytesWriter_GetData(writer); |
| 292 | + memcpy(buf, "Hello ", strlen("Hello ")); |
| 293 | + buf += strlen("Hello "); |
| 294 | +
|
| 295 | + // Allocate 10 more bytes |
| 296 | + buf = PyBytesWriter_GrowAndUpdatePointer(writer, 10, buf); |
| 297 | + if (buf == NULL) { |
| 298 | + PyBytesWriter_Discard(writer); |
| 299 | + return NULL; |
| 300 | + } |
| 301 | +
|
| 302 | + // Write more bytes |
| 303 | + memcpy(buf, "World", strlen("World")); |
| 304 | + buf += strlen("World"); |
| 305 | +
|
| 306 | + // Truncate the string at 'buf' position |
| 307 | + // and create a bytes object |
| 308 | + return PyBytesWriter_FinishWithPointer(writer, buf); |
| 309 | + } |
| 310 | +
|
| 311 | +
|
| 312 | +Reference Implementation |
| 313 | +======================== |
| 314 | +
|
| 315 | +`Pull request gh-131681 <https://github.com/python/cpython/pull/131681>`__. |
| 316 | +
|
| 317 | +The implementation internally allocates a :class:`bytes` object, so |
| 318 | +:c:func:`PyBytesWriter_Finish` just returns the object without having |
| 319 | +to copy memory. |
| 320 | +
|
| 321 | +For strings up to 256 bytes, a small internal raw buffer of bytes is |
| 322 | +used. It avoids having to resize a :class:`bytes` object which is |
| 323 | +inefficient. At the end, :c:func:`PyBytesWriter_Finish` creates the |
| 324 | +:class:`bytes` object from this small buffer. |
| 325 | +
|
| 326 | +A free list is used to reduce the cost of allocating a |
| 327 | +:c:type:`PyBytesWriter` on the heap memory. |
| 328 | +
|
| 329 | +
|
| 330 | +Backwards Compatibility |
| 331 | +======================= |
| 332 | +
|
| 333 | +There is no impact on backward compatibility: only new APIs are |
| 334 | +added. |
| 335 | +
|
| 336 | +
|
| 337 | +Prior Discussions |
| 338 | +================= |
| 339 | +
|
| 340 | +* March 2025: Third public API attempt, using size rather than pointers: |
| 341 | +
|
| 342 | + * `Discussion <https://discuss.python.org/t/81182/56>`_ |
| 343 | + * `Pull request gh-131681 <https://github.com/python/cpython/pull/131681>`__ |
| 344 | +
|
| 345 | +* February 2025: Second public API attempt: |
| 346 | +
|
| 347 | + * `Issue gh-129813 <https://github.com/python/cpython/issues/129813>`_ |
| 348 | + and |
| 349 | + `pull request gh-129814 |
| 350 | + <https://github.com/python/cpython/pull/129814>`_ |
| 351 | +
|
| 352 | +* July 2024: First public API attempt: |
| 353 | +
|
| 354 | + * C API Working Group decision: |
| 355 | + `Add PyBytes_Writer() API |
| 356 | + <https://github.com/capi-workgroup/decisions/issues/39>`_ |
| 357 | + (August 2024) |
| 358 | + * `Pull request gh-121726 |
| 359 | + <https://github.com/python/cpython/pull/121726>`_: |
| 360 | + first public API attempt (July 2024) |
| 361 | +
|
| 362 | +* March 2016: |
| 363 | + `Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce |
| 364 | + strings in CPython <https://vstinner.github.io/pybyteswriter.html>`_: |
| 365 | + Article on the original private ``_PyBytesWriter`` C API. |
| 366 | +
|
| 367 | +
|
| 368 | +Copyright |
| 369 | +========= |
| 370 | +
|
| 371 | +This document is placed in the public domain or under the |
| 372 | +CC0-1.0-Universal license, whichever is more permissive. |