Commit f085d19

PEP 756: Add PyUnicode_EXPORT_ALLOW_COPY flag (#3988)
1 parent 680c8b1 commit f085d19

1 file changed: +35 -9 lines changed

peps/pep-0756.rst

Lines changed: 35 additions & 9 deletions
@@ -21,9 +21,9 @@ Add functions to the limited C API version 3.14:
   view.
 * ``PyUnicode_Import()``: import a Python str object.
 
-In general, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
-copy is needed. See the :ref:`specification <export-complexity>` for
-cases when a copy is needed.
+By default, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
+is copied. See the :ref:`specification <export-complexity>` for cases
+when a copy is needed.
 
 
 Rationale
@@ -95,6 +95,8 @@ Add the following API to the limited C API version 3.14::
     #define PyUnicode_FORMAT_UTF8 0x08 // char*
     #define PyUnicode_FORMAT_ASCII 0x10 // char* (ASCII string)
 
+    #define PyUnicode_EXPORT_ALLOW_COPY 0x10000
+
 The ``int32_t`` type is used instead of ``int`` to have a well defined
 type size and not depend on the platform or the compiler.
 See `Avoid C-specific Types
@@ -150,18 +152,41 @@ flags.
 
 Note that future versions of Python may introduce additional formats.
 
+By default, no memory is copied and no conversion is done.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set in
+*requested_formats*, the function can copy memory to provide the
+requested format and convert from a format to another.
+
+The ``PyUnicode_EXPORT_ALLOW_COPY`` flag is needed to export to
+``PyUnicode_FORMAT_UTF8`` a string containing surrogate characters.
+
+Available flags:
+
+=============================== =========== ===================================
+Flag                            Value       Description
+=============================== =========== ===================================
+``PyUnicode_EXPORT_ALLOW_COPY`` ``0x10000`` Allow memory copies and conversions
+=============================== =========== ===================================
+
+
 .. _export-complexity:
 
 Export complexity
 -----------------
 
-In general, an export has a complexity of *O*\ (1): no memory copy is
-needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
+By default, an export has a complexity of *O*\ (1): no memory is copied
+and no conversion is done. There is an exception: if only UTF-8 is
+requested and the UTF-8 cache is not filled, the string is encoded to
+UTF-8 to fill the cache.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set, there are cases when a
+copy is needed, *O*\ (*n*) complexity:
 
 * If only UCS-2 is requested and the native format is UCS-1.
 * If only UCS-4 is requested and the native format is UCS-1 or UCS-2.
-* If only UTF-8 is requested: the string is encoded to UTF-8 at the
-  first call, and then the encoded UTF-8 string is cached.
+* If only UTF-8 is requested and the string contains surrogate
+  characters.
 
 To get the best performance on CPython and PyPy, it's recommended to
 support these 4 formats::
@@ -236,8 +261,8 @@ The ``PyUnicode_FORMAT_ASCII`` format is mostly useful for
 characters.
 
 
-Surrogate characters and NUL characters
----------------------------------------
+Surrogate characters and embedded NUL characters
+------------------------------------------------
 
 Surrogate characters are allowed: they can be imported and exported. For
 example, the UTF-8 format uses the ``surrogatepass`` error handler.
@@ -347,6 +372,7 @@ to return NULL on embedded null characters
 Rejecting embedded NUL characters require to scan the string which has
 an *O*\ (*n*) complexity.
 
+
 Reject surrogate characters
 ---------------------------
 
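
Note on the new flag: the diff places ``PyUnicode_EXPORT_ALLOW_COPY`` at ``0x10000``, well above the format constants shown (``0x08``, ``0x10``), so it can be OR-ed into *requested_formats* without colliding with a format bit. The following is a minimal, standalone C sketch of that bit layout, not the PEP's implementation. The three constants are copied from the diff; ``EXAMPLE_FORMAT_MASK``, ``example_formats()`` and ``example_allows_copy()`` are hypothetical names introduced only for illustration. The sketch does not call ``PyUnicode_Export()`` and assumes nothing about how CPython would parse the argument.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the diff. In a real build they would come from
   Python.h, assuming the API proposed by the PEP is available. */
#define PyUnicode_FORMAT_UTF8        0x08
#define PyUnicode_FORMAT_ASCII       0x10
#define PyUnicode_EXPORT_ALLOW_COPY  0x10000

/* Hypothetical mask (not part of the PEP): every bit below the flag bit. */
#define EXAMPLE_FORMAT_MASK  0xFFFF

/* Compile-time check that the flag does not overlap the format bits. */
static_assert((PyUnicode_EXPORT_ALLOW_COPY & EXAMPLE_FORMAT_MASK) == 0,
              "flag bit must not overlap the format bits");

/* Return only the format bits of a requested_formats value. */
static int32_t example_formats(int32_t requested_formats)
{
    return requested_formats & EXAMPLE_FORMAT_MASK;
}

/* Return 1 if the caller opted in to copies and conversions, else 0. */
static int example_allows_copy(int32_t requested_formats)
{
    return (requested_formats & PyUnicode_EXPORT_ALLOW_COPY) != 0;
}
```

For example, a caller that wants UTF-8 even for strings containing surrogate characters would, per the text added in this commit, pass ``PyUnicode_FORMAT_UTF8 | PyUnicode_EXPORT_ALLOW_COPY``, which evaluates to ``0x10008``.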
