@@ -21,9 +21,9 @@ Add functions to the limited C API version 3.14:
   view.
 * ``PyUnicode_Import()``: import a Python str object.
 
-In general, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
-copy is needed. See the :ref:`specification <export-complexity>` for
-cases when a copy is needed.
+By default, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
+is copied. See the :ref:`specification <export-complexity>` for cases
+when a copy is needed.
 
 
 Rationale
@@ -95,6 +95,8 @@ Add the following API to the limited C API version 3.14::
     #define PyUnicode_FORMAT_UTF8  0x08  // char*
     #define PyUnicode_FORMAT_ASCII 0x10  // char* (ASCII string)
 
+    #define PyUnicode_EXPORT_ALLOW_COPY 0x10000
+
 The ``int32_t`` type is used instead of ``int`` to have a well defined
 type size and not depend on the platform or the compiler.
 See `Avoid C-specific Types
@@ -150,18 +152,41 @@ flags.
 
 Note that future versions of Python may introduce additional formats.
 
+By default, no memory is copied and no conversion is done.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set in
+*requested_formats*, the function can copy memory to provide the
+requested format and convert from one format to another.
+
+The ``PyUnicode_EXPORT_ALLOW_COPY`` flag is needed to export a string
+containing surrogate characters to ``PyUnicode_FORMAT_UTF8``.
+
+Available flags:
+
+===============================  ===========  ===================================
+Flag                             Value        Description
+===============================  ===========  ===================================
+``PyUnicode_EXPORT_ALLOW_COPY``  ``0x10000``  Allow memory copies and conversions
+===============================  ===========  ===================================
+
+
 .. _export-complexity:
 
 Export complexity
 -----------------
 
-In general, an export has a complexity of *O*\ (1): no memory copy is
-needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
+By default, an export has a complexity of *O*\ (1): no memory is copied
+and no conversion is done. There is an exception: if only UTF-8 is
+requested and the UTF-8 cache is not filled, the string is encoded to
+UTF-8 to fill the cache.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set, there are cases when a
+copy is needed, *O*\ (*n*) complexity:
 
 * If only UCS-2 is requested and the native format is UCS-1.
 * If only UCS-4 is requested and the native format is UCS-1 or UCS-2.
-* If only UTF-8 is requested: the string is encoded to UTF-8 at the
-  first call, and then the encoded UTF-8 string is cached.
+* If only UTF-8 is requested and the string contains surrogate
+  characters.
 
 To get the best performance on CPython and PyPy, it's recommended to
 support these 4 formats::
@@ -236,8 +261,8 @@ The ``PyUnicode_FORMAT_ASCII`` format is mostly useful for
 characters.
 
 
-Surrogate characters and NUL characters
----------------------------------------
+Surrogate characters and embedded NUL characters
+------------------------------------------------
 
 Surrogate characters are allowed: they can be imported and exported. For
 example, the UTF-8 format uses the ``surrogatepass`` error handler.
@@ -347,6 +372,7 @@ to return NULL on embedded null characters
 Rejecting embedded NUL characters require to scan the string which has
 an *O*\ (*n*) complexity.
 
+
 Reject surrogate characters
 ---------------------------
 