@@ -102,7 +102,12 @@ longer rationale.
PyUnicode_Export()
------------------
- API: ``int32_t PyUnicode_Export(PyObject *unicode, int32_t requested_formats, Py_buffer *view)``.
+ API::
+
+     int32_t PyUnicode_Export(
+         PyObject *unicode,
+         int32_t requested_formats,
+         Py_buffer *view)
Export the contents of the *unicode* string in one of the *requested_formats*.
@@ -116,6 +121,10 @@ The contents of the buffer are valid until they are released.
The buffer is read-only and must not be modified.
+ The ``view->len`` member must be used to get the string length. The
+ buffer should end with a trailing NUL character, but it's not
+ recommended to rely on that because of embedded NUL characters.
+
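As a rough illustration of why ``view->len`` matters, the C sketch below compares a length obtained with ``strlen()`` against a stored length for a buffer that contains an embedded NUL character. The ``demo_view`` struct, ``demo_bytes`` array and ``demo_*`` functions are hypothetical stand-ins written for this example, not part of the proposed API; a real caller would read the ``len`` member of the ``Py_buffer`` filled by ``PyUnicode_Export()``.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two Py_buffer members used below. */
typedef struct {
    const void *buf;   /* start of the exported bytes */
    ptrdiff_t len;     /* length in bytes, playing the role of view->len */
} demo_view;

/* UTF-8 bytes of "ab\0cd" followed by a trailing NUL (not counted in len). */
const char demo_bytes[6] = {'a', 'b', '\0', 'c', 'd', '\0'};

/* Length as a NUL-terminated consumer sees it: stops at the first NUL. */
size_t demo_length_via_strlen(const demo_view *view)
{
    return strlen((const char *)view->buf);
}

/* Length taken from the len member, which covers embedded NULs. */
size_t demo_length_via_len(const demo_view *view)
{
    return (size_t)view->len;
}

/* Self-check: the two lengths disagree for this buffer. */
void demo_selfcheck(void)
{
    demo_view v = { demo_bytes, 5 };
    assert(demo_length_via_strlen(&v) == 2);
    assert(demo_length_via_len(&v) == 5);
}
```

With an embedded NUL, the ``strlen()`` based length silently truncates the string, which is the reason the trailing NUL should be treated as a convenience only.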
*unicode* and *view* must not be NULL.
Available formats:
@@ -152,14 +161,18 @@ needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
* If only UTF-8 is requested: the string is encoded to UTF-8 at the
first call, and then the encoded UTF-8 string is cached.
- To have an *O*\ (1) complexity on CPython and PyPy, it's recommended to
+ To get the best performance on CPython and PyPy, it's recommended to
support these 4 formats::

     (PyUnicode_FORMAT_UCS1 \
      | PyUnicode_FORMAT_UCS2 \
      | PyUnicode_FORMAT_UCS4 \
      | PyUnicode_FORMAT_UTF8)

+ PyPy uses UTF-8 natively and so the ``PyUnicode_FORMAT_UTF8`` format is
+ recommended. It requires a memory copy, since PyPy ``str`` objects can
+ be moved in memory (PyPy uses a moving garbage collector).
+
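A caller that requests all four formats has to handle whichever single format comes back. The C sketch below shows one possible way to map a format to its code unit size and to derive a code unit count from a byte length. It assumes the export call reports which format it used; the ``DEMO_FORMAT_*`` constants and their numeric values are placeholders chosen for this example, not the values defined by the API, and the ``demo_*`` functions are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values for this sketch only (assumption). */
#define DEMO_FORMAT_UCS1 0x01
#define DEMO_FORMAT_UCS2 0x02
#define DEMO_FORMAT_UCS4 0x04
#define DEMO_FORMAT_UTF8 0x08

/* Mask a caller could pass to request every format it supports. */
#define DEMO_ALL_FORMATS \
    (DEMO_FORMAT_UCS1 | DEMO_FORMAT_UCS2 | DEMO_FORMAT_UCS4 | DEMO_FORMAT_UTF8)

/* Code unit size in bytes for one format, or 0 if it is not a single
   known format (for example, a mask with several bits set). */
size_t demo_item_size(int32_t format)
{
    switch (format) {
    case DEMO_FORMAT_UCS1: return 1;
    case DEMO_FORMAT_UCS2: return 2;
    case DEMO_FORMAT_UCS4: return 4;
    case DEMO_FORMAT_UTF8: return 1;
    default:               return 0;
    }
}

/* Number of code units in nbytes bytes, or -1 if the format is unknown
   or nbytes is not a multiple of the code unit size. */
ptrdiff_t demo_unit_count(int32_t format, ptrdiff_t nbytes)
{
    size_t size = demo_item_size(format);
    if (size == 0 || nbytes < 0 || (size_t)nbytes % size != 0) {
        return -1;
    }
    return nbytes / (ptrdiff_t)size;
}

/* Self-check of the helpers above. */
void demo_format_selfcheck(void)
{
    assert(demo_item_size(DEMO_FORMAT_UCS2) == 2);
    assert(demo_unit_count(DEMO_FORMAT_UCS4, 12) == 3);
}
```

Note that for UTF-8 the code unit count is a byte count, not a character count, so it should not be used as the string length in code points.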
Py_buffer format and item size
------------------------------
@@ -181,7 +194,12 @@ Export format Buffer format Item size
PyUnicode_Import()
------------------
- API: ``PyObject* PyUnicode_Import(const void *data, Py_ssize_t nbytes, int32_t format)``.
+ API::
+
+     PyObject* PyUnicode_Import(
+         const void *data,
+         Py_ssize_t nbytes,
+         int32_t format)
Create a Unicode string object from a buffer in a supported format.
@@ -224,10 +242,6 @@ example, the UTF-8 format uses the ``surrogatepass`` error handler.
Embedded NUL characters are allowed: they can be imported and exported.
- An exported string does not end with a trailing NUL character: the
- ``PyUnicode_Export()`` caller must use ``Py_buffer.len`` to get the
- string length.
-
Implementation
==============
@@ -242,19 +256,6 @@ There is no impact on the backward compatibility, only new C API
functions are added.
- Open Questions
- ==============
-
- * Should we guarantee that the exported buffer always ends with a NUL
-   character? Is it possible to implement it in *O*\ (1) complexity
-   in all Python implementations?
- * Is it ok to allow surrogate characters?
- * Should we add a flag to disallow embedded NUL characters? It would
-   have an *O*\ (*n*) complexity.
- * Should we add a flag to disallow surrogate characters? It would
-   have an *O*\ (*n*) complexity.
-
-
Usage of PEP 393 C APIs
=======================