@@ -31,6 +31,12 @@ Unicode Type
3131These are the basic Unicode object types used for the Unicode implementation in
3232Python:
3333
34+ .. c:var:: PyTypeObject PyUnicode_Type
35+
36+    This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
37+    is exposed to Python code as :py:class:`str`.
38+ 
39+ 
3440.. c:type:: Py_UCS4
3541            Py_UCS2 
3642            Py_UCS1 
@@ -42,19 +48,6 @@ Python:
4248   .. versionadded:: 3.3
4349
4450
45- .. c:type:: Py_UNICODE
46-
47-    This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
48-    depending on the platform.
49-
50-    .. versionchanged:: 3.3
51-       In previous versions, this was a 16-bit type or a 32-bit type depending on
52-       whether you selected a "narrow" or "wide" Unicode version of Python at
53-       build time.
54-
55-    .. deprecated-removed:: 3.13 3.15
56- 
57- 
5851.. c:type:: PyASCIIObject
5952            PyCompactUnicodeObject 
6053            PyUnicodeObject 
@@ -66,12 +59,6 @@ Python:
6659   .. versionadded:: 3.3
6760
6861
69- .. c:var:: PyTypeObject PyUnicode_Type
70-
71-    This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
72-    is exposed to Python code as ``str``.
73- 
74- 
7562The following APIs are C macros and static inlined functions for fast checks and
7663access to internal read-only data of Unicode objects:
7764
@@ -87,16 +74,6 @@ access to internal read-only data of Unicode objects:
8774   subtype.  This function always succeeds. 
8875
8976
90- .. c:function:: int PyUnicode_READY(PyObject *unicode)
91-
92-    Returns ``0``. This API is kept only for backward compatibility.
93-
94-    .. versionadded:: 3.3
95-
96-    .. deprecated:: 3.10
97-       This API does nothing since Python 3.12.
98- 
99- 
10077.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode)
10178
10279   Return the length of the Unicode string, in code points.  *unicode* has to be a
@@ -149,12 +126,16 @@ access to internal read-only data of Unicode objects:
149126.. c:function:: void PyUnicode_WRITE(int kind, void *data, \  
150127                                     Py_ssize_t index, Py_UCS4 value) 
151128
152-    Write into a canonical representation *data* (as obtained with
153-    :c:func:`PyUnicode_DATA`).  This function performs no sanity checks, and is
154-    intended for usage in loops.  The caller should cache the *kind* value and 
155-    *data* pointer as obtained from other calls.  *index* is the index in 
156-    the string (starts at 0) and *value* is the new code point value which should 
157-    be written to that location. 
129+    Write the code point *value* to the given zero-based *index* in a string.
130+
131+    The *kind* value and *data* pointer must have been obtained from a
132+    string using :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA`
133+    respectively. You must hold a reference to that string while calling
134+    :c:func:`!PyUnicode_WRITE`. All requirements of
135+    :c:func:`PyUnicode_WriteChar` also apply.
136+ 
137+    The function performs no checks for any of its requirements, 
138+    and is intended for usage in loops. 
158139
159140   .. versionadded:: 3.3
160141
@@ -196,6 +177,14 @@ access to internal read-only data of Unicode objects:
196177      is not ready. 
197178
198179
180+ .. c:function:: unsigned int PyUnicode_IS_ASCII(PyObject *unicode)
181+
182+    Return true if the string only contains ASCII characters.
183+    Equivalent to :py:meth:`str.isascii`.
184+
185+    .. versionadded:: 3.2
186+ 
187+ 
199188Unicode Character Properties 
200189"""""""""""""""""""""""""""" 
201190
@@ -330,11 +319,29 @@ APIs:
330319   to be placed in the string.  As an approximation, it can be rounded up to the 
331320   nearest value in the sequence 127, 255, 65535, 1114111. 
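The rounding mentioned above can be sketched in plain C. The helper `round_maxchar` is hypothetical (it is not part of the C API) and assumes the caller already has the code points in a `uint32_t` array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the C API: return the smallest value in
 * the sequence 127, 255, 65535, 1114111 that is greater than or equal to
 * every code point in the array. An empty array yields 127. */
static uint32_t
round_maxchar(const uint32_t *code_points, size_t len)
{
    uint32_t maxchar = 127;
    for (size_t i = 0; i < len; i++) {
        uint32_t ch = code_points[i];
        assert(ch <= 0x10FFFF);  /* precondition: valid code point range */
        if (ch > 65535) {
            return 1114111;      /* already the largest bucket; stop early */
        }
        if (ch > 255) {
            maxchar = 65535;
        }
        else if (ch > 127 && maxchar < 255) {
            maxchar = 255;
        }
    }
    return maxchar;
}
```

A value computed this way could then be passed as the *maxchar* argument when allocating the string.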
332321
333-    This is the recommended way to allocate a new Unicode object.  Objects 
334-    created using this function are not resizable. 
335- 
336322   On error, set an exception and return ``NULL``. 
337323
324+    After creation, the string can be filled by :c:func:`PyUnicode_WriteChar`, 
325+    :c:func:`PyUnicode_CopyCharacters`, :c:func:`PyUnicode_Fill`, 
326+    :c:func:`PyUnicode_WRITE` or similar. 
327+    Since strings are supposed to be immutable, take care to not “use” the 
328+    result while it is being modified. In particular, before it's filled 
329+    with its final contents, a string:
330+ 
331+    - must not be hashed, 
332+    - must not be :c:func:`converted to UTF-8 <PyUnicode_AsUTF8AndSize>`, 
333+      or another non-"canonical" representation, 
334+    - must not have its reference count changed, 
335+    - must not be shared with code that might do one of the above. 
336+ 
337+    This list is not exhaustive. Avoiding these uses is your responsibility.
338+ 
339+ 
340+    To avoid accidentally exposing a partially-written string object, prefer 
341+    using the :c:type:`PyUnicodeWriter` API, or one of the ``PyUnicode_From*``
342+    functions below. 
343+ 
344+ 
338345   .. versionadded:: 3.3
339346
340347
@@ -607,6 +614,15 @@ APIs:
607614   decref'ing the returned objects. 
608615
609616
617+ .. c:function:: const char* PyUnicode_GetDefaultEncoding(void)
618+ 
619+    Return the name of the default string encoding, ``"utf-8"``. 
620+    See :func:`sys.getdefaultencoding`. 
621+ 
622+    The returned string does not need to be freed, and is valid 
623+    until interpreter shutdown. 
624+ 
625+ 
610626.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)  
611627
612628   Return the length of the Unicode object, in code points. 
@@ -627,6 +643,9 @@ APIs:
627643   possible.  Returns ``-1`` and sets an exception on error, otherwise returns
628644   the number of copied characters. 
629645
646+    The string must not have been “used” yet. 
647+    See :c:func:`PyUnicode_New` for details.
648+
630649   .. versionadded:: 3.3
631650
632651
@@ -639,6 +658,9 @@ APIs:
639658   Fail if *fill_char* is bigger than the string maximum character, or if the
640659   string has more than 1 reference. 
641660
661+    The string must not have been “used” yet. 
662+    See :c:func:`PyUnicode_New` for details.
663+
642664   Return the number of written characters, or return ``-1`` and raise an
643665   exception on error. 
644666
@@ -648,15 +670,16 @@ APIs:
648670.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
649671                                        Py_UCS4 character) 
650672
651-    Write a character to a string.  The string must have been created through
652-    :c:func:`PyUnicode_New`.  Since Unicode strings are supposed to be immutable,
653-    the string must not be shared, or have been hashed yet.
673+    Write a *character* to the string *unicode* at the zero-based *index*.
674+    Return ``0`` on success, ``-1`` on error with an exception set.
654675
655676   This function checks that *unicode* is a Unicode object, that the index is
656-    not out of bounds, and that the object can be modified safely (i.e. that it 
657-    its reference count is one). 
677+    not out of bounds, and that the object's reference count is one.
678+    See :c:func:`PyUnicode_WRITE` for a version that skips these checks, 
679+    making them your responsibility. 
658680
659-    Return ``0`` on success, ``-1`` on error with an exception set. 
681+    The string must not have been “used” yet. 
682+    See :c:func:`PyUnicode_New` for details. 
660683
661684   .. versionadded:: 3.3 
662685
@@ -1640,6 +1663,20 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
16401663      Strings interned this way are made :term:`immortal`. 
16411664
16421665
1666+ .. c:function:: unsigned int PyUnicode_CHECK_INTERNED(PyObject *str)  
1667+ 
1668+    Return a non-zero value if *str* is interned, zero if not.
1669+    The *str* argument must be a string; this is not checked.
1670+    This function always succeeds.
1671+
1672+    .. impl-detail::
1673+
1674+       A non-zero return value may carry additional information
1675+       about *how* the string is interned.
1676+       The meaning of such non-zero values, as well as each specific string's 
1677+       intern-related details, may change between CPython versions. 
1678+ 
1679+ 
16431680PyUnicodeWriter 
16441681^^^^^^^^^^^^^^^ 
16451682
@@ -1760,8 +1797,8 @@ object.
17601797   *size* is the string length in bytes. If *size* is equal to ``-1``, call
17611798   ``strlen(str)`` to get the string length.
17621799
1763-    *errors* is an error handler name, such as ``"replace"``. If *errors* is
1764-    ``NULL``, use the strict error handler.
1800+    *errors* is an :ref:`error handler <error-handlers>` name, such as
1801+    ``"replace"``. If *errors* is ``NULL``, use the strict error handler.
17651802
17661803   If *consumed* is not ``NULL``, set *\*consumed* to the number of decoded
17671804   bytes on success.
@@ -1772,3 +1809,49 @@ object.
17721809   On error, set an exception, leave the writer unchanged, and return ``-1``.
17731810
17741811   See also :c:func:`PyUnicodeWriter_WriteUTF8`.
1812+ 
1813+ Deprecated API 
1814+ ^^^^^^^^^^^^^^ 
1815+ 
1816+ The following API is deprecated. 
1817+ 
1818+ .. c:type:: Py_UNICODE
1819+
1820+    This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
1821+    depending on the platform.
1822+    Please use :c:type:`wchar_t` directly instead.
1823+
1824+    .. versionchanged:: 3.3
1825+       In previous versions, this was a 16-bit type or a 32-bit type depending on
1826+       whether you selected a "narrow" or "wide" Unicode version of Python at
1827+       build time.
1828+
1829+    .. deprecated-removed:: 3.13 3.15
1830+ 
1831+ 
1832+ .. c:function:: int PyUnicode_READY(PyObject *unicode)
1833+
1834+    Do nothing and return ``0``.
1835+    This API is kept only for backward compatibility, but there are no plans
1836+    to remove it.
1837+
1838+    .. versionadded:: 3.3
1839+
1840+    .. deprecated:: 3.10
1841+       This API does nothing since Python 3.12.
1842+       Previously, this needed to be called for each string created using
1843+       the old API (:c:func:`!PyUnicode_FromUnicode` or similar).
1844+ 
1845+ 
1846+ .. c:function:: unsigned int PyUnicode_IS_READY(PyObject *unicode)  
1847+ 
1848+    Do nothing and return ``1``.
1849+    This API is kept only for backward compatibility, but there are no plans
1850+    to remove it.
1851+
1852+    .. versionadded:: 3.3
1853+
1854+    .. deprecated:: next
1855+       This API does nothing since Python 3.12.
1856+       Previously, this could be called to check if
1857+       :c:func:`PyUnicode_READY` is necessary.