@@ -31,6 +31,12 @@ Unicode Type
3131These are the basic Unicode object types used for the Unicode implementation in
3232Python:
3333
34+ .. c:var:: PyTypeObject PyUnicode_Type
35+
36+ This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
37+ is exposed to Python code as :py:class:`str`.
38+
39+
3440.. c:type:: Py_UCS4
3541 Py_UCS2
3642 Py_UCS1
@@ -42,19 +48,6 @@ Python:
4248 .. versionadded:: 3.3
4349
4450
45- .. c:type:: Py_UNICODE
46-
47- This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
48- depending on the platform.
49-
50- .. versionchanged:: 3.3
51- In previous versions, this was a 16-bit type or a 32-bit type depending on
52- whether you selected a "narrow" or "wide" Unicode version of Python at
53- build time.
54-
55- .. deprecated-removed:: 3.13 3.15
56-
57-
5851.. c:type:: PyASCIIObject
5952 PyCompactUnicodeObject
6053 PyUnicodeObject
@@ -66,12 +59,6 @@ Python:
6659 .. versionadded:: 3.3
6760
6861
69- .. c:var:: PyTypeObject PyUnicode_Type
70-
71- This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
72- is exposed to Python code as ``str``.
73-
74-
7562The following APIs are C macros and static inlined functions for fast checks and
7663access to internal read-only data of Unicode objects:
7764
@@ -87,16 +74,6 @@ access to internal read-only data of Unicode objects:
8774 subtype. This function always succeeds.
8875
8976
90- .. c:function:: int PyUnicode_READY(PyObject *unicode)
91-
92- Returns ``0``. This API is kept only for backward compatibility.
93-
94- .. versionadded:: 3.3
95-
96- .. deprecated:: 3.10
97- This API does nothing since Python 3.12.
98-
99-
10077.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode)
10178
10279 Return the length of the Unicode string, in code points. *unicode* has to be a
@@ -149,12 +126,16 @@ access to internal read-only data of Unicode objects:
149126.. c:function:: void PyUnicode_WRITE(int kind, void *data, \
150127 Py_ssize_t index, Py_UCS4 value)
151128
152- Write into a canonical representation *data* (as obtained with
153- :c:func:`PyUnicode_DATA`). This function performs no sanity checks, and is
154- intended for usage in loops. The caller should cache the *kind* value and
155- *data* pointer as obtained from other calls. *index* is the index in
156- the string (starts at 0) and *value* is the new code point value which should
157- be written to that location.
129+ Write the code point *value* to the given zero-based *index* in a string.
130+
131+ The *kind* value and *data* pointer must have been obtained from a
132+ string using :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA`
133+ respectively. You must hold a reference to that string while calling
134+ :c:func:`!PyUnicode_WRITE`. All requirements of
135+ :c:func:`PyUnicode_WriteChar` also apply.
136+
137+ The function performs no checks for any of its requirements,
138+ and is intended for usage in loops.
158139
159140 .. versionadded:: 3.3
160141
@@ -196,6 +177,14 @@ access to internal read-only data of Unicode objects:
196177 is not ready.
197178
198179
180+ .. c:function:: unsigned int PyUnicode_IS_ASCII(PyObject *unicode)
181+
182+ Return true if the string only contains ASCII characters.
183+ Equivalent to :py:meth:`str.isascii`.
184+
185+ .. versionadded:: 3.2
186+
187+
199188Unicode Character Properties
200189""""""""""""""""""""""""""""
201190
@@ -330,11 +319,29 @@ APIs:
330319 to be placed in the string. As an approximation, it can be rounded up to the
331320 nearest value in the sequence 127, 255, 65535, 1114111.
332321
333- This is the recommended way to allocate a new Unicode object. Objects
334- created using this function are not resizable.
335-
336322 On error, set an exception and return ``NULL``.
337323
324+ After creation, the string can be filled by :c:func:`PyUnicode_WriteChar`,
325+ :c:func:`PyUnicode_CopyCharacters`, :c:func:`PyUnicode_Fill`,
326+ :c:func:`PyUnicode_WRITE` or similar.
327+ Since strings are supposed to be immutable, take care to not “use” the
328+ result while it is being modified. In particular, before it's filled
329+ with its final contents, a string:
330+
331+ - must not be hashed,
332+ - must not be :c:func:`converted to UTF-8 <PyUnicode_AsUTF8AndSize>`,
333+ or another non-"canonical" representation,
334+ - must not have its reference count changed,
335+ - must not be shared with code that might do one of the above.
336+
337+ This list is not exhaustive. Avoiding these uses is your responsibility;
338+ Python does not always check these requirements.
339+
340+ To avoid accidentally exposing a partially-written string object, prefer
341+ using the :c:type:`PyUnicodeWriter` API, or one of the ``PyUnicode_From*``
342+ functions below.
343+
344+
338345 .. versionadded:: 3.3
339346
340347
@@ -607,6 +614,15 @@ APIs:
607614 decref'ing the returned objects.
608615
609616
617+ .. c:function:: const char* PyUnicode_GetDefaultEncoding(void)
618+
619+ Return the name of the default string encoding, ``"utf-8"``.
620+ See :func:`sys.getdefaultencoding`.
621+
622+ The returned string does not need to be freed, and is valid
623+ until interpreter shutdown.
624+
625+
610626.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
611627
612628 Return the length of the Unicode object, in code points.
@@ -627,6 +643,9 @@ APIs:
627643 possible. Returns ``-1`` and sets an exception on error, otherwise returns
628644 the number of copied characters.
629645
646+ The string must not have been “used” yet.
647+ See :c:func:`PyUnicode_New` for details.
648+
630649 .. versionadded:: 3.3
631650
632651
@@ -639,6 +658,9 @@ APIs:
639658 Fail if *fill_char* is bigger than the string maximum character, or if the
640659 string has more than 1 reference.
641660
661+ The string must not have been “used” yet.
662+ See :c:func:`PyUnicode_New` for details.
663+
642664 Return the number of written characters, or return ``-1`` and raise an
643665 exception on error.
644666
@@ -648,15 +670,16 @@ APIs:
648670.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
649671 Py_UCS4 character)
650672
651- Write a character to a string. The string must have been created through
652- :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
653- the string must not be shared, or have been hashed yet.
673+ Write a *character* to the string *unicode* at the zero-based *index*.
674+ Return ``0`` on success, ``-1`` on error with an exception set.
654675
655676 This function checks that *unicode* is a Unicode object, that the index is
656- not out of bounds, and that the object can be modified safely (i.e. that it
657- its reference count is one).
677+ not out of bounds, and that the object's reference count is one.
678+ See :c:func:`PyUnicode_WRITE` for a version that skips these checks,
679+ making them your responsibility.
658680
659- Return ``0`` on success, ``-1`` on error with an exception set.
681+ The string must not have been “used” yet.
682+ See :c:func:`PyUnicode_New` for details.
660683
661684 .. versionadded:: 3.3
662685
@@ -1640,6 +1663,20 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
16401663 Strings interned this way are made :term:`immortal`.
16411664
16421665
1666+ .. c:function:: unsigned int PyUnicode_CHECK_INTERNED(PyObject *str)
1667+
1668+ Return a non-zero value if *str* is interned, zero if not.
1669+ The *str* argument must be a string; this is not checked.
1670+ This function always succeeds.
1671+
1672+ .. impl-detail::
1673+
1674+ A non-zero return value may carry additional information
1675+ about *how* the string is interned.
1676+ The meaning of such non-zero values, as well as each specific string's
1677+ intern-related details, may change between CPython versions.
1678+
1679+
16431680PyUnicodeWriter
16441681^^^^^^^^^^^^^^^
16451682
@@ -1760,8 +1797,8 @@ object.
17601797 *size* is the string length in bytes. If *size* is equal to ``-1``, call
17611798 ``strlen(str)`` to get the string length.
17621799
1763- *errors* is an error handler name, such as ``"replace"``. If *errors* is
1764- ``NULL``, use the strict error handler.
1800+ *errors* is an :ref:`error handler <error-handlers>` name, such as
1801+ ``"replace"``. If *errors* is ``NULL``, use the strict error handler.
17651802
17661803 If *consumed* is not ``NULL``, set *\*consumed* to the number of decoded
17671804 bytes on success.
@@ -1772,3 +1809,49 @@ object.
17721809 On error, set an exception, leave the writer unchanged, and return ``-1``.
17731810
17741811 See also :c:func:`PyUnicodeWriter_WriteUTF8`.
1812+
1813+ Deprecated API
1814+ ^^^^^^^^^^^^^^
1815+
1816+ The following API is deprecated.
1817+
1818+ .. c:type:: Py_UNICODE
1819+
1820+ This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
1821+ depending on the platform.
1822+ Please use :c:type:`wchar_t` directly instead.
1823+
1824+ .. versionchanged:: 3.3
1825+ In previous versions, this was a 16-bit type or a 32-bit type depending on
1826+ whether you selected a "narrow" or "wide" Unicode version of Python at
1827+ build time.
1828+
1829+ .. deprecated-removed:: 3.13 3.15
1830+
1831+
1832+ .. c:function:: int PyUnicode_READY(PyObject *unicode)
1833+
1834+ Do nothing and return ``0``.
1835+ This API is kept only for backward compatibility, but there are no plans
1836+ to remove it.
1837+
1838+ .. versionadded:: 3.3
1839+
1840+ .. deprecated:: 3.10
1841+ This API does nothing since Python 3.12.
1842+ Previously, this needed to be called for each string created using
1843+ the old API (:c:func:`!PyUnicode_FromUnicode` or similar).
1844+
1845+
1846+ .. c:function:: unsigned int PyUnicode_IS_READY(PyObject *unicode)
1847+
1848+ Do nothing and return ``1``.
1849+ This API is kept only for backward compatibility, but there are no plans
1850+ to remove it.
1851+
1852+ .. versionadded:: 3.3
1853+
1854+ .. deprecated:: next
1855+ This API does nothing since Python 3.12.
1856+ Previously, this could be called to check if
1857+ :c:func:`PyUnicode_READY` is necessary.