gh-76535: Add PyUCS4_ToLower() function #139333
Open. vstinner wants to merge 19 commits into python:main from vstinner:unicode_tolower2
Commits
5c44aca gh-76535: Make `PyUnicode_ToLowerFull` and friends public (lysnikolaou)
b8384ae Address feedback; add size parameter and do PyUnicode_ToFolded as well (lysnikolaou)
ee4b707 📜🤖 Added by blurb_it. (blurb-it[bot])
82d0bcb Address more feedback; assert return value and raise ValueError (lysnikolaou)
10c282d Add tests (lysnikolaou)
d7ed172 Document the maximum numbers of characters needed in the buffer (lysnikolaou)
25f1cd8 Address feedback; test more characters and refactor _testcapi functions (lysnikolaou)
5979fdb Address more review comments (lysnikolaou)
769d84a Disallow passing NULL (lysnikolaou)
625ad47 Only return NULL when chars < 0 in C test functions (lysnikolaou)
3008eb6 Use Py_ssize_t and don't check overflow in loop (lysnikolaou)
4163898 Use Py_ssize_t for return value variable in unicodeobject.c (lysnikolaou)
ce6a3a6 Address feedback; Rename to PyUCS4_*, define macro and test small buf… (lysnikolaou)
3c96475 Address feedback (lysnikolaou)
ef8264c Replace Py_UCS4 with (const Py_UCS4*, Py_ssize_t) (vstinner)
10c164a closes gh-138706: update Unicode to 17.0.0 (#138719) (benjaminp)
c67a22d Update Tools/unicode/makeunicodedata.py (vstinner)
e0afd1d Apply suggestions from code review (vstinner)
01e13e6 Update Modules/_testcapi/unicode.c (vstinner)
@@ -307,6 +307,74 @@ These APIs can be used for fast direct character conversions:
   possible. This function does not raise exceptions.


.. c:function:: Py_ssize_t PyUCS4_ToLower(const Py_UCS4 *str, Py_ssize_t str_size, Py_UCS4 *buffer, Py_ssize_t buf_size)

   Convert the characters of *str* to lower case and store the result in
   *buffer*, which must be large enough to hold the lowercased form of *str*.
   Return the number of characters stored. If *buffer* is too small to hold
   the result, a :exc:`ValueError` is raised and ``-1`` is returned.

   *str_size*, *buf_size* and the return value are counted in UCS-4 characters.

   In Unicode 16.0, any single character can be lowercased into a buffer of *buf_size* ``2``.
   See also :c:macro:`PyUCS4_CASE_CONVERSION_BUFFER_SIZE`.

   .. versionadded:: next

Review comment on the Unicode 16.0 sentence above: I think this (specifying 16.0.0) may be confusing for readers since we compile UCD version 17.0 in unicodedata.
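
The buffer contract described for ``PyUCS4_ToLower`` can be illustrated with a small Python model. This is only a sketch under stated assumptions: ``to_lower_model`` is a hypothetical helper, not part of the C API, and it assumes that each input character is lowercased independently, the way ``str.lower()`` treats a one-character string.

```python
def to_lower_model(chars: str, buf_size: int) -> list[int]:
    """Hypothetical Python model of the documented buffer contract.

    Returns the lowercased code points, or raises ValueError when more
    than buf_size code points would have to be stored.
    """
    buffer: list[int] = []
    for ch in chars:
        # Lowercase one character at a time; the result can be longer
        # than one code point (U+0130 lowercases to two code points).
        lowered = [ord(c) for c in ch.lower()]
        if len(buffer) + len(lowered) > buf_size:
            raise ValueError("output buffer is too small")
        buffer.extend(lowered)
    return buffer


# U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) needs two slots.
print(len(to_lower_model("\u0130", 2)))  # prints 2
```

In this model, calling ``to_lower_model("\u0130", 1)`` raises ``ValueError``, which mirrors the documented ``-1`` return with an exception set when *buf_size* is too small.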


.. c:function:: Py_ssize_t PyUCS4_ToUpper(const Py_UCS4 *str, Py_ssize_t str_size, Py_UCS4 *buffer, Py_ssize_t buf_size)

   Convert the characters of *str* to upper case and store the result in
   *buffer*, which must be large enough to hold the uppercased form of *str*.
   Return the number of characters stored. If *buffer* is too small to hold
   the result, a :exc:`ValueError` is raised and ``-1`` is returned.

   *str_size*, *buf_size* and the return value are counted in UCS-4 characters.

   In Unicode 16.0, any single character can be uppercased into a buffer of *buf_size* ``3``.
   See also :c:macro:`PyUCS4_CASE_CONVERSION_BUFFER_SIZE`.

   .. versionadded:: next


.. c:function:: Py_ssize_t PyUCS4_ToTitle(const Py_UCS4 *str, Py_ssize_t str_size, Py_UCS4 *buffer, Py_ssize_t buf_size)

   Convert the characters of *str* to title case and store the result in
   *buffer*, which must be large enough to hold the titlecased form of *str*.
   Return the number of characters stored. If *buffer* is too small to hold
   the result, a :exc:`ValueError` is raised and ``-1`` is returned.

   *str_size*, *buf_size* and the return value are counted in UCS-4 characters.

   In Unicode 16.0, any single character can be titlecased into a buffer of *buf_size* ``3``.
   See also :c:macro:`PyUCS4_CASE_CONVERSION_BUFFER_SIZE`.

   .. versionadded:: next


.. c:function:: Py_ssize_t PyUCS4_ToFolded(const Py_UCS4 *str, Py_ssize_t str_size, Py_UCS4 *buffer, Py_ssize_t buf_size)

   Case fold the characters of *str* and store the result in *buffer*,
   which must be large enough to hold the case-folded form of *str*.
   Return the number of characters stored. If *buffer* is too small to hold
   the result, a :exc:`ValueError` is raised and ``-1`` is returned.

   *str_size*, *buf_size* and the return value are counted in UCS-4 characters.

   In Unicode 16.0, any single character can be case folded into a buffer of *buf_size* ``3``.
   See also :c:macro:`PyUCS4_CASE_CONVERSION_BUFFER_SIZE`.

   .. versionadded:: next

.. c:macro:: PyUCS4_CASE_CONVERSION_BUFFER_SIZE

   The minimum buffer size, in UCS-4 characters, that is large enough to
   convert any single character with :c:func:`PyUCS4_ToLower`,
   :c:func:`PyUCS4_ToUpper`, :c:func:`PyUCS4_ToTitle`, or
   :c:func:`PyUCS4_ToFolded`. That is, ``3`` for Unicode 16.0.

   .. versionadded:: next
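
The per-character sizes quoted above (``2`` for lower case, ``3`` for upper case, title case and case folding) can be cross-checked from Python. The sketch below assumes that ``str.lower()``, ``str.upper()``, ``str.title()`` and ``str.casefold()`` apply the same full case mappings as the C functions when given a one-character string; ``max_expansion`` is a hypothetical helper name used only for illustration.

```python
import sys


def max_expansion(convert) -> int:
    """Return the longest result of convert() over all single code points."""
    return max(len(convert(chr(cp))) for cp in range(sys.maxunicode + 1))


# Scan every code point once per conversion and record the worst case.
sizes = {
    "lower": max_expansion(str.lower),
    "upper": max_expansion(str.upper),
    "title": max_expansion(str.title),
    "casefold": max_expansion(str.casefold),
}
print(sizes)  # {'lower': 2, 'upper': 3, 'title': 3, 'casefold': 3}
```

The largest of these values corresponds to the ``3`` documented for ``PyUCS4_CASE_CONVERSION_BUFFER_SIZE``. The scan visits all 1,114,112 code points four times, so it takes a moment to run.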


These APIs can be used to work with surrogates:

.. c:function:: int Py_UNICODE_IS_SURROGATE(Py_UCS4 ch)

@@ -713,6 +713,12 @@ unicodedata
* The Unicode database has been updated to Unicode 17.0.0.


unicodedata
-----------

Review comment on the ``unicodedata`` heading above: Why is this duplicated?

* The Unicode database has been updated to Unicode 17.0.0.


wave
----

Misc/NEWS.d/next/C_API/2025-07-01-14-56-41.gh-issue-76535.9cwObj.rst (1 change: 1 addition & 0 deletions)
@@ -0,0 +1 @@
Make :c:func:`PyUCS4_ToLower`, :c:func:`PyUCS4_ToUpper`, :c:func:`PyUCS4_ToTitle` and :c:func:`PyUCS4_ToFolded` public.