Commit 5bd4bf0

closes gh-138706: update Unicode to 17.0.0 (#138719)
1 parent e0f54a6 commit 5bd4bf0

11 files changed: +20937 -20874 lines changed
Doc/library/stdtypes.rst

Lines changed: 10 additions & 10 deletions
@@ -1843,9 +1843,9 @@ expression support in the :mod:`re` module).
 lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold`
 converts it to ``"ss"``.

-The casefolding algorithm is
-`described in section 3.13 'Default Case Folding' of the Unicode Standard
-<https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G33992>`__.
+The casefolding algorithm is `described in section 3.13.3 'Default Case
+Folding' of the Unicode Standard
+<https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G53253>`__.

 .. versionadded:: 3.3

@@ -2056,7 +2056,7 @@ expression support in the :mod:`re` module).
 property being one of "Lm", "Lt", "Lu", "Ll", or "Lo". Note that this is different
 from the `Alphabetic property defined in the section 4.10 'Letters, Alphabetic, and
 Ideographic' of the Unicode Standard
-<https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-4/#G91002>`_.
+<https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-4/#G91002>`__.


 .. method:: str.isascii()
@@ -2196,9 +2196,9 @@ expression support in the :mod:`re` module).
 Return a copy of the string with all the cased characters [4]_ converted to
 lowercase.

-The lowercasing algorithm used is
-`described in section 3.13 'Default Case Folding' of the Unicode Standard
-<https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G33992>`__.
+The lowercasing algorithm used is `described in section 3.13.2 'Default Case
+Conversion' of the Unicode Standard
+<https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G34078>`__.


 .. method:: str.lstrip(chars=None, /)
@@ -2561,9 +2561,9 @@ expression support in the :mod:`re` module).
 character(s) is not "Lu" (Letter, uppercase), but e.g. "Lt" (Letter,
 titlecase).

-The uppercasing algorithm used is
-`described in section 3.13 'Default Case Folding' of the Unicode Standard
-<https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G33992>`__.
+The uppercasing algorithm used is `described in section 3.13.2 'Default Case
+Conversion' of the Unicode Standard
+<https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G34078>`__.


 .. method:: str.zfill(width, /)
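
Reviewer note: the hunks above only retarget documentation links and section names; they do not change what ``casefold()``, ``lower()``, or ``upper()`` do. As a quick illustration of the distinction the docs draw between case folding and case conversion, here is a minimal sketch (not part of the commit; the sample string is an arbitrary choice):

```python
# Illustration only: not part of the commit.
s = "Straße"

lowered = s.lower()      # 'ß' is already lowercase, so lower() keeps it
folded = s.casefold()    # case folding maps 'ß' to "ss"
uppered = s.upper()      # full case conversion expands 'ß' to "SS"

print(lowered, folded, uppered)

# Caseless comparison should casefold both operands.
same = "STRASSE".casefold() == s.casefold()
print(same)
```

Comparing with ``lower()`` instead would report these two strings as different, which is why the documentation recommends ``casefold()`` for caseless matching.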

Doc/library/unicodedata.rst

Lines changed: 4 additions & 4 deletions
@@ -17,8 +17,8 @@

 This module provides access to the Unicode Character Database (UCD) which
 defines character properties for all Unicode characters. The data contained in
-this database is compiled from the `UCD version 16.0.0
-<https://www.unicode.org/Public/16.0.0/ucd>`_.
+this database is compiled from the `UCD version 17.0.0
+<https://www.unicode.org/Public/17.0.0/ucd>`_.

 The module uses the same names and symbols as defined by Unicode
 Standard Annex #44, `"Unicode Character Database"
@@ -211,6 +211,6 @@ In addition, the module exposes the following constant:

 .. rubric:: Footnotes

-.. [#] https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt
+.. [#] https://www.unicode.org/Public/17.0.0/ucd/NameAliases.txt

-.. [#] https://www.unicode.org/Public/16.0.0/ucd/NamedSequences.txt
+.. [#] https://www.unicode.org/Public/17.0.0/ucd/NamedSequences.txt
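
Reviewer note: to check which UCD version a given interpreter build was compiled against, the ``unicodedata.unidata_version`` attribute can be inspected. The sketch below is illustrative and not part of the commit; the threshold of 17 simply reflects the version this commit moves to.

```python
import unicodedata

# unidata_version reports the UCD version compiled into this interpreter.
version = unicodedata.unidata_version
major = int(version.split(".")[0])

print(f"UCD version: {version}")
if major >= 17:
    print("This build includes Unicode 17.0.0 data or newer.")
else:
    print("This build predates the Unicode 17.0.0 update.")
```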

Doc/reference/lexical_analysis.rst

Lines changed: 3 additions & 3 deletions
@@ -384,8 +384,8 @@ Character Database.


 .. _UAX-31: https://www.unicode.org/reports/tr31/
-.. _PropList.txt: https://www.unicode.org/Public/16.0.0/ucd/PropList.txt
-.. _DerivedCoreProperties.txt: https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt
+.. _PropList.txt: https://www.unicode.org/Public/17.0.0/ucd/PropList.txt
+.. _DerivedCoreProperties.txt: https://www.unicode.org/Public/17.0.0/ucd/DerivedCoreProperties.txt
 .. _normalization form: https://www.unicode.org/reports/tr15/#Norm_Forms


@@ -793,7 +793,7 @@ with the given *name*::
 This sequence cannot appear in :ref:`bytes literals <bytes-literal>`.

 .. versionchanged:: 3.3
-Support for `name aliases <https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt>`__
+Support for `name aliases <https://www.unicode.org/Public/17.0.0/ucd/NameAliases.txt>`__
 has been added.

 .. _string-escape-long-hex:

Doc/whatsnew/3.15.rst

Lines changed: 6 additions & 0 deletions
@@ -648,6 +648,12 @@ typing
 (Contributed by Nikita Sobolev in :gh:`137191`.)


+unicodedata
+-----------
+
+* The Unicode database has been updated to Unicode 17.0.0.
+
+
 wave
 ----

Lib/test/test_unicodedata.py

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@
 class UnicodeMethodsTest(unittest.TestCase):

     # update this, if the database changes
-    expectedchecksum = '9e43ee3929471739680c0e705482b4ae1c4122e4'
+    expectedchecksum = '8b2615a9fc627676cbc0b6fac0191177df97ef5f'

     @requires_resource('cpu')
     def test_method_checksum(self):
@@ -77,7 +77,7 @@ class UnicodeFunctionsTest(UnicodeDatabaseTest):

     # Update this if the database changes. Make sure to do a full rebuild
     # (e.g. 'make distclean && make') to get the correct checksum.
-    expectedchecksum = '23ab09ed4abdf93db23b97359108ed630dd8311d'
+    expectedchecksum = '65670ae03a324c5f9e826a4de3e25bae4d73c9b7'

     @requires_resource('cpu')
     def test_function_checksum(self):
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Update :mod:`unicodedata` database to Unicode 17.0.0.

Modules/unicodedata.c

Lines changed: 4 additions & 3 deletions
@@ -1020,13 +1020,14 @@ is_unified_ideograph(Py_UCS4 code)
 (0x3400 <= code && code <= 0x4DBF) || /* CJK Ideograph Extension A */
 (0x4E00 <= code && code <= 0x9FFF) || /* CJK Ideograph */
 (0x20000 <= code && code <= 0x2A6DF) || /* CJK Ideograph Extension B */
-(0x2A700 <= code && code <= 0x2B739) || /* CJK Ideograph Extension C */
+(0x2A700 <= code && code <= 0x2B73F) || /* CJK Ideograph Extension C */
 (0x2B740 <= code && code <= 0x2B81D) || /* CJK Ideograph Extension D */
-(0x2B820 <= code && code <= 0x2CEA1) || /* CJK Ideograph Extension E */
+(0x2B820 <= code && code <= 0x2CEAD) || /* CJK Ideograph Extension E */
 (0x2CEB0 <= code && code <= 0x2EBE0) || /* CJK Ideograph Extension F */
 (0x2EBF0 <= code && code <= 0x2EE5D) || /* CJK Ideograph Extension I */
 (0x30000 <= code && code <= 0x3134A) || /* CJK Ideograph Extension G */
-(0x31350 <= code && code <= 0x323AF); /* CJK Ideograph Extension H */
+(0x31350 <= code && code <= 0x323AF) || /* CJK Ideograph Extension H */
+(0x323B0 <= code && code <= 0x33479); /* CJK Ideograph Extension J */
 }

 /* macros used to determine if the given code point is in the PUA range that
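
Reviewer note: the range check in the hunk above can be mirrored in Python to make the new boundaries easy to test. This is a hypothetical sketch, not part of the commit: the table name and helper are made up for illustration, and it covers only the ranges visible in the hunk (any conditions before line 1020 of the C function are not shown here).

```python
# Hypothetical Python mirror of the C range check; ranges are copied from
# the post-commit side of the hunk. Not part of the commit itself.
UNIFIED_IDEOGRAPH_RANGES = [
    (0x3400, 0x4DBF),    # CJK Ideograph Extension A
    (0x4E00, 0x9FFF),    # CJK Ideograph
    (0x20000, 0x2A6DF),  # CJK Ideograph Extension B
    (0x2A700, 0x2B73F),  # CJK Ideograph Extension C (end moved from 0x2B739)
    (0x2B740, 0x2B81D),  # CJK Ideograph Extension D
    (0x2B820, 0x2CEAD),  # CJK Ideograph Extension E (end moved from 0x2CEA1)
    (0x2CEB0, 0x2EBE0),  # CJK Ideograph Extension F
    (0x2EBF0, 0x2EE5D),  # CJK Ideograph Extension I
    (0x30000, 0x3134A),  # CJK Ideograph Extension G
    (0x31350, 0x323AF),  # CJK Ideograph Extension H
    (0x323B0, 0x33479),  # CJK Ideograph Extension J (new in this commit)
]


def is_unified_ideograph(code: int) -> bool:
    """Return True if code falls in one of the ranges listed above."""
    return any(lo <= code <= hi for lo, hi in UNIFIED_IDEOGRAPH_RANGES)


# Boundary checks for the ranges this commit touches.
print(is_unified_ideograph(0x2B73F))   # last code point of Extension C
print(is_unified_ideograph(0x323B0))   # first code point of Extension J
print(is_unified_ideograph(0x3347A))   # one past the end of Extension J
```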
