
codecs module doesn't recognize new C++ 23 universal-character-name \u{xxx}. #130475

@mrolle45

Description


Bug report

Bug description:

The C++23 standard adds a new delimited syntax for universal character names, \u{...}, which codecs.decode with the 'unicode-escape' codec does not recognize. I ran the following on Python 3.13, and earlier Python versions behave the same way.

Python 3.13.1 (tags/v3.13.1:0671451, Dec  3 2024, 19:06:28) [MSC v.1942 64 bit (AMD64)] on win32
>>> import codecs
>>> codecs.decode('\u41',encoding='unicode-escape')
'A'
>>> codecs.decode('\u{41}',encoding='unicode-escape')
  File "<python-input-3>", line 1
    codecs.decode('\u{41}',encoding='unicode-escape')
                  ^^^^^^^^^^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

The result should be 'A'.
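As far as I can tell, both literals in the session above are parsed by Python's own compiler before codecs.decode ever runs (a non-raw '\u41' is itself rejected as a truncated \uXXXX escape), so the SyntaxError comes from the string-literal parser rather than from the codec. A minimal sketch that passes the escapes as bytes, so that only the 'unicode-escape' codec interprets them:

```python
import codecs

# Passing bytes keeps Python's string-literal parser out of the picture:
# the backslash sequences below reach the 'unicode-escape' codec unchanged.
fixed_width = codecs.decode(b"\\u0041", "unicode-escape")  # four hex digits
print(fixed_width)

brace_error = None
try:
    # The codec understands \uXXXX and \UXXXXXXXX, but not the brace form.
    codecs.decode(b"\\u{41}", "unicode-escape")
except UnicodeDecodeError as exc:
    brace_error = exc
    print(type(exc).__name__, exc.reason)
```

With bytes input the brace form fails with a UnicodeDecodeError from the codec instead of a SyntaxError, which isolates the behaviour this request is about.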

For reference, here is the relevant grammar from the C++23 standard, Annex A.3:

universal-character-name:
    ...
    \u{ simple-hexadecimal-digit-sequence }

simple-hexadecimal-digit-sequence:
    hexadecimal-digit
    simple-hexadecimal-digit-sequence hexadecimal-digit
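Until the codec supports this form, a possible user-side workaround is sketched below. It is an assumption-laden illustration, not part of the codecs module: the names decode_with_delimited_escapes and _rewrite are hypothetical. It rewrites each \u{hex-digits} escape into the fixed-width \UXXXXXXXX form that 'unicode-escape' already accepts, leaves escaped backslash pairs untouched, and rejects values above U+10FFFF.

```python
import codecs
import re

# Hypothetical helper, not a codecs API. The pattern matches either an escaped
# backslash pair (kept as-is) or a C++23-style delimited escape \u{hex...}.
_DELIMITED_OR_BACKSLASH = re.compile(rb"\\(\\|u\{([0-9A-Fa-f]+)\})")


def _rewrite(match: re.Match) -> bytes:
    hex_digits = match.group(2)
    if hex_digits is None:
        # An escaped backslash: keep it so the codec still sees "\\".
        return match.group(0)
    code_point = int(hex_digits, 16)
    if code_point > 0x10FFFF:
        raise ValueError(f"code point out of range: \\u{{{hex_digits.decode()}}}")
    # Re-spell the escape in the fixed-width form the codec understands.
    return b"\\U%08X" % code_point


def decode_with_delimited_escapes(data: bytes) -> str:
    """Decode like 'unicode-escape', additionally accepting \\u{...} escapes."""
    return codecs.decode(_DELIMITED_OR_BACKSLASH.sub(_rewrite, data), "unicode-escape")


print(decode_with_delimited_escapes(b"\\u{41}\\u0042\\n"))
```

Rewriting to \UXXXXXXXX, rather than substituting the decoded character directly, keeps the result from being re-interpreted by the codec; for example, \u{5C} would otherwise produce a bare backslash that could combine with the following character into a different escape. The sketch handles only the \u{...} form.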

Please update codecs in Python 3.13 and in all earlier Python versions that still receive bug fixes.

CPython versions tested on:

3.13

Operating systems tested on:

Windows

Metadata

Assignees: no one assigned
Labels: type-feature (a feature request or enhancement)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
