Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), topic-parser, topic-unicode, type-bug (An unexpected behavior, bug, or error)
Description
Bug report
Bug description:
In 3.12 and earlier this parses fine. Note the special Unicode character in the inner string: it is '\u3001' (UTF-8 b'\xe3\x80\x81'). The source goes from good to failing merely by removing a space or by parenthesizing the whole expression, so the failure appears to be position-dependent.
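The reported failure reason is also what you get from decoding a buffer cut off in the middle of that character's three-byte UTF-8 sequence, which suggests (my guess, not confirmed against tokenizer internals) that an off-by-one in the column bookkeeping splits the multibyte character. A plain-Python sketch of that decode behavior:

```python
# U+3001 (IDEOGRAPHIC COMMA) encodes to three bytes in UTF-8.
ch = '\u3001'
raw = ch.encode('utf-8')
print(raw)  # b'\xe3\x80\x81'

# Decoding a buffer truncated mid-sequence reproduces the same
# failure reason that tokenize reports below.
try:
    raw[:2].decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc.reason)  # unexpected end of data
```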
>>> from io import BytesIO
>>> from tokenize import tokenize
>>>
>>> src_good = '''f"{f(a=lambda: '、' \n)}"'''
>>> src_bad1 = '''f"{f(a=lambda: '、'\n)}"'''
>>> src_bad2 = '''(f"{f(a=lambda: '、' \n)}")'''
>>>
>>> for token in tokenize(BytesIO(src_good.encode()).readline): print(token)
...
TokenInfo(type=65 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=59 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='f', start=(1, 3), end=(1, 4), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='(', start=(1, 4), end=(1, 5), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='a', start=(1, 5), end=(1, 6), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='=', start=(1, 6), end=(1, 7), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='lambda', start=(1, 7), end=(1, 13), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string=':', start=(1, 13), end=(1, 14), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=3 (STRING), string="'、'", start=(1, 15), end=(1, 18), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=63 (NL), string='\n', start=(1, 19), end=(1, 20), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string=')', start=(2, 0), end=(2, 1), line=')}"')
TokenInfo(type=55 (OP), string='}', start=(2, 1), end=(2, 2), line=')}"')
TokenInfo(type=61 (FSTRING_END), string='"', start=(2, 2), end=(2, 3), line=')}"')
TokenInfo(type=4 (NEWLINE), string='', start=(2, 3), end=(2, 4), line=')}"')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')
>>> for token in tokenize(BytesIO(src_bad1.encode()).readline): print(token)
...
TokenInfo(type=65 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=59 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='f', start=(1, 3), end=(1, 4), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='(', start=(1, 4), end=(1, 5), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='a', start=(1, 5), end=(1, 6), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='=', start=(1, 6), end=(1, 7), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='lambda', start=(1, 7), end=(1, 13), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string=':', start=(1, 13), end=(1, 14), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=3 (STRING), string="'、'", start=(1, 15), end=(1, 18), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=63 (NL), string='\n', start=(1, 18), end=(1, 19), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string=')', start=(2, 0), end=(2, 1), line=')}"')
Traceback (most recent call last):
File "<python-input-8>", line 1, in <module>
for token in tokenize(BytesIO(src_bad1.encode()).readline): print(token)
~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/tokenize.py", line 492, in tokenize
yield from _generate_tokens_from_c_tokenizer(rl_gen.__next__, encoding, extra_tokens=True)
File "/usr/local/lib/python3.13/tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
for info in it:
^^
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 13-14: unexpected end of data
The other bad source and other permutations give the same error. You also get the error immediately when typing the bad source interactively.
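All three sources can be checked together with a small helper (the helper name and structure are mine, just for illustration). On an affected version the two bad sources raise the UnicodeDecodeError; on 3.12, or once fixed, none of them do:

```python
from io import BytesIO
from tokenize import tokenize

def tokenize_error(src):
    """Return the UnicodeDecodeError raised while tokenizing src, or None."""
    try:
        for _ in tokenize(BytesIO(src.encode()).readline):
            pass
    except UnicodeDecodeError as exc:
        return exc
    return None

src_good = '''f"{f(a=lambda: '、' \n)}"'''
src_bad1 = '''f"{f(a=lambda: '、'\n)}"'''
src_bad2 = '''(f"{f(a=lambda: '、' \n)}")'''

for name, src in [("good", src_good), ("bad1", src_bad1), ("bad2", src_bad2)]:
    print(name, tokenize_error(src))
```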
CPython versions tested on:
3.13, 3.14, 3.15
Operating systems tested on:
Linux