Parser gives UnicodeDecodeError on what should be good code #139516

Description

@tom-pytel

Bug report

Bug description:

In 3.12 and below this parses fine. Note the special Unicode character in the inner string: it is '\u3001' (UTF-8 b'\xe3\x80\x81'). The source goes from good to bad by removing a space or by parenthesizing the whole expression, so the failure seems to be position-dependent?

>>> from io import BytesIO
>>> from tokenize import tokenize
>>> 
>>> src_good = '''f"{f(a=lambda: '、' \n)}"'''
>>> src_bad1 = '''f"{f(a=lambda: '、'\n)}"'''
>>> src_bad2 = '''(f"{f(a=lambda: '、' \n)}")'''
>>>
>>> for token in tokenize(BytesIO(src_good.encode()).readline): print(token)
... 
TokenInfo(type=65 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=59 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='f', start=(1, 3), end=(1, 4), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='(', start=(1, 4), end=(1, 5), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='a', start=(1, 5), end=(1, 6), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='=', start=(1, 6), end=(1, 7), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='lambda', start=(1, 7), end=(1, 13), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string=':', start=(1, 13), end=(1, 14), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=3 (STRING), string="'、'", start=(1, 15), end=(1, 18), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=63 (NL), string='\n', start=(1, 19), end=(1, 20), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string=')', start=(2, 0), end=(2, 1), line=')}"')
TokenInfo(type=55 (OP), string='}', start=(2, 1), end=(2, 2), line=')}"')
TokenInfo(type=61 (FSTRING_END), string='"', start=(2, 2), end=(2, 3), line=')}"')
TokenInfo(type=4 (NEWLINE), string='', start=(2, 3), end=(2, 4), line=')}"')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')
>>> for token in tokenize(BytesIO(src_bad1.encode()).readline): print(token)
... 
TokenInfo(type=65 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=59 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='f', start=(1, 3), end=(1, 4), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='(', start=(1, 4), end=(1, 5), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='a', start=(1, 5), end=(1, 6), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='=', start=(1, 6), end=(1, 7), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='lambda', start=(1, 7), end=(1, 13), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string=':', start=(1, 13), end=(1, 14), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=3 (STRING), string="'、'", start=(1, 15), end=(1, 18), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=63 (NL), string='\n', start=(1, 18), end=(1, 19), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string=')', start=(2, 0), end=(2, 1), line=')}"')
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    for token in tokenize(BytesIO(src_bad1.encode()).readline): print(token)
                 ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/tokenize.py", line 492, in tokenize
    yield from _generate_tokens_from_c_tokenizer(rl_gen.__next__, encoding, extra_tokens=True)
  File "/usr/local/lib/python3.13/tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 13-14: unexpected end of data
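The offsets in the error message line up with a buffer being cut in the middle of a multibyte character: counting UTF-8 bytes from the start of the braced expression, positions 13-14 hold the first two bytes of the three-byte sequence for '\u3001'. A minimal sketch of that arithmetic (the 15-byte cut is hypothetical, chosen only to match the reported offsets, not taken from the tokenizer internals):

>>> data = "f(a=lambda: '、'\n".encode()  # UTF-8 bytes of the braced expression
>>> data.index(b'\xe3')  # the character occupies byte positions 13-15
13
>>> data[:15].decode()  # a cut after byte 14 reproduces the exact message
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 13-14: unexpected end of data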

The other bad source and other permutations give the same error. You also get an immediate error when typing the bad source interactively.
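Since the interactive parser fails immediately, the bug is presumably reachable straight through compile(), without going through the tokenize module; a quick check along these lines should reproduce it (the '<repro>' filename is arbitrary):

>>> src_bad1 = '''f"{f(a=lambda: '、'\n)}"'''
>>> compile(src_bad1, '<repro>', 'exec')  # expected to raise UnicodeDecodeError on 3.13+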

CPython versions tested on:

3.13, 3.14, 3.15

Operating systems tested on:

Linux

Linked PRs
