
@tom-pytel (Contributor) commented Oct 6, 2025

An `=` followed by a `:` in an f-string expression could cause the tokenizer to erroneously think it was starting a format spec, leading to incorrect internal state and possible decode errors if this results in split unicode characters on copy. This PR fixes that by disallowing `=` from setting the `in_debug` state unless it is encountered at the top level of an f-string expression.

This problem exists back to Python 3.13, and this PR can probably be backported easily enough.
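For context, the distinction the fix relies on can be seen in ordinary f-string behavior (a minimal sketch; the variable `x` and the lambda are illustrative):

```python
x = 42
# '=' at the top level of a replacement field is the debug specifier:
# it makes the f-string include the expression text itself.
assert f"{x=}" == "x=42"

# '=' nested inside parentheses is an ordinary keyword/default argument,
# and the lambda's ':' must not be mistaken for a format-spec delimiter.
assert f"{(lambda a=1: a)()}" == "1"
```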

@tom-pytel (Contributor, Author) commented

Ping @pablogsal. I added the test to test_tokenize instead of test_fstring as it seems to fit there better.

@pablogsal (Member) commented Oct 6, 2025

Please add a test to the f-string test file as well, as this will be a semantic test that needs to hold true even if we change the tokenizer or some other implementation doesn't have the same tokenizer.

import tokenize
from io import BytesIO

# gh-139516
# The '\n' is explicit to ensure no trailing whitespace, which would invalidate the test.
# Must use tokenize instead of compile so that the source is parsed line by line, which exposes the bug.
list(tokenize.tokenize(BytesIO('''f"{f(a=lambda: 'à'\n)}"'''.encode()).readline))
@pablogsal (Member) commented:

I am confused. Isn't it possible to trigger this in an exec or eval call? Or perhaps a file with an encoding?

@tom-pytel (Contributor, Author) replied:

See below VVV

@tom-pytel (Contributor, Author) commented

> Please add a test to the f-string test file as well, as this will be a semantic test that needs to hold true even if we change the tokenizer or some other implementation doesn't have the same tokenizer.

Done. But I had to use tokenize() because of an interesting quirk. The bug shows up with tokenize(), when executing a Python script directly with the bad source, or when typing it into the REPL. It does not show up with compile(), ast.parse(), eval(), exec(), or import .... The difference seems to be whether the source is read line by line: when the full string is available at parse time, the tail end of the string past the NL is present to offset from on copy, and the bug doesn't present.
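The difference described above can be sketched as follows (a hedged sketch, not part of the PR; the version guard is there because a newline inside a single-quoted f-string requires Python 3.12+, and on an affected interpreter only the line-by-line path fails):

```python
import ast
import sys
import tokenize
from io import BytesIO

# The repro source: a nested '=' plus ':' with a newline inside the
# f-string, which splits the multi-byte 'à' on the buggy copy path.
src = '''f"{f(a=lambda: 'à'\n)}"'''

if sys.version_info >= (3, 12):  # PEP 701 allows the newline inside the f-string
    # Whole-buffer parsing: the full source is in memory, so the copy past
    # the NL has valid data to offset from and the bug never presented here.
    ast.parse(src)

    # Line-by-line tokenization: readline feeds one line at a time, which
    # is the path that exposed the bug on affected builds.
    try:
        tokens = list(tokenize.tokenize(BytesIO(src.encode()).readline))
        print("tokenize ok:", len(tokens) > 0)
    except (tokenize.TokenError, SyntaxError, UnicodeDecodeError) as exc:
        print("tokenize failed:", exc)
```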

Let me know if this test is good enough or if you want something else.

@pablogsal (Member) commented Oct 6, 2025

> Let me know if this test is good enough or if you want something else.

Yes, going via the tokenizer makes no sense here. The purpose of what I asked is that alternative implementations will still run these test files to check whether they are compliant, and we need to provide a way to run a file or exec some code and say "this is what we expect". You are triggering the bug via a specific aspect of CPython, but I would prefer if we could trigger it end-to-end via a file. There are more tests executing Python over files; check test_syntax, test_grammar, or test_compile.

@tom-pytel (Contributor, Author) commented

Done — now running the erroring source as a script.
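What running the erroring source as a script might look like (a hypothetical sketch, not the actual test added by the PR; the `f` stub is illustrative so the f-string can evaluate):

```python
# Hypothetical sketch of an end-to-end, file-based check: write the repro
# source to a .py file and execute it with a fresh interpreter, the way
# test_syntax/test_compile-style tests run Python over files.
import os
import subprocess
import sys
import tempfile

# Define f so the f-string actually evaluates; note the repro needs
# Python 3.12+ for the newline inside the single-quoted f-string.
script = "def f(**kw):\n    return 0\n" + '''f"{f(a=lambda: 'à'\n)}"''' + "\n"

with tempfile.NamedTemporaryFile(
    "w", suffix=".py", encoding="utf-8", delete=False
) as fh:
    fh.write(script)
    path = fh.name
try:
    proc = subprocess.run(
        [sys.executable, path], capture_output=True, text=True
    )
    # 0 on a fixed interpreter; nonzero on an affected or pre-3.12 one.
    print("exit code:", proc.returncode)
finally:
    os.unlink(path)
```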
