gh-139516: Fix lambda colon start format spec in f-string in tokenizer #139657
base: main
Conversation
Ping @pablogsal. I added the test to
Please add a test to the f-string test file as well, since this is a semantic test that needs to hold true even if we change the tokenizer or some other implementation doesn't have the same tokenizer.
Lib/test/test_fstring.py
Outdated
```python
# gh-139516
# The '\n' is explicit to ensure no trailing whitespace, which would invalidate the test.
# Must use tokenize instead of compile so that the source is parsed line by line, which exposes the bug.
list(tokenize.tokenize(BytesIO('''f"{f(a=lambda: 'à'\n)}"'''.encode()).readline))
```
I am confused. Isn't it possible to trigger this in an `exec` or `eval` call? Or perhaps a file with an encoding?
See below VVV
Done. But I had to use Let me know if this test is good enough or if you want something else.
Yes, going via the tokenizer makes no sense here. The purpose of what I asked is that alternative implementations will still run these test files to check if they are compliant, and we need to provide a way to run a file or exec some code and say "this is what we expect". You are triggering the bug via a specific aspect of CPython, but I would prefer if we could trigger it end-to-end via a file. There are more tests executing Python over files; check in
Running error as script.
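A minimal sketch (assumed names, not the PR's final test) of the end-to-end approach requested above: write the problematic f-string to a file and run it as a script, so the interpreter parses it through its normal path rather than via the `tokenize` module. The actual reproducer uses a non-ASCII character such as `'à'` to provoke the split-byte decode error; an ASCII placeholder is used here to keep the sketch portable.

```python
import os
import subprocess
import sys
import tempfile

# Source with a '=' (keyword argument) followed by a lambda ':' inside an
# f-string expression, split across lines -- the shape that confused the
# tokenizer. The multi-line expression requires Python 3.12+ (PEP 701).
source = 'f = lambda a: a()\nprint(f"{f(a=lambda: \'x\'\n)}")\n'

with tempfile.NamedTemporaryFile(
    "w", suffix=".py", delete=False, encoding="utf-8"
) as fh:
    fh.write(source)
    path = fh.name

try:
    if sys.version_info >= (3, 12):
        # Running the file as a script makes the tokenizer read it line by
        # line, which is the access pattern that exposed the bug.
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True
        )
        assert proc.returncode == 0, proc.stderr
        assert proc.stdout.strip() == "x"
finally:
    os.unlink(path)
```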
A `=` followed by a `:` in an f-string expression could cause the tokenizer to erroneously think it was starting a format spec, leading to incorrect internal state and possible decode errors if this results in split unicode characters on copy. This PR fixes this by disallowing `=` from setting the `in_debug` state unless it is encountered at the top level of an f-string expression. This problem exists back to Python 3.13, and this PR can probably be backported easily enough.