gh-133956 fix bug where ClassVar string annotation in @dataclass caused incorrect __init__ generation #134073
    if not cls_module:
        return False
It is interesting that we're paranoid about the dataclass not having its module imported on only this branch and not the branch above. Not suggesting a change, but a comment could be useful if you know why we only care in the `else` branch and not the `if not module_name` branch.
I was following the checks that were already present in the function. In the `else` branch, there was a check for `if module`:
    module = sys.modules.get(cls.__module__)
    if module and module.__dict__.get(module_name) is a_module:
        ns = sys.modules.get(a_type.__module__).__dict__
I'd actually consider removing that check; it does seem unnecessary. At least, I can't think of a case where `cls.__module__` would be missing from `sys.modules`. And the tests are still passing.
Yes, I saw the original code. My suggestion would be to hoist `module = sys.modules.get(cls.__module__)` out of the `if..else`. If you want to preserve the rather paranoid sanity check alongside the hoisted assignment, you could. Something like:
    module = sys.modules.get(cls.__module__)
    if module:  # not sure that we need to be this paranoid
        ns = module.__dict__
        if module_name:
            ...
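A minimal sketch of the hoisted lookup, assuming a hypothetical helper named `class_namespace` (it is not part of `dataclasses`). It also shows one contrived way for `cls.__module__` to be absent from `sys.modules`: reassigning `__module__` by hand. Whether this ever happens in practice is a separate question.

```python
import fractions
import sys


def class_namespace(cls):
    """Return the globals dict of the module that defines cls, or None.

    Hypothetical helper for illustration; not part of dataclasses.
    """
    module = sys.modules.get(cls.__module__)
    if module is None:  # the "paranoid" guard discussed above
        return None
    return module.__dict__


class Orphan:
    pass


# Contrived: point __module__ at a name that was never imported.
Orphan.__module__ = "module_that_was_never_imported"

# A class from an imported module resolves to that module's globals.
imported_ns = class_namespace(fractions.Fraction)
# The orphaned class has no module to look names up in.
orphan_ns = class_namespace(Orphan)
```

With the lookup hoisted, both branches share one guard instead of only the `else` branch checking for a missing module.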
    if (
        isinstance(a_type_module, types.ModuleType)
        # Handle cases when a_type is not defined in
        # the referenced module, e.g. 'dataclasses.ClassVar[int]'
`InitVar` lives in `dataclasses`, `ClassVar` lives in `typing`. Using an example of `dataclasses.ClassVar` is likely to cause confusion.
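A quick way to check where each name is exposed, as a sketch. The `locations` dictionary is only an illustrative name; the last entry is left for the reader to inspect on their own interpreter rather than asserted here.

```python
import dataclasses
import typing

# Which module exposes which special form as an attribute.
locations = {
    "typing.ClassVar": hasattr(typing, "ClassVar"),
    "dataclasses.InitVar": hasattr(dataclasses, "InitVar"),
    "typing.InitVar": hasattr(typing, "InitVar"),
    # The spelling used in the code comment under review.
    "dataclasses.ClassVar": hasattr(dataclasses, "ClassVar"),
}

for dotted_name, present in locations.items():
    print(f"{dotted_name}: {present}")
```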
        return False

    ns = None
    module_name = match.group(1)
There is an edge case that is not handled by this: `foo.typing_filtered.ClassVar`. The regex only accounts for one module level. While it is probably uncommon to import `foo.typing_filtered` without aliasing it, it is possible.
I don't want to change the world for a simple bug fix, but it seems like partitioning on `.` is more robust (and probably more performant) than using the current regex.
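To make the edge case concrete, here is a sketch comparing the two approaches. Both the pattern and the helper names are assumptions: `SINGLE_LEVEL_RE` is written in the spirit of a regex that captures at most one module prefix, and `split_by_partition` is a hypothetical alternative, not code from the PR.

```python
import re

# Assumption: a pattern that captures at most one module prefix
# before the type name, as described in the comment above.
SINGLE_LEVEL_RE = re.compile(r"^(?:\s*(\w+)\s*\.)?\s*(\w+)")


def split_by_regex(annotation):
    """Return (module_name, type_name) as seen by the single-level pattern."""
    match = SINGLE_LEVEL_RE.match(annotation)
    return match.groups() if match else None


def split_by_partition(annotation):
    """Hypothetical alternative: drop any subscript, then split on the last dot."""
    head = annotation.partition("[")[0].strip()
    module_path, _, type_name = head.rpartition(".")
    return (module_path or None, type_name)


one_level = "typing.ClassVar[int]"
two_levels = "foo.typing_filtered.ClassVar[int]"

# Both approaches agree on a single module level.
print(split_by_regex(one_level), split_by_partition(one_level))
# With two levels, the regex stops after the first dot and treats
# "typing_filtered" as the type name; the partition keeps the full path.
print(split_by_regex(two_levels), split_by_partition(two_levels))
```

The partition-based helper is deliberately naive: it does no validation of the pieces it returns, which matters for malformed strings.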
While I agree that splitting on a dot would be faster, there are existing tests that cover cases like `dataclasses.InitVar.[int]` and `dataclasses.InitVar+`. If we want to support multiple module levels, we should extend the current regex pattern rather than replace it.
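A sketch of why those two strings behave differently under the two approaches. The pattern `PREFIX_RE` is an assumption (a prefix match with at most one module level), and `last_dot_split` is a hypothetical naive splitter, not code from the PR.

```python
import re

# Assumption: prefix-matching pattern with at most one module level.
PREFIX_RE = re.compile(r"^(?:\s*(\w+)\s*\.)?\s*(\w+)")


def last_dot_split(annotation):
    """Hypothetical naive splitter: cut at '[' and split on the last dot."""
    head = annotation.partition("[")[0].strip()
    module_path, _, type_name = head.rpartition(".")
    return module_path, type_name


samples = ["dataclasses.InitVar.[int]", "dataclasses.InitVar+"]

# re.match anchors only at the start, so trailing junk is ignored and both
# strings yield the same (module, name) pair.
regex_view = {s: PREFIX_RE.match(s).groups() for s in samples}

# The naive splitter keeps the junk, so the name part no longer matches.
split_view = {s: last_dot_split(s) for s in samples}

for sample in samples:
    print(sample, regex_view[sample], split_view[sample])
```

So a switch to splitting would change the outcome for these inputs unless the splitter also normalized or rejected the trailing characters.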
I think it's worth creating a new issue to track more levels of module nesting. And I think it's a mistake to test for `dataclasses.InitVar.[int]` and other invalid strings.
This bug was actually caught by `__.typx.ClassVar` in the real world. I typically stuff all of my common imports, including `typing_extensions` as `typx`, into an internals subpackage (`__`) so that I can do `from . import __`. This reduces module namespace pollution and lets me avoid setting `__all__` to define module interfaces. But I understand that my practice may not be common.
That said, any fix to multi-level traversal, here or in another PR, is going to affect the code that is actually being fixed. IMO, it would make more sense to fix both together holistically (to save developer effort), if there is any interest in actually fixing the multi-level traversal.
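A sketch of what multi-level traversal could look like for a name such as `__.typx.ClassVar`. Everything here is illustrative: `resolve_dotted` is a hypothetical helper, and the standard `typing` module stands in for `typing_extensions` so the example has no third-party dependency.

```python
import types
import typing


def resolve_dotted(namespace, dotted_name):
    """Walk 'a.b.c' through a namespace dict; return None if any step fails.

    Hypothetical helper for illustration. Only module objects are
    traversed, so attributes of arbitrary objects are never touched.
    """
    first, *rest = dotted_name.split(".")
    current = namespace.get(first)
    for part in rest:
        if not isinstance(current, types.ModuleType):
            return None
        current = current.__dict__.get(part)
    return current


# Stand-in for an internals subpackage named "__" that re-exports a typing
# module under the alias "typx" (assumption: typing plays that role here).
internals = types.ModuleType("__")
internals.typx = typing

# Stand-in for the globals of the module that defines the dataclass.
class_globals = {"__": internals}

resolved = resolve_dotted(class_globals, "__.typx.ClassVar")
unresolved = resolve_dotted(class_globals, "__.missing.ClassVar")
```

Restricting each step to `types.ModuleType` keeps the lookup from evaluating arbitrary attribute access, at the cost of not following names bound to non-module objects.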
    if (
        isinstance(a_type_module, types.ModuleType)
        # Handle cases when a_type is not defined in
        # the referenced module, e.g. 'dataclasses.ClassVar[int]'
        and a_type_module.__dict__.get(type_name) is a_type
    ):
        ns = sys.modules.get(a_type.__module__).__dict__
I think this could be replaced with:
    return (
        isinstance(a_type_module, types.ModuleType)
        and is_type_predicate(a_type_module.__dict__.get(type_name), a_module))
It looks like we can indeed replace it. But if we go with this version, the `a_type` parameter will no longer be used in the function body.
Also, I was thinking: in the original version, this line:
    sys.modules.get(a_type.__module__)
could probably be replaced with `a_module`? Or am I missing something? It seems like `a_type` might not have been necessary from the start, and we should probably remove it.
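A standalone sketch of the simplified check, to show that it works without an `a_type` argument. The names `is_classvar`, `names_classvar`, and `reexport` are hypothetical stand-ins; in particular `is_classvar` is a simplified predicate, not the one used inside `dataclasses`.

```python
import types
import typing


def is_classvar(obj, typing_module):
    """Simplified stand-in predicate (assumption): identity check only."""
    return obj is typing_module.ClassVar


def names_classvar(class_namespace, module_name, type_name):
    """Return True if module_name.type_name resolves to typing.ClassVar."""
    a_type_module = class_namespace.get(module_name)
    return (
        isinstance(a_type_module, types.ModuleType)
        and is_classvar(a_type_module.__dict__.get(type_name), typing)
    )


# Hypothetical module that re-exports ClassVar without defining it.
reexport = types.ModuleType("reexport")
reexport.ClassVar = typing.ClassVar

# Stand-in for the globals of the module that defines the dataclass.
namespace = {"typing": typing, "typx": reexport, "not_a_module": object()}

checks = {
    "typing.ClassVar": names_classvar(namespace, "typing", "ClassVar"),
    "typx.ClassVar": names_classvar(namespace, "typx", "ClassVar"),
    "typing.List": names_classvar(namespace, "typing", "List"),
    "not_a_module.ClassVar": names_classvar(namespace, "not_a_module", "ClassVar"),
    "missing.ClassVar": names_classvar(namespace, "missing", "ClassVar"),
}
```

Because the predicate is applied to whatever object the name resolves to, the module that re-exports `ClassVar` is accepted just like `typing` itself.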
If you think the code can be simplified, please do so, then I'll review that version.
@@ -0,0 +1 @@
Fix bug where ``ClassVar`` string annotation in :func:`@dataclass <dataclasses.dataclass>` caused incorrect __init__ generation
I'm just starting to look at this PR, and will have more to say later. For now: I'd prefer this to say something more specific, like `<some specific description> of ClassVar not recognized as typing.ClassVar`. I'd like this because I'm sure we won't be fixing all instances of using ClassVar as string annotations, and we should mention the exact one being fixed here. Also, I don't want to mention `__init__`, because that's not the only method affected by this bug.
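To illustrate why more than `__init__` is affected, here is a sketch contrasting a class whose `ClassVar` is recognized with one where the same attribute is an ordinary field. The `Misread` class is a stand-in that models the effect of a `ClassVar` going unrecognized; it does not reproduce the exact trigger fixed by this PR.

```python
import inspect
import typing
from dataclasses import dataclass, fields


@dataclass
class Recognized:
    counter: typing.ClassVar[int] = 0  # class-level attribute, not a field
    value: int = 1


@dataclass
class Misread:
    counter: int = 0  # models what happens when the ClassVar is missed
    value: int = 1


# The field list drives every generated method, not only __init__.
recognized_fields = [f.name for f in fields(Recognized)]
misread_fields = [f.name for f in fields(Misread)]

recognized_params = list(inspect.signature(Recognized).parameters)
misread_params = list(inspect.signature(Misread).parameters)

# __repr__ and __eq__ pick up the extra field as well.
print(repr(Recognized()), repr(Misread()))
print(Misread(counter=1) == Misread(counter=2))
```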
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Thanks to @JelleZijlstra for pointing out a possible fix! I added an additional check, but the initial draft was nearly complete.
`dataclasses`: Synthetic `__init__` method on dataclass has broken `ClassVar` annotation under some conditions. #133956