-
-
Notifications
You must be signed in to change notification settings - Fork 33.1k
gh-138232: Improve performance of dataclasses by caching dataclass field names #138233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nitpicks. Memory-wise, we're already holding all the fields in _FIELDS, so holding their names shouldn't be an issue. Don't forget the NEWS + What's New entry if the change is accepted.
Co-authored-by: Bénédikt Tran <[email protected]>
Lib/dataclasses.py
Outdated
# order, so the order of the tuple is as the fields were defined. | ||
return tuple(f for f in fields.values() if f._field_type is _FIELD) | ||
|
||
def _field_names(class_or_instance): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inlining this method gives a bit of performance gain:
%timeit _field_names(s)
%timeit s.__dataclass_field_names__
Results in
41.8 ns ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
17.8 ns ± 0.188 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More generally, I wonder whether many small functions would actually benefit from a C implementation as they seem to target micro-optimizations. I don't know which ones we could optimize and I also don't know if it's worth it but had we ever considered doing this?
Now, I think it's a good trade-off to increase memory per dataclass type but reduce quite drastically the time it takes to do asdict
transformations.
Lib/dataclasses.py
Outdated
Accepts a dataclass or an instance of one. Excludes pseudo-fields. | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion here is to remove the blank line after """
. And I however think that it would be better to actually inline the function but considering we also use this approach for _PARAMS
, it would be more consistent to keep it.
Misc/NEWS.d/next/Library/2025-08-29-17-58-36.gh-issue-138232.-W4iaS.rst
Outdated
Show resolved
Hide resolved
I addressed the review comments. In the last commit I inlined the |
Lib/dataclasses.py
Outdated
# also marks this class as being a dataclass. | ||
setattr(cls, _FIELDS, fields) | ||
# Store field names. Excludes pseudo-fields. | ||
setattr(cls, _FIELD_NAMES, tuple(f.name for f in fields.values() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, you can also do cls.__dataclass_field_names__ = ...
directly (I would also advise you to consider this as part of the "inline" commit, so you can also force-push this one).
An other alternative is to only inline getattr(x, _FIELDS_NAMES)
instead of having a standalone function for that. It could also improve readability a bit and justify the needs of the global _FIELD_NAMES
.
IOW, choose among the following:
- Hardcode
__dataclass_field_names__
everywhere, without having using getattr/setattr and_FIELD_NAMES
. - Use
_FIELD_NAMES
+getattr/setattr(..., _FIELD_NAMES)
directly. - Use
_FIELD_NAMES
+ a module-wide function defined to be the partialization_get_names(x) ~ getattr(x, _FIELD_NAMES)
.
My preference is (1) or (2) but (3) is an overkill IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, going for option (1)
Benchmark results with PGO on Linux:
(script in issue)