Fix binary operations on attrs for Series and DataFrame #59636
fbourgey
commented
Aug 28, 2024
- closes BUG: binary operations don't propogate attrs depending on order with Series and/or DataFrame/Series #51607
- Test
- Test
WillAyd
left a comment
Small change to prefer fixtures to writing out our own binop implementations, but generally lgtm. I don't think current CI failures are related.
@mroeschke any thoughts here?
pandas/tests/frame/test_api.py
Outdated
| df_2 = DataFrame({"A": [-3, 9]}) | ||
| attrs = {"info": "DataFrame"} | ||
| df_1.attrs = attrs | ||
| assert (df_1 + df_2).attrs == attrs |
Rather than doing this you can just use the all_binary_operators fixture from conftest.py (I think)
I made the change.
mroeschke
left a comment
I think attrs propagation logic should only be handled by __finalize__, so these binary operations should dispatch to that method.
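To illustrate the idea of routing all metadata propagation through a single hook, here is a minimal sketch in plain Python. `Tagged` is a hypothetical stand-in class, not pandas code, and its method names only mirror the ones discussed in this thread. In this toy, the result is finalized from the left operand first and then from the right operand, so a non-empty `attrs` on the right operand wins; that ordering is an assumption of the sketch.

```python
from copy import deepcopy


class Tagged:
    """Hypothetical stand-in for a Series/DataFrame: a value plus an attrs dict."""

    def __init__(self, value, attrs=None):
        self.value = value
        self.attrs = dict(attrs or {})

    def __finalize__(self, other):
        # Single place where metadata is copied; skipped when `other`
        # has no attrs (or is a plain scalar without the attribute).
        if getattr(other, "attrs", None):
            self.attrs = deepcopy(other.attrs)
        return self

    def _construct_result(self, value, other):
        # Binary operations never touch attrs directly; they only call the hook.
        out = Tagged(value).__finalize__(self)
        out = out.__finalize__(other)
        return out

    def __add__(self, other):
        other_value = other.value if isinstance(other, Tagged) else other
        return self._construct_result(self.value + other_value, other)


left_only = Tagged(1, {"info": "left"}) + Tagged(2)
right_only = Tagged(1) + Tagged(2, {"info": "right"})
with_scalar = Tagged(1) + 3
```

With this layout the result carries attrs regardless of which operand holds them, which is the order-dependence the linked issue describes.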
|
@mroeschke should everything be rewritten using |
|
Yes, or |
|
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
|
@mroeschke, @WillAyd, I tried using |
|
I think it looks good but will defer to @mroeschke |
pandas/core/frame.py
Outdated
| def _cmp_method(self, other, op): | ||
| axis: Literal[1] = 1 # only relevant for Series other case | ||
|
|
||
| if not getattr(self, "attrs", None) and getattr(other, "attrs", None): |
I don't think we should need these anymore here since this should be handled in _construct_result
It seems that sometimes
self, other = self._align_for_op(other, axis, flex=False, level=None)
resets other.attrs to {}.
This is why I kept it.
Is it because other is getting overridden here? Otherwise, _align_for_op should also preserve the attrs of other.
| axis=axis, | ||
| level=level, | ||
| ) | ||
| right = left._maybe_align_series_as_frame(right, axis) |
I think this resets the attrs of right
I would consider that a bug. attrs should be preserved in this function.
Should I fix it in this PR or raise a different issue?
You can fix it in this PR
Suggested something below
pandas/core/frame.py
Outdated
| DataFrame | ||
| """ | ||
| if not getattr(self, "attrs", None) and getattr(other, "attrs", None): | ||
| self.__finalize__(other) |
Can we do out = out.__finalize__(other) instead?
pandas/core/frame.py
Outdated
| return self._construct_result(new_data, other=other) | ||
|
|
||
| def _construct_result(self, result) -> DataFrame: | ||
| def _construct_result(self, result, other=None) -> DataFrame: |
| def _construct_result(self, result, other=None) -> DataFrame: | |
| def _construct_result(self, result, other) -> DataFrame: |
Might as well make this required
pandas/core/frame.py
Outdated
| if not getattr(self, "attrs", None) and getattr(other, "attrs", None): | ||
| out = out.__finalize__(other) |
| if not getattr(self, "attrs", None) and getattr(other, "attrs", None): | |
| out = out.__finalize__(other) | |
| out = out.__finalize__(other) |
Appears __finalize__ will correctly skip if other has a populated attrs
Doing this breaks the following test:
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-True-add] - AssertionError
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-True-sub] - AssertionError
I think this line in __finalize__ needs a fix:
self.flags.allows_duplicate_labels = other.flags.allows_duplicate_labels
It should prioritize False: if either self.flags.allows_duplicate_labels or other.flags.allows_duplicate_labels is False, the result should be False.
What about doing this in __finalize__:
    if isinstance(other, NDFrame):
        if other.attrs:
            # We want attrs propagation to have minimal performance
            # impact if attrs are not used; i.e. attrs is an empty dict.
            # One could make the deepcopy unconditionally, but a deepcopy
            # of an empty dict is 50x more expensive than the empty check.
            self.attrs = deepcopy(other.attrs)
        self.flags.allows_duplicate_labels = (
            self.flags.allows_duplicate_labels
            and other.flags.allows_duplicate_labels
        )
Yup that's the correct location to fix this
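The effect of combining the two flags with `and` can be checked with a small truth table. This is a standalone sketch: `combine_allows_duplicate_labels` is a hypothetical helper written for illustration, not a pandas function.

```python
def combine_allows_duplicate_labels(left: bool, right: bool) -> bool:
    """Toy rule: duplicates stay allowed only if both operands allow them."""
    return left and right


# Enumerate every combination of the two operands' flags.
truth_table = {
    (left, right): combine_allows_duplicate_labels(left, right)
    for left in (True, False)
    for right in (True, False)
}
```

Only the (True, True) case yields True, so a False on either operand is never overwritten by the other operand's flag.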
pandas/core/indexes/base.py
Outdated
|
|
||
| @final | ||
| def _construct_result(self, result, name): | ||
| def _construct_result(self, result, name, other=None): |
| def _construct_result(self, result, name, other=None): | |
| def _construct_result(self, result, name, other): |
pandas/core/series.py
Outdated
| self, | ||
| result: ArrayLike | tuple[ArrayLike, ArrayLike], | ||
| name: Hashable, | ||
| other: AnyArrayLike | DataFrame | None = None, |
| other: AnyArrayLike | DataFrame | None = None, | |
| other: AnyArrayLike | DataFrame, |
pandas/core/series.py
Outdated
| ---------- | ||
| result : ndarray or ExtensionArray | ||
| name : Label | ||
| other : Series, DataFrame or array-like, default None |
| other : Series, DataFrame or array-like, default None | |
| other : Series, DataFrame or array-like |
pandas/core/series.py
Outdated
| if getattr(other, "attrs", None): | ||
| out.__finalize__(other) |
| if getattr(other, "attrs", None): | |
| out.__finalize__(other) | |
| out = out.__finalize__(other) |
Doing this breaks:
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-False-add] - AssertionError
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-False-sub] - AssertionError
It seems to have something to do with flags.allows_duplicate_labels.
pandas/core/base.py
Outdated
| return self._construct_result(result, name=res_name, other=other) | ||
|
|
||
| def _construct_result(self, result, name): | ||
| def _construct_result(self, result, name, other=None): |
| def _construct_result(self, result, name, other=None): | |
| def _construct_result(self, result, name, other): |
pandas/core/frame.py
Outdated
| left : DataFrame | ||
| right : Any | ||
| """ | ||
|
|
| "before operating." | ||
| ) | ||
|
|
||
| left, right = left.align( |
| left, right = left.align( | |
| left, right = left.align( |
|
Thanks for sticking with this @fbourgey! |
|
Thanks for the help @mroeschke! |