-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
Fix binary operations on attrs for Series and DataFrame #59636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 34 commits
38efbaa
15db834
50118d7
953ef17
bc4da65
2607f9c
b154203
87abede
8559c1f
742e3b1
5178113
5b71171
8da62bd
c2c87dd
4da1786
6436926
50396ad
76b0831
ef3c0b5
e92e431
311b662
95cb745
c162bba
bd46491
bf898c7
e9c3e91
b72a049
3abc22d
1e6197c
3a80cf9
83e908f
e342a6e
c251196
96ddd0d
155a9d1
45e2ad0
cf67fcf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -7879,7 +7879,7 @@ def _cmp_method(self, other, op): | |||||||
|
||||||||
# See GH#4537 for discussion of scalar op behavior | ||||||||
new_data = self._dispatch_frame_op(other, op, axis=axis) | ||||||||
return self._construct_result(new_data) | ||||||||
return self._construct_result(new_data, other=other) | ||||||||
|
||||||||
def _arith_method(self, other, op): | ||||||||
if self._should_reindex_frame_op(other, op, 1, None, None): | ||||||||
|
@@ -7892,7 +7892,7 @@ def _arith_method(self, other, op): | |||||||
|
||||||||
with np.errstate(all="ignore"): | ||||||||
new_data = self._dispatch_frame_op(other, op, axis=axis) | ||||||||
return self._construct_result(new_data) | ||||||||
return self._construct_result(new_data, other=other) | ||||||||
|
||||||||
_logical_method = _arith_method | ||||||||
|
||||||||
|
@@ -8088,8 +8088,7 @@ def _align_for_op( | |||||||
|
||||||||
Parameters | ||||||||
---------- | ||||||||
left : DataFrame | ||||||||
right : Any | ||||||||
other : Any | ||||||||
axis : int | ||||||||
flex : bool or None, default False | ||||||||
Whether this is a flex op, in which case we reindex. | ||||||||
|
@@ -8101,6 +8100,7 @@ def _align_for_op( | |||||||
left : DataFrame | ||||||||
right : Any | ||||||||
""" | ||||||||
|
||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left, right = left.align( | |
left, right = left.align( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this resets the attrs
of right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider that a bug. attrs
should be preserved in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I fix it in this PR or raise a different issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can fix it in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested something below
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _construct_result(self, result, other=None) -> DataFrame: | |
def _construct_result(self, result, other) -> DataFrame: |
Might as well make this required
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not getattr(self, "attrs", None) and getattr(other, "attrs", None): | |
out = out.__finalize__(other) | |
out = out.__finalize__(other) |
Appears __finalize__
will correctly skip if other
has a populated attrs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing this breaks the following test:
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-True-add] - AssertionError
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-True-sub] - AssertionError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line __finalize__
needs a fix:
self.flags.allows_duplicate_labels = other.flags.allows_duplicate_labels
Prioritizing False
if self.flags.allows_duplicate_labels
or other.flags.allows_duplicate_labels
is False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about doing in __finalize__
if isinstance(other, NDFrame):
if other.attrs:
# We want attrs propagation to have minimal performance
# impact if attrs are not used; i.e. attrs is an empty dict.
# One could make the deepcopy unconditionally, but a deepcopy
# of an empty dict is 50x more expensive than the empty check.
self.attrs = deepcopy(other.attrs)
self.flags.allows_duplicate_labels = (
self.flags.allows_duplicate_labels
and other.flags.allows_duplicate_labels
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that's the correct location to fix this
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -7148,10 +7148,10 @@ def _logical_method(self, other, op): | |||||
rvalues = extract_array(other, extract_numpy=True, extract_range=True) | ||||||
|
||||||
res_values = ops.logical_op(lvalues, rvalues, op) | ||||||
return self._construct_result(res_values, name=res_name) | ||||||
return self._construct_result(res_values, name=res_name, other=other) | ||||||
|
||||||
@final | ||||||
def _construct_result(self, result, name): | ||||||
def _construct_result(self, result, name, other=None): | ||||||
|
def _construct_result(self, result, name, other=None): | |
def _construct_result(self, result, name, other): |
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -5858,7 +5858,7 @@ def _cmp_method(self, other, op): | |||||||
|
||||||||
res_values = ops.comparison_op(lvalues, rvalues, op) | ||||||||
|
||||||||
return self._construct_result(res_values, name=res_name) | ||||||||
return self._construct_result(res_values, name=res_name, other=other) | ||||||||
|
||||||||
def _logical_method(self, other, op): | ||||||||
res_name = ops.get_op_result_name(self, other) | ||||||||
|
@@ -5868,7 +5868,7 @@ def _logical_method(self, other, op): | |||||||
rvalues = extract_array(other, extract_numpy=True, extract_range=True) | ||||||||
|
||||||||
res_values = ops.logical_op(lvalues, rvalues, op) | ||||||||
return self._construct_result(res_values, name=res_name) | ||||||||
return self._construct_result(res_values, name=res_name, other=other) | ||||||||
|
||||||||
def _arith_method(self, other, op): | ||||||||
self, other = self._align_for_op(other) | ||||||||
|
@@ -5930,11 +5930,15 @@ def _binop(self, other: Series, func, level=None, fill_value=None) -> Series: | |||||||
result = func(this_vals, other_vals) | ||||||||
|
||||||||
name = ops.get_op_result_name(self, other) | ||||||||
out = this._construct_result(result, name) | ||||||||
|
||||||||
out = this._construct_result(result, name, other) | ||||||||
return cast(Series, out) | ||||||||
|
||||||||
def _construct_result( | ||||||||
self, result: ArrayLike | tuple[ArrayLike, ArrayLike], name: Hashable | ||||||||
self, | ||||||||
result: ArrayLike | tuple[ArrayLike, ArrayLike], | ||||||||
name: Hashable, | ||||||||
other: AnyArrayLike | DataFrame | None = None, | ||||||||
|
other: AnyArrayLike | DataFrame | None = None, | |
other: AnyArrayLike | DataFrame, |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other : Series, DataFrame or array-like, default None | |
other : Series, DataFrame or array-like |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if getattr(other, "attrs", None): | |
out.__finalize__(other) | |
out = out.__finalize__(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing this breaks:
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-False-add] - AssertionError
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-False-sub] - AssertionError
something to do with flags.allows_duplicate_labels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.