-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: Fix Index.equals between object and string #61541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rhshadrach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
pandas/tests/indexes/test_base.py
Outdated
| ), | ||
| ], | ||
| ) | ||
| def test_index_equals_different_string_dtype(dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you instead use the fixture string_dtype_no_object throughout these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rhshadrach , changed the code.
|
|
||
| if ( | ||
| isinstance(self.dtype, StringDtype) | ||
| and self.dtype.na_value is np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition was added in #56106, I think the na_value part was added just to be conservative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Yes, maybe the bug was only confirmed for pyarrow_numpy at that moment.
| s_str = Series([4, 5, 6], index=idx.astype(dtype)) | ||
|
|
||
| expected = Series([True, True, True], index=["a", "b", "c"]) | ||
| result = s_obj < s_str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also check s_str > s_obj.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! added s_str > s_obj.
rhshadrach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
Thanks @sanggon6107! |
doc/source/whatsnew/v3.0.0.rstfile if fixing a bug or adding a new feature.Description of the code change on
Index.equalsOn the main branch,
Index.equalscastsselftoobjectonly whenself.dtype.na_valueisnp.nan. The comparison actually succeeds whenself.dtype.na_valueisnp.nanas below.However, since doc stated that
dtypeis not compared,selfshould be casted regardless ofself.dtype.na_valueso thatselfcould be compared with other dtypes as desired.Description of the code change on
test_mixed_col_index_dtypeusing_infer_stringhas been removed since I think thatresultshould bestringregardless ofusing_infer_string. This is becaus of the code change made onIndex.equals- sinceIndex.equalsconsiderdf1.columnsis equal todf2.colums,Index.intersectionreturnsself(which isstring). You could see the result becomesobject(which is the dtype ofdf2) in case ofresult = df2 + df1. On the main branch, on the other hand,Index.intersectionreturnsobjectbecauseIndex.equalsreturnsFalse, and then bothselfandotherare cast toobjectby_find_common_type_compat. (seeL3287at pandas/core/indexes/base.py)pandas/pandas/core/indexes/base.py
Lines 3286 to 3290 in 25e6462