-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: Raise MergeError when suffixes result in duplicate column names … #61422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Raise MergeError when suffixes result in duplicate column names … #61422
Conversation
fe71e1a to
c9bfc5a
Compare
|
pre-commit.ci autofix |
dc1bb47 to
ad744df
Compare
|
Thanks for your contribution! Just a quick note: you don't need to write
For reference, you can check the contributing guidelines here: https://pandas.pydata.org/docs/development/contributing_codebase.html#documenting-your-code |
|
Thanks for the pointers. I'll get those added in here soon. Trying to track down why the Unit Tests / Linux-32-bit(pull_request) is failing. I didn't change anything that should have effected Series, so it's kinda weird. I also can't get the pytest to run normally on my dev yet either, so I haven't been able to fully replicate the failure locally yet. So, still a little more work to do here. |
|
@Farsidetfs I believe the CI failure is not related to your changes. It appears to be caused by the cython version — pandas unit tests fail with |
nikaltipar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments from a bird's eye view. Thanks for the change.
pandas/core/reshape/merge.py
Outdated
| if len(left_collisions) > 0: | ||
| raise MergeError( | ||
| "Passing 'suffixes' which cause duplicate columns " | ||
| f"{set(left_collisions)} is not allowed" | ||
| ) | ||
| if len(right_collisions) > 0: | ||
| raise MergeError( | ||
| "Passing 'suffixes' which cause duplicate columns " | ||
| f"{set(right_collisions)} is not allowed" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend you combine this into a common error to reduce repetition (bonus points for combining it with the pre-existing error just a few lines below)
pandas/core/reshape/merge.py
Outdated
| # Check for duplicates created by suffixes | ||
| left_collisions = llabels.intersection(right.difference(to_rename)) | ||
| right_collisions = rlabels.intersection(left.difference(to_rename)) | ||
| if len(left_collisions) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't just if not left_collisions.empty: work? Same for a similar check below.
pandas/core/reshape/merge.py
Outdated
| llabels = left._transform_index(lrenamer) | ||
| rlabels = right._transform_index(rrenamer) | ||
|
|
||
| # Check for duplicates created by suffixes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For new readers of this code, the comment might not be descriptive enough. While your code is supposed to find suffixes that are caused by duplicated would-be created columns across dataframes, there is an extra section that does a duplicate checking just below your new code (but would-be created columns due to duplicity within the same dataframe).
ad744df to
9a40cd0
Compare
|
pre-commit.ci autofix |
e9efd0d to
1476957
Compare
|
pre-commit.ci autofix |
cb23817 to
6b82c85
Compare
|
pre-commit.ci autofix |
5aff565 to
e62c70f
Compare
|
pre-commit.ci autofix |
|
@nikaltipar I think this should be ready now. Please let me know if I've missed anything. I took your advice and combined the two with slight modifications to improve efficiency using sets throughout rather than just convert at the end. |
Thanks for taking care of that, @Farsidetfs ! It looks good to me, no other comments from my side. Thanks for adding the unit-tests, too! |
|
@nikaltipar Could you rebase main branch to trigger CI again? |
I am not able to, I'll have to wait for @Farsidetfs |
7d3837a to
c377804
Compare
|
Rebase complete. Thanks. Let me know if there's anything else needed. |
rhshadrach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
doc/source/whatsnew/v2.3.0.rst
Outdated
| Reshaping | ||
| ^^^^^^^^^ | ||
| - | ||
| - Bug in :meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a :class:`MergeError` in such cases. (:issue:`61402`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this note to v3.0.0
| df1 = DataFrame({"col1": [1], "col2": [2]}) | ||
| df2 = DataFrame({"col1": [1], "col2": [2], "col2_dup": [3]}) | ||
| with pytest.raises(MergeError, match="duplicate columns"): | ||
| merge(df1, df2, on="col1", suffixes=("_dup", "")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you parametrize this test with
@pytest.mark.parametrize("suffixes", [("_dup", ""), ("", "_dup")])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhshadrach Done. Please let me know if I missed anything.
b7b3a95 to
bc62afe
Compare
5b816ce to
9bc2b66
Compare
|
@rhshadrach All requested changes are complete, so it should be ready for your review to unblock merge. Thanks |
rhshadrach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
@Farsidetfs - just a conflict that needs resolved in the whatsnew. |
|
pre-commit.ci autofix |
1 similar comment
|
pre-commit.ci autofix |
12ce6ce to
fdb5861
Compare
|
@rhshadrach Merge conflicts resolved, ready for merge |
|
Thanks @Farsidetfs very nice |
closes #61402
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.