-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
BUG: Modifies pandas.merge to propagate flags and metadata from its inputs to its output. #62266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…tadata Updated my contribution with the latest changes from the main pandas repo.
…tadata Updated pull request with latest changes from main Pandas repo.
…tadata Updated my pull request with changes from main Pandas repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. The OP and the whatsnew indicate that there is a code change occurring here, yet I do not see any. Only comments, docstrings, type-hints, and tests.
It appears to me that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
doc/source/whatsnew/v3.0.0.rst
Outdated
- Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`) | ||
- Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`) | ||
- Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`) | ||
- Bug in :meth:`DataFrame.merge` where the result of a merge does not contain any metadata or flag information from the inputs to the merge. (:issue:`28283`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, remove the note
pandas/core/reshape/merge.py
Outdated
|
||
result = self._reindex_and_concat(join_index, left_indexer, right_indexer) | ||
|
||
# Is this call to __finalize__ really necessary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I believe this can be removed.
pandas/core/reshape/merge.py
Outdated
# __finalize is responsible for copying the metadata from the inputs to merge | ||
# to the result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this comment, it does not add any information that isn't already documented in __finalize__
itself.
pandas/core/reshape/merge.py
Outdated
Add one indicator column to each of the left and right inputs to a | ||
merge operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: short summary should be 1 line. I think you can just remove to a merge operation
pandas/core/generic.py
Outdated
``attrs`` attribute, in which case, each such ``attrs`` instance | ||
must be a dictionary that is equal to all of the others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By saying "must be a dictionary that is equal to all of the others", it sounds like that is a requirement for calling this function but that is not the case. Rather, it just won't propagate when this requirement is not met.
"""Check that DataFrame.merge correctly sets the allow_duplicate_labels flag | ||
on its result. | ||
The flag on the result should be set to true if and only if both arguments | ||
to merge have their flags set to True. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove docstrings from tests, we generally do not add these. Comments are fine if they are adding something that is not already in the code itself.
The flag on the result should be set to true if and only if both arguments | ||
to merge have their flags set to True. | ||
""" | ||
# Arrange |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove these arrange/act/assert comments.
left.attrs = {"b": 3} | ||
assert result.attrs == metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will check whether or not a deep copy is made. Recommend doing
assert result.attrs is not left.attrs
assert result.attrs["a"] is not left.attrs["a"]
instead. Also I recommend adding in right
as well for good measure.
no_metadata = pd.DataFrame({"test": [1]}) | ||
|
||
has_metadata = pd.DataFrame({"test": [1]}) | ||
has_metadata.attrs = {"a": 2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having module level variables, can you use indicator strings in the parametrization (e.g. "no_metadata", "has_metadata") and then do something like:
if left == "no_metadata":
left = pd.DataFrame(...)
else:
...
else: | ||
return str(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like just a style preference? Can you revert this change.
The pre-commit checks prevent me from reverting the formatting changes you mentioned above. Specifically, the three lines of the form
in |
I am able to revert the changes without problem. Ensure that you are not leaving a trailing comma on the line. Alternatively, I'd be happy to push a commit, but you need to check the box on the right side of the PR that allows maintainers to edit. |
I removed the trailing commas from the tests. Let me know if there's anything else you need from this PR. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This PR changes the final call to
DataFrame.__finalize__
withinpandas.merge
to propagate the flags from both of its arguments and the metadata from its left argument.