Skip to content

Conversation

aijams
Copy link

@aijams aijams commented Sep 5, 2025

This PR changes the final call to DataFrame.__finalize__ within pandas.merge to propagate the flags from both of its arguments and the metadata from its left argument.

@simonjayhawkins simonjayhawkins added metadata _metadata, .attrs Bug labels Sep 10, 2025
@aijams aijams marked this pull request as ready for review September 19, 2025 19:20
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The OP and the whatsnew indicate that there is a code change occurring here, yet I do not see any. Only comments, docstrings, type-hints, and tests.

@aijams
Copy link
Author

aijams commented Oct 2, 2025

It appears to me that merge already does what it's supposed to. According to the most recent documentation, merge is supposed to only propagate metadata if both of its inputs have exactly the same metadata, which is what it currently does. Perhaps there's some part of its contract which I'm unaware of, in which case I'd appreciate if you could point me to some documentation that describes it. Otherwise, I will remove the item from the whatsnew file. At first, I made some changes to this method, but I discovered that it works already in terms of metadata.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

- Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)
- Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)
- Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)
- Bug in :meth:`DataFrame.merge` where the result of a merge does not contain any metadata or flag information from the inputs to the merge. (:issue:`28283`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, remove the note


result = self._reindex_and_concat(join_index, left_indexer, right_indexer)

# Is this call to __finalize__ really necessary?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I believe this can be removed.

Comment on lines 1155 to 1156
# __finalize is responsible for copying the metadata from the inputs to merge
# to the result.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this comment, it does not add any information that isn't already documented in __finalize__ itself.

Comment on lines 1178 to 1179
Add one indicator column to each of the left and right inputs to a
merge operation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: short summary should be 1 line. I think you can just remove to a merge operation

Comment on lines 6107 to 6108
``attrs`` attribute, in which case, each such ``attrs`` instance
must be a dictionary that is equal to all of the others.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By saying "must be a dictionary that is equal to all of the others", it sounds like that is a requirement for calling this function but that is not the case. Rather, it just won't propagate when this requirement is not met.

Comment on lines 699 to 704
"""Check that DataFrame.merge correctly sets the allow_duplicate_labels flag
on its result.
The flag on the result should be set to true if and only if both arguments
to merge have their flags set to True.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove docstrings from tests, we generally do not add these. Comments are fine if they are adding something that is not already in the code itself.

The flag on the result should be set to true if and only if both arguments
to merge have their flags set to True.
"""
# Arrange
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove these arrange/act/assert comments.

Comment on lines 766 to 767
left.attrs = {"b": 3}
assert result.attrs == metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will check whether or not a deep copy is made. Recommend doing

assert result.attrs is not left.attrs
assert result.attrs["a"] is not left.attrs["a"]

instead. Also I recommend adding in right as well for good measure.

Comment on lines 787 to 790
no_metadata = pd.DataFrame({"test": [1]})

has_metadata = pd.DataFrame({"test": [1]})
has_metadata.attrs = {"a": 2}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having module level variables, can you use indicator strings in the parametrization (e.g. "no_metadata", "has_metadata") and then do something like:

if left == "no_metadata":
     left = pd.DataFrame(...)
else:
    ...

Comment on lines 374 to 375
else:
return str(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like just a style preference? Can you revert this change.

@aijams
Copy link
Author

aijams commented Oct 8, 2025

The pre-commit checks prevent me from reverting the formatting changes you mentioned above. Specifically, the three lines of the form

"<argument_name>", [<argument_values>]

in test_finalize.py.

@rhshadrach
Copy link
Member

The pre-commit checks prevent me from reverting the formatting changes you mentioned above

I am able to revert the changes without problem. Ensure that you are not leaving a trailing comma on the line. Alternatively, I'd be happy to push a commit, but you need to check the box on the right side of the PR that allows maintainers to edit.

@aijams
Copy link
Author

aijams commented Oct 9, 2025

I removed the trailing commas from the tests. Let me know if there's anything else you need from this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug metadata _metadata, .attrs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Various methods don't call call __finalize__

3 participants