Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -847,6 +847,7 @@ Reshaping
- Bug in :meth:`DataFrame.stack` with the new implementation where ``ValueError`` is raised when ``level=[]`` (:issue:`60740`)
- Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)
- Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)
- Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)

Sparse
^^^^^^
3 changes: 3 additions & 0 deletions pandas/core/reshape/melt.py
@@ -239,6 +239,9 @@ def melt(
    mdata: dict[Hashable, AnyArrayLike] = {}
    for col in id_vars:
        id_data = frame.pop(col)
        # GH61475 - prevent AttributeError when duplicate column
        if not hasattr(id_data, "dtype"):
            raise Exception(f"{col} is a duplicate column header")
Member

  1. This should check if not frame.columns.is_unique at the beginning of the function.
  2. A ValueError is more appropriate here

Contributor Author

Thanks for the feedback! I've moved the check for not frame.columns.is_unique to the beginning of the function and updated the exception type to ValueError as suggested.

A quick clarification question: currently melt allows duplicate column names in 'value_vars', as seen in the test test_melt_with_duplicate_columns. With this change, are we treating any duplicate columns in the input DataFrame as a ValueError, and not just duplicates that appear in id_vars?

Member

Ah, good point. I guess it's specifically when id_vars is not empty that we'll want to raise if not frame.columns.is_unique.

Contributor Author

Thanks! Just to make sure I understand:
You're saying that we should only raise an error for duplicate column names if the duplicate is in id_vars, correct? Essentially, if the duplicates are in value_vars, we can let the melt function work as is, as long as no errors occur?

Member

Yup, correct.
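
For readers following the thread, here is a minimal sketch of the rule as restated and confirmed above: raise a ValueError only when a label listed in id_vars is duplicated among the columns, and leave duplicates elsewhere alone. The helper name `check_id_vars_not_duplicated` and the error message are hypothetical, chosen for illustration; this is not the code that landed in pandas. It takes plain iterables of labels so it runs without pandas installed.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def check_id_vars_not_duplicated(
    columns: Iterable[Hashable], id_vars: Iterable[Hashable]
) -> None:
    """Raise ValueError if any label in id_vars occurs more than once in columns.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Count how often each column label appears in the frame.
    counts = Counter(columns)
    # Only labels requested as id_vars are checked; duplicates that are
    # used solely as value_vars are deliberately ignored.
    duplicated = [col for col in id_vars if counts[col] > 1]
    if duplicated:
        raise ValueError(
            f"id_vars contains duplicate column headers: {duplicated}"
        )
```

Under this assumption, a call such as `check_id_vars_not_duplicated(frame.columns, id_vars)` at the top of melt would reject `columns=["A", "A", "B"]` with `id_vars=["A"]`, while `id_vars=["B"]` on the same frame would pass through unchanged.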

        if not isinstance(id_data.dtype, np.dtype):
            # i.e. ExtensionDtype
            if num_cols_adjusted > 0:
8 changes: 8 additions & 0 deletions pandas/tests/reshape/test_melt.py
@@ -555,6 +555,14 @@ def test_melt_multiindex_columns_var_name_too_many(self):
        ):
            df.melt(var_name=["first", "second", "third"])

    def test_melt_duplicate_column_header_raises(self):
        # GH61475
        df = DataFrame([[1, 2, 3], [3, 4, 5]], columns=["A", "A", "B"])
        msg = "A is a duplicate column header"

        with pytest.raises(Exception, match=msg):
            df.melt(id_vars=["A"], value_vars=["B"])


class TestLreshape:
    def test_pairs(self):