Merged
Changes from 8 commits
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v3.0.0.rst
@@ -592,7 +592,6 @@ Performance improvements
- Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57445`, :issue:`57752`)
- Performance improvement in :func:`merge` if hash-join can be used (:issue:`57970`)
- Performance improvement in :meth:`CategoricalDtype.update_dtype` when ``dtype`` is a :class:`CategoricalDtype` with non ``None`` categories and ordered (:issue:`59647`)
- Performance improvement in :meth:`DataFrame.astype` when converting to extension floating dtypes, e.g. "Float64" (:issue:`60066`)
Member

Don't think this should have been removed

Contributor Author

Sorry, it accidentally got deleted while running a git command. Thank you for catching it.

- Performance improvement in :meth:`to_hdf` by avoiding unnecessary reopenings of the HDF5 file, speeding up data addition to files with a very large number of groups. (:issue:`58248`)
- Performance improvement in ``DataFrameGroupBy.__len__`` and ``SeriesGroupBy.__len__`` (:issue:`57595`)
- Performance improvement in indexing operations for string dtypes (:issue:`56997`)
@@ -737,6 +736,8 @@ Reshaping
- Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
- Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
- Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
- Bug in :meth:`DataFrame.merge` on Windows where merging two :class:`DataFrame` objects with a column of dtype ``numpy.intc`` raised ``ValueError: Buffer dtype mismatch`` (:issue:`60091`)
- Bug in :meth:`DataFrame.merge` on Windows where merging two :class:`DataFrame` objects with a column of dtype ``numpy.uintc`` raised ``KeyError: <class 'numpy.uintc'>`` (:issue:`58713`)
- Bug in :meth:`DataFrame.pivot_table` incorrectly subaggregating results when called without an ``index`` argument (:issue:`58722`)
- Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtensionDtype` (:issue:`59123`)

12 changes: 11 additions & 1 deletion pandas/core/reshape/merge.py
@@ -123,7 +123,17 @@

 # See https://github.com/pandas-dev/pandas/issues/52451
 if np.intc is not np.int32:
-    _factorizers[np.intc] = libhashtable.Int64Factorizer
+    if np.dtype(np.intc).itemsize == 4:
+        _factorizers[np.intc] = libhashtable.Int32Factorizer
+    else:
+        _factorizers[np.intc] = libhashtable.Int64Factorizer
+
+if np.uintc is not np.uint32:
+    if np.dtype(np.uintc).itemsize == 4:
+        _factorizers[np.uintc] = libhashtable.UInt32Factorizer
+    else:
+        _factorizers[np.uintc] = libhashtable.UInt64Factorizer
+

 _known = (np.ndarray, ExtensionArray, Index, ABCSeries)

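The change above picks a factorizer by the byte width of the C integer type instead of assuming 64 bits. A minimal sketch of that width check, using only public NumPy calls, might look like the following. `fixed_width_equivalent` and `_FIXED_WIDTH` are hypothetical names for illustration and are not part of pandas; the private `libhashtable` factorizers are deliberately left out.

```python
import numpy as np

# Hypothetical lookup for illustration only; pandas keeps its own private
# table of factorizers keyed by scalar type.
_FIXED_WIDTH = {
    ("i", 4): np.int32,
    ("i", 8): np.int64,
    ("u", 4): np.uint32,
    ("u", 8): np.uint64,
}


def fixed_width_equivalent(c_type):
    """Map a C-named NumPy integer type (e.g. np.intc) to the fixed-width
    type with the same signedness and byte width."""
    dt = np.dtype(c_type)
    return _FIXED_WIDTH[(dt.kind, dt.itemsize)]


for c_type in (np.intc, np.uintc):
    fixed = fixed_width_equivalent(c_type)
    # Type identity may differ by platform even when the widths match.
    print(np.dtype(c_type).itemsize, fixed.__name__, c_type is fixed)
```

Whether `c_type is fixed` prints `True` depends on the platform, which is why the diff guards each branch with an identity check before registering a separate entry.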
35 changes: 35 additions & 0 deletions pandas/tests/reshape/merge/test_merge.py
@@ -1843,6 +1843,41 @@ def test_merge_empty(self, left_empty, how, exp):

         tm.assert_frame_equal(result, expected)

+    def test_merge_with_uintc_columns(self):
+        df1 = DataFrame({"a": ["foo", "bar"], "b": np.array([1, 2], dtype=np.uintc)})
+        df2 = DataFrame({"a": ["foo", "baz"], "b": np.array([3, 4], dtype=np.uintc)})
+        result = df1.merge(df2, how="outer")
+        expected = DataFrame(
+            {
+                "a": ["bar", "baz", "foo", "foo"],
+                "b": np.array([2, 4, 1, 3], dtype=np.uintc),
+            }
+        )
+        tm.assert_frame_equal(result.reset_index(drop=True), expected)
+
+    def test_merge_with_intc_columns(self):
+        df1 = DataFrame({"a": ["foo", "bar"], "b": np.array([1, 2], dtype=np.intc)})
+        df2 = DataFrame({"a": ["foo", "baz"], "b": np.array([3, 4], dtype=np.intc)})
+        result = df1.merge(df2, how="outer")
+        expected = DataFrame(
+            {
+                "a": ["bar", "baz", "foo", "foo"],
+                "b": np.array([2, 4, 1, 3], dtype=np.intc),
+            }
+        )
+        tm.assert_frame_equal(result.reset_index(drop=True), expected)
+
+    def test_merge_intc_non_monotonic(self):
+        df = DataFrame({"join_key": Series([0, 2, 1], dtype=np.intc)})
+        df_details = DataFrame(
+            {"join_key": Series([0, 1, 2], dtype=np.intc), "value": ["a", "b", "c"]}
+        )
+        merged = df.merge(df_details, on="join_key", how="left")
+        expected = DataFrame(
+            {"join_key": np.array([0, 2, 1], dtype=np.intc), "value": ["a", "c", "b"]}
+        )
+        tm.assert_frame_equal(merged.reset_index(drop=True), expected)
+

 @pytest.fixture
 def left():
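The added tests can also be exercised outside the pandas test suite. The sketch below mirrors `test_merge_intc_non_monotonic` through the public API only; it assumes pandas and NumPy are installed, and the variable names `left_df` and `right_df` are illustrative rather than taken from the PR.

```python
import numpy as np
import pandas as pd

left_df = pd.DataFrame({"join_key": pd.Series([0, 2, 1], dtype=np.intc)})
right_df = pd.DataFrame(
    {"join_key": pd.Series([0, 1, 2], dtype=np.intc), "value": ["a", "b", "c"]}
)

# A left merge keeps the row order of the left frame, so each value should
# follow its key even though the left keys are not sorted.
merged = left_df.merge(right_df, on="join_key", how="left")
print(merged)
```

According to the changelog entries in this PR, a merge like this could fail on Windows before the fix, so running the snippet there is one way to check a given build.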