Commit a82d1dd

Fix out-of-bounds violations in safe_sort for empty arrays.
Previously, entries of `codes` that referred to out-of-bounds elements were masked to 0 and then reset to -1 afterwards using `np.putmask`. However, if the array is empty, index 0 is itself out of bounds, so this results in an out-of-bounds access in `take_nd`. Instead, set all out-of-bounds indices in `codes` to -1 immediately, as these can be handled by `take_nd`.
1 parent a3f2d48 commit a82d1dd
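
The failure mode described in the commit message can be sketched with plain NumPy. This is only an illustration under stated assumptions: `take_with_fill` is a hypothetical stand-in written for this example, not pandas' `take_nd`, and it assumes that an indexer value of -1 means "fill with `fill_value`". NumPy raises `IndexError` on the bad index, whereas the commit message describes an out-of-bounds access rather than an exception.

```python
import numpy as np


def take_with_fill(arr, indexer, fill_value=-1):
    # Hypothetical stand-in, NOT pandas' take_nd: positions where the
    # indexer is -1 receive fill_value and are never read from arr.
    indexer = np.asarray(indexer, dtype=np.intp)
    out = np.full(indexer.shape, fill_value, dtype=np.intp)
    valid = indexer != -1
    out[valid] = arr[indexer[valid]]  # raises IndexError on a bad index
    return out


values = np.array([])                  # the empty-array case
order2 = np.array([], dtype=np.intp)   # argsort of an empty sorter is empty
codes = np.array([2, -1, 2], dtype=np.intp)
mask = (codes < -len(values)) | (codes >= len(values))  # all True here

# Old approach: mask to 0 first. Index 0 does not exist in an empty array.
old_codes = codes.copy()
old_codes[mask] = 0
try:
    take_with_fill(order2, old_codes)
    old_approach_ok = True
except IndexError:
    old_approach_ok = False

# New approach: mask to -1 immediately, which the take treats as missing.
new_codes = codes.copy()
new_codes[mask] = -1
result = take_with_fill(order2, new_codes)
```

With an empty array there is no valid index at all, so 0 is no safer a placeholder than the original out-of-bounds code. Only the -1 sentinel avoids touching the array.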

3 files changed: +9 -6 lines changed


doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -621,6 +621,7 @@ Reshaping
 ^^^^^^^^^
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
+- Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)

pandas/core/algorithms.py

Lines changed: 1 addition & 6 deletions
@@ -1529,9 +1529,7 @@ def safe_sort(
         order2 = sorter.argsort()
         if verify:
             mask = (codes < -len(values)) | (codes >= len(values))
-            codes[mask] = 0
-        else:
-            mask = None
+            codes[mask] = -1
         new_codes = take_nd(order2, codes, fill_value=-1)
     else:
         reverse_indexer = np.empty(len(sorter), dtype=int)
@@ -1545,9 +1543,6 @@ def safe_sort(
             if verify:
                 mask = mask | (codes < -len(values)) | (codes >= len(values))
 
-    if use_na_sentinel and mask is not None:
-        np.putmask(new_codes, mask, -1)
-
     return ordered, ensure_platform_int(new_codes)

pandas/tests/test_sorting.py

Lines changed: 7 additions & 0 deletions
@@ -408,6 +408,13 @@ def test_codes_out_of_bound(self):
         tm.assert_numpy_array_equal(result, expected)
         tm.assert_numpy_array_equal(result_codes, expected_codes)
 
+    @pytest.mark.parametrize("codes", [[-1, -1], [2, -1], [2, 2]])
+    def test_codes_empty_array_out_of_bound(self, codes):
+        empty_values = np.array([])
+        expected_codes = -np.ones_like(codes, dtype=np.intp)
+        _, result_codes = safe_sort(empty_values, codes)
+        tm.assert_numpy_array_equal(result_codes, expected_codes)
+
     def test_mixed_integer(self):
         values = np.array(["b", 1, 0, "a", 0, "b"], dtype=object)
         result = safe_sort(values)
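
As a rough check of why the new test expects all -1 for every parametrized case, the bounds test from the diff can be evaluated on its own with NumPy. This sketch does not call `safe_sort`; the `results` dictionary is just a hypothetical container for this illustration.

```python
import numpy as np

values = np.array([])  # empty, so len(values) == 0

results = {}
for raw in ([-1, -1], [2, -1], [2, 2]):
    codes = np.asarray(raw, dtype=np.intp)
    # Same bounds check as in the diff. With len(values) == 0 it reduces to
    # (codes < 0) | (codes >= 0), which is True for every integer code.
    mask = (codes < -len(values)) | (codes >= len(values))
    # Same construction as expected_codes in the test.
    expected = -np.ones_like(raw, dtype=np.intp)
    results[tuple(raw)] = (bool(mask.all()), expected.tolist())
```

Every code is flagged as out of bounds when `values` is empty, which is why the test builds `expected_codes` from `-np.ones_like(codes, dtype=np.intp)` regardless of the input codes.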
