Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.4.rst
@@ -23,6 +23,7 @@ Bug fixes
~~~~~~~~~
- Bug in :class:`Series` constructor raising DeprecationWarning when ``index`` is a list of :class:`Series` (:issue:`55228`)
- Fixed bug in :meth:`DataFrame.__setitem__` casting :class:`Index` with object-dtype to PyArrow backed strings when ``infer_string`` option is set (:issue:`55638`)
- Fixed bug in :meth:`Index.get_indexer` round-tripping through string dtype when ``infer_string`` is enabled (:issue:`55834`)
- Fixed bug in :meth:`Index.insert` casting object-dtype to PyArrow backed strings when ``infer_string`` option is set (:issue:`55638`)
-
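
The lookup behind the `Index.get_indexer` entry above can be sketched as follows. This is a minimal illustration that assumes pandas' default options; the reported bug only surfaced with the `future.infer_string` option enabled, which additionally requires pyarrow, so that setup is not reproduced here.

```python
import pandas as pd

# Object-dtype index that contains a missing value.
idx = pd.Index(["a", "b", None], dtype="object")

# Look up a plain Python list: None is expected to match position 2,
# while "x" is absent from the index and maps to -1.
result = idx.get_indexer([None, "x"])
print(result.tolist())
```

The test added in this PR expects `[2, -1]` for this lookup.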

3 changes: 3 additions & 0 deletions pandas/core/indexes/base.py
@@ -6591,6 +6591,9 @@ def _maybe_cast_listlike_indexer(self, target) -> Index:
        """
        Analogue to maybe_cast_indexer for get_indexer instead of get_loc.
        """
        if not hasattr(target, "dtype") and self.dtype == object:
            # Avoid inference for object since we are casting back later anyway
            return Index(target, dtype=self.dtype)
Member

hmm this might prevent us from doing intentional inference in _maybe_downcast_for_indexing

Member

@jbrockmendel could you think of a concrete situation where this might happen?

(the tests are all passing, so don't seem to cover that?)

Member

Looking at _maybe_downcast_for_indexing more closely, we are downcasting object dtype index to the other dtype, because this can give a performance improvement, right?

This might actually also be true for the object/string combo, but I think correctness is more important. If missing values are handled differently by downcasting, we should avoid that for this combo for now, until we can solve it better.

I will just make the check a bit more specific for object dtype then (even more ugly, but at least fixing this issue)
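
The guard under discussion can be sketched in isolation. This is a hedged illustration, not the pandas implementation: `cast_listlike_for_lookup` is a hypothetical helper name, and plain `pd.Index(target)` stands in for the internal `ensure_index` call.

```python
import pandas as pd


def cast_listlike_for_lookup(index: pd.Index, target) -> pd.Index:
    """Hypothetical stand-in mirroring the guarded branch in the diff."""
    if not hasattr(target, "dtype") and index.dtype == object:
        # Plain list-likes carry no dtype, so build the Index as object
        # directly and skip dtype inference on the lookup values.
        return pd.Index(target, dtype=index.dtype)
    # Assumption: pd.Index(...) approximates ensure_index for this sketch.
    return pd.Index(target)


obj_idx = pd.Index(["a", "b", None], dtype="object")
casted = cast_listlike_for_lookup(obj_idx, [None, "x"])
print(casted.dtype)
```

Because the list is wrapped with the index's own object dtype, the `None` entry stays a Python `None` rather than being converted by string inference, which is what lets it match the missing value stored in the index.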

        return ensure_index(target)

    @final
9 changes: 9 additions & 0 deletions pandas/tests/indexes/object/test_indexing.py
@@ -4,6 +4,7 @@
import pytest

from pandas._libs.missing import is_matching_na
import pandas.util._test_decorators as td

import pandas as pd
from pandas import Index
@@ -55,6 +56,14 @@ def test_get_indexer_with_NA_values(
        expected = np.array([0, 1, -1], dtype=np.intp)
        tm.assert_numpy_array_equal(result, expected)

    @td.skip_if_no("pyarrow")
    def test_get_indexer_infer_string_missing_values(self):
        # GH#55834
        idx = Index(["a", "b", None], dtype="object")
        result = idx.get_indexer([None, "x"])
        expected = np.array([2, -1])
        tm.assert_numpy_array_equal(result, expected, check_dtype=False)


class TestGetIndexerNonUnique:
    def test_get_indexer_non_unique_nas(self, nulls_fixture):