Commit 1d22331

BUG: Fix Series.str.contains with compiled regex on Arrow string dtype (#61946)
Co-authored-by: Joris Van den Bossche
1 parent 3cefa1e commit 1d22331

3 files changed: +17 -2 lines changed

doc/source/whatsnew/v2.3.2.rst

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ Bug fixes
   "string" type in the JSON Table Schema for :class:`StringDtype` columns
   (:issue:`61889`)
 - Boolean operations (``|``, ``&``, ``^``) with bool-dtype objects on the left and :class:`StringDtype` objects on the right now cast the string to bool, with a deprecation warning (:issue:`60234`)
-- Fixed ``~Series.str.match`` and ``~Series.str.fullmatch`` with compiled regex
-  for the Arrow-backed string dtype (:issue:`61964`)
+- Fixed ``~Series.str.match``, ``~Series.str.fullmatch`` and ``~Series.str.contains``
+  with compiled regex for the Arrow-backed string dtype (:issue:`61964`, :issue:`61942`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_232.contributors:

pandas/core/arrays/string_arrow.py

Lines changed: 2 additions & 0 deletions
@@ -346,6 +346,8 @@ def _str_contains(
     ):
         if flags:
             return super()._str_contains(pat, case, flags, na, regex)
+        if isinstance(pat, re.Pattern):
+            pat = pat.pattern
 
         return ArrowStringArrayMixin._str_contains(self, pat, case, flags, na, regex)
 
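
The two added lines unwrap a compiled regex into its source string before handing it to the Arrow-backed implementation. A minimal standalone sketch of that unwrapping step, using only the standard library, might look like the following. The helper name `unwrap_compiled_pattern` is hypothetical and is not part of pandas; it only illustrates the `isinstance` check and the `.pattern` attribute used in the diff.

```python
import re


def unwrap_compiled_pattern(pat):
    """Hypothetical helper: return the source string of a compiled regex.

    Anything that is not a compiled pattern is returned unchanged.
    """
    if isinstance(pat, re.Pattern):
        # .pattern holds the original pattern string passed to re.compile.
        # Flags are stored separately on .flags and are not part of it.
        return pat.pattern
    return pat


compiled = re.compile("ba.")
as_text = unwrap_compiled_pattern(compiled)  # the plain string "ba."
plain = unwrap_compiled_pattern("ba.")       # already a string, unchanged
```

Note that `.pattern` carries only the pattern text; any flags given to `re.compile` live on the separate `.flags` attribute, so a caller that needs them has to read them explicitly.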

pandas/tests/strings/test_find_replace.py

Lines changed: 13 additions & 0 deletions
@@ -281,6 +281,19 @@ def test_contains_nan(any_string_dtype):
     tm.assert_series_equal(result, expected)
 
 
+def test_contains_compiled_regex(any_string_dtype):
+    # GH#61942
+    ser = Series(["foo", "bar", "baz"], dtype=any_string_dtype)
+    pat = re.compile("ba.")
+    result = ser.str.contains(pat)
+
+    expected_dtype = (
+        np.bool_ if is_object_or_nan_string_dtype(any_string_dtype) else "boolean"
+    )
+    expected = Series([False, True, True], dtype=expected_dtype)
+    tm.assert_series_equal(result, expected)
+
+
 # --------------------------------------------------------------------------------------
 # str.startswith
 # --------------------------------------------------------------------------------------
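
The expected values in the new test can be cross-checked without pandas. The sketch below assumes that `str.contains` with a regex behaves like a per-element `re.search`, and reproduces the `[False, True, True]` expectation with the standard library alone. It does not exercise the Arrow code path that the commit fixes.

```python
import re

values = ["foo", "bar", "baz"]
pat = re.compile("ba.")

# A value "contains" the pattern if the regex matches anywhere in it.
expected = [pat.search(value) is not None for value in values]
```

Here `"foo"` has no `ba` followed by another character, while `"bar"` and `"baz"` both do, which gives the booleans the test asserts.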
