BUG: fix Series.str.fullmatch() and Series.str.match() with a compiled regex failing with arrow strings #61964

Merged: 9 commits into pandas-dev:main on Aug 14, 2025

Contributor

@khemkaran10 khemkaran10 commented Jul 26, 2025

Fixes: #61952
After Fix:

import re
import pandas as pd

DATA = ["applep", "bananap", "Cherryp", "DATEp", "eGGpLANTp", "123p", "23.45p"]
s = pd.Series(DATA)
s.str.fullmatch(re.compile(r"applep"))

Output:
0     True
1    False
2    False
3    False
4    False
5    False
6    False
dtype: bool

DATA = ["applep", "bananap", "Cherryp", "DATEp", "eGGpLANTp", "123p", "23.45p"]
sa = pd.Series(DATA, dtype="string[pyarrow]")
sa.str.match(re.compile(r"applep"))

Output:
0     True
1    False
2    False
3    False
4    False
5    False
6    False
dtype: boolean
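
The two examples above use different methods (fullmatch and match), and with the pattern "applep" they happen to give the same result. As a rough illustration of how the two differ, here is a minimal sketch using only the standard-library re module, on the assumption that Series.str.match behaves like re.match (anchored at the start) and Series.str.fullmatch like re.fullmatch (whole string). The helper names match_all and fullmatch_all are hypothetical and are not part of pandas.

```python
from __future__ import annotations

import re

DATA = ["applep", "bananap", "Cherryp", "DATEp", "eGGpLANTp", "123p", "23.45p"]


def match_all(pat: re.Pattern[str], values: list[str]) -> list[bool]:
    # Hypothetical helper: True where the pattern matches at the start of the string.
    return [pat.match(v) is not None for v in values]


def fullmatch_all(pat: re.Pattern[str], values: list[str]) -> list[bool]:
    # Hypothetical helper: True only where the pattern matches the entire string.
    return [pat.fullmatch(v) is not None for v in values]


apple = re.compile(r"applep")
digits = re.compile(r"\d+")

# "applep" matches the first element both as a prefix and as a whole string.
apple_match = match_all(apple, DATA)
apple_fullmatch = fullmatch_all(apple, DATA)

# r"\d+" matches the leading digits of "123p" and "23.45p",
# but never the whole string because of the trailing characters.
digits_match = match_all(digits, DATA)
digits_fullmatch = fullmatch_all(digits, DATA)
```

With a compiled pattern such as r"\d+", the prefix-style match reports the last two elements as True while the whole-string match reports every element as False, which is the kind of difference a test for both methods would want to cover.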

@jorisvandenbossche jorisvandenbossche added this to the 2.3.2 milestone Jul 26, 2025
@jorisvandenbossche jorisvandenbossche added the Strings (String extension data type and string data) and Arrow (pyarrow functionality) labels Jul 26, 2025
Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks!

It seems we don't actually document that a compiled regular expression is supported, although it works in practice: the non-Arrow version passes pat to re.compile(), which accepts an already-compiled pattern.
It would therefore be good to update the documentation and typing to reflect that a compiled pattern is also supported.
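
To make the point about re.compile() concrete, here is a small sketch using only the standard library. It shows that passing an already-compiled pattern to re.compile() returns that pattern unchanged, and that combining a compiled pattern with extra flags raises a ValueError. The helper to_compiled and its type annotation are hypothetical; they only illustrate what a "str or compiled pattern" signature could look like and are not the pandas implementation.

```python
from __future__ import annotations

import re


def to_compiled(pat: str | re.Pattern[str], flags: int = 0) -> re.Pattern[str]:
    # Hypothetical helper: accept either a string or a compiled pattern.
    # re.compile() returns a compiled pattern as-is when no flags are given.
    return re.compile(pat, flags)


compiled = re.compile(r"applep", re.IGNORECASE)

same = to_compiled(compiled)        # the very same object comes back
from_str = to_compiled(r"applep")   # a plain string is compiled as usual

# The original pattern string and flags stay available on the compiled object.
pattern_text = compiled.pattern
ignores_case = bool(compiled.flags & re.IGNORECASE)

# Passing flags together with a compiled pattern is rejected by re.compile().
try:
    to_compiled(compiled, flags=re.IGNORECASE)
except ValueError:
    flags_rejected = True
else:
    flags_rejected = False
```

A backend that cannot call re.compile() directly would presumably need to read pattern and flags off the compiled object instead, which is why documenting and typing the accepted input matters.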

@khemkaran10
Contributor Author

@jorisvandenbossche Moved tests to pandas/tests/strings/test_find_replace.py and made a minor change to the docstring. I’m not sure what changes need to be made in the docs. Could you please provide more details?

@jorisvandenbossche
Member

I’m not sure what changes need to be made in the docs. Could you please provide more details?

The suggestions from @yuanx749 point in the right direction.

@jorisvandenbossche jorisvandenbossche changed the title from "BUG FIX: Using Series.str.fullmatch() and Series.str.match() with a compiled regex fails with arrow strings" to "BUG: fix Series.str.fullmatch() and Series.str.match() with a compiled regex failing with arrow strings" Aug 13, 2025
Member

@jorisvandenbossche jorisvandenbossche left a comment

Looks good!
Just added a small whatsnew note

@jorisvandenbossche jorisvandenbossche merged commit 3cefa1e into pandas-dev:main Aug 14, 2025
38 checks passed
@jorisvandenbossche
Member

Thanks @khemkaran10

lumberbot-app bot commented Aug 14, 2025

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it.
git checkout 2.3.x
git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
git cherry-pick -x -m1 3cefa1ee6b30843a24065fa67392fbfa63d0769b
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
git commit -am 'Backport PR #61964: BUG: fix Series.str.fullmatch() and Series.str.match() with a compiled regex failing with arrow strings '
  4. Push to a named branch:
git push YOURFORK 2.3.x:auto-backport-of-pr-61964-on-2.3.x
  5. Create a PR against branch 2.3.x. I would have named this PR:

"Backport PR #61964 on branch 2.3.x (BUG: fix Series.str.fullmatch() and Series.str.match() with a compiled regex failing with arrow strings )"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

Labels
Arrow (pyarrow functionality), Still Needs Manual Backport, Strings (String extension data type and string data)
Development

Successfully merging this pull request may close these issues.

BUG: Using Series.str.fullmatch() and Series.str.match() with a compiled regex fails with arrow strings
3 participants