Skip to content

Conversation

bashtage
Copy link
Contributor

Adds support for StringDtype
Fixes tests that are not valid with copy-on-write

Adds support for StringDtype
Fixes tests that are not valid with copy-on-write
@bashtage bashtage requested review from Copilot and matthewwardrop and removed request for Copilot July 22, 2025 11:20
Copilot

This comment was marked as outdated.

@bashtage bashtage requested a review from Copilot July 22, 2025 11:37
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR prepares the patsy library for pandas 3 compatibility by adding support for StringDtype and fixing tests that are incompatible with pandas 3's copy-on-write behavior. The changes ensure the library continues to work with newer pandas versions while maintaining backward compatibility.

  • Adds detection logic for pandas 3 and StringDtype support
  • Updates test assertions to account for copy-on-write behavior in pandas 3
  • Extends dtype checking functions to handle StringDtype alongside existing categorical dtype support

Special 1 and 2 d cases when indexing for pandas 3 support
@@ -183,7 +183,7 @@ def _handle_NA_drop(self, values, is_NAs, origins):
total_mask |= is_NA
good_mask = ~total_mask
# "..." to handle 1- versus 2-dim indexing
return [v[good_mask, ...] for v in values]
return [v[good_mask] if v.ndim == 1 else v[good_mask, ...] for v in values]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this due to upstream indexing changes? Kind of annoying if ... no longer supports "zero" expansion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it appears that it is stricter and no longer supports expansion. I only found this by running against the statsmodels test suite.

@@ -799,7 +820,7 @@ def test_safe_is_pandas_categorical():
# https://github.com/pydata/pandas/issues/9581
# https://github.com/pydata/pandas/issues/9581#issuecomment-77099564
def safe_issubdtype(dt1, dt2):
if safe_is_pandas_categorical_dtype(dt1):
if safe_is_pandas_categorical_dtype(dt1) or safe_is_pandas_string_dtype(dt1):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... are there other places that string dtypes should be treated as categorical, no? I'll need to take a look, since I haven't looked at patsy code for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants