-
Notifications
You must be signed in to change notification settings - Fork 103
MAINT: Prepare patsy for pandas3 StringDtype #229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Adds support for StringDtype Fixes tests that are not valid with copy-on-write
4001d9e
to
6a1e812
Compare
2ceaeab
to
f282fae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR prepares the patsy library for pandas 3 compatibility by adding support for StringDtype and fixing tests that are incompatible with pandas 3's copy-on-write behavior. The changes ensure the library continues to work with newer pandas versions while maintaining backward compatibility.
- Adds detection logic for pandas 3 and StringDtype support
- Updates test assertions to account for copy-on-write behavior in pandas 3
- Extends dtype checking functions to handle StringDtype alongside existing categorical dtype support
f94b74d
to
a964dab
Compare
Special 1 and 2 d cases when indexing for pandas 3 support
c15476b
to
f15e36c
Compare
for more information, see https://pre-commit.ci
@@ -183,7 +183,7 @@ def _handle_NA_drop(self, values, is_NAs, origins): | |||
total_mask |= is_NA | |||
good_mask = ~total_mask | |||
# "..." to handle 1- versus 2-dim indexing | |||
return [v[good_mask, ...] for v in values] | |||
return [v[good_mask] if v.ndim == 1 else v[good_mask, ...] for v in values] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this due to upstream indexing changes? Kind of annoying if ...
no longer supports "zero" expansion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it appears that it is stricter and no longer supports expansion. I only found this by running against the statsmodels test suite.
@@ -799,7 +820,7 @@ def test_safe_is_pandas_categorical(): | |||
# https://github.com/pydata/pandas/issues/9581 | |||
# https://github.com/pydata/pandas/issues/9581#issuecomment-77099564 | |||
def safe_issubdtype(dt1, dt2): | |||
if safe_is_pandas_categorical_dtype(dt1): | |||
if safe_is_pandas_categorical_dtype(dt1) or safe_is_pandas_string_dtype(dt1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... are there other places that string dtypes should be treated as categorical, no? I'll need to take a look, since I haven't looked at patsy code for a while.
Adds support for StringDtype
Fixes tests that are not valid with copy-on-write