Skip to content

Conversation

dsaxton
Copy link
Contributor

@dsaxton dsaxton commented Apr 19, 2020

@dsaxton dsaxton added Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Apr 19, 2020
@dsaxton dsaxton marked this pull request as draft April 19, 2020 16:13
@dsaxton dsaxton changed the title BUG: Fix use_inf_as_na bug for StringArray BUG: Fix StringArray use_inf_as_na bug Apr 19, 2020
@jreback
Copy link
Contributor

jreback commented Apr 19, 2020

you also have #33629 for this same issue?

@dsaxton
Copy link
Contributor Author

dsaxton commented Apr 19, 2020

you also have #33629 for this same issue?

We were thinking we may want to treat them separately since they're slightly different issues: #33629 (comment). Should both bugs be in the same PR?

@jorisvandenbossche
Copy link
Member

Can you also add a test somewhere for an object dtype Series that has pd.NA in it? (that won't be covered by the string dtype tests)

This shouldn't be "draft" anymore?

@dsaxton
Copy link
Contributor Author

dsaxton commented Apr 20, 2020

This shouldn't be "draft" anymore?

I was hoping to finish up the PR linked above first because it overlaps somewhat with this one

@jbrockmendel
Copy link
Member

Does this address #33209?

@dsaxton
Copy link
Contributor Author

dsaxton commented Apr 20, 2020

Does this address #33209?

Yes I think it should

@dsaxton dsaxton marked this pull request as ready for review April 28, 2020 02:38
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a base extension test for this? (there should already be isna() tests, so you can add a similar one there but which is using use_inf_as_na

@jreback jreback requested a review from jbrockmendel April 29, 2020 22:40
@jreback jreback added this to the 1.1 milestone Apr 29, 2020
@dsaxton
Copy link
Contributor Author

dsaxton commented Apr 30, 2020

Can you also add a base extension test for this? (there should already be isna() tests, so you can add a similar one there but which is using use_inf_as_na

Added some base extension use_inf_as_na tests, however I'm getting failures from test_sparse.py (dtype-related, I believe), test_json.py and test_decimal.py (the latter two I think because of improper handling of null values; e.g., {} and Decimal("NaN") under the use_inf_as_na context). Should I xfail for these for now and try to address in a follow-up?

if is_extension_array_dtype(dtype):
if inf_as_na:
result = values.isna() | (values == -np.inf) | (values == np.inf)
result = libmissing.isnaobj_old(values.to_numpy())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we short-circuit this (and avoid the object-cast of to_numpy) for e.g. PeriodArray?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in general, but we could short-circuit the known built-in dtypes that will never be able to contain inf (basically all our internal EA dtypes except for categorical ?)

@jorisvandenbossche
Copy link
Member

@dsaxton can you merge master once more?

And what is left for this PR? I think only the comment of @jbrockmendel to not do the expensive libmissing.isnaobj_old(values.to_numpy()) check for certain dtypes where this will never be needed (like Periods) ?

@dsaxton
Copy link
Contributor Author

dsaxton commented May 9, 2020

And what is left for this PR?

@jorisvandenbossche There was also this issue that I still need to figure out (I think it could potentially be fixed with the short circuiting mentioned above): #33656 (comment)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. ping on green. I don't think its worth the trouble to deprecate this, possibly could do some more code cleaning, but so far so good (IIRC you de-duplicated things a bit already);

@dsaxton
Copy link
Contributor Author

dsaxton commented May 10, 2020

@jreback ping

@jreback jreback merged commit 678a9ac into pandas-dev:master May 10, 2020
@jreback
Copy link
Contributor

jreback commented May 10, 2020

thanks @dsaxton

@dsaxton dsaxton deleted the string-use-inf-as-na branch May 10, 2020 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: pandas.options.mode.use_inf_as_na disables NA checking for StringArray

4 participants