Skip to content

Conversation

dsaxton
Copy link
Contributor

@dsaxton dsaxton commented Feb 22, 2021

If there was somehow a way to copy the mask only on write so we aren't always copying that may be better. @simonjayhawkins I put this in the IntegerArray tests even though we're technically checking a Series, let me know if you'd prefer I rewrite or move elsewhere.

@simonjayhawkins simonjayhawkins added Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Feb 22, 2021
@simonjayhawkins simonjayhawkins added this to the 1.2.3 milestone Feb 22, 2021
@simonjayhawkins
Copy link
Member

If there was somehow a way to copy the mask only on write so we aren't always copying that may be better.

for a backport, let's keep this simple.

tm.assert_extension_array_equal(result, expected)


@pytest.mark.parametrize("op", ["__neg__", "__abs__"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also test __invert__, since I believe has the same issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, that required another change

@jreback
Copy link
Contributor

jreback commented Feb 22, 2021

If there was somehow a way to copy the mask only on write so we aren't always copying that may be better. @simonjayhawkins I put this in the IntegerArray tests even though we're technically checking a Series, let me know if you'd prefer I rewrite or move elsewhere.

this is not necessary, these copies are very cheap, semantics are way more important.

tm.assert_extension_array_equal(result, expected)


@pytest.mark.parametrize("op", ["__neg__", "__abs__", "__invert__"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test for boolean as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added

@jreback
Copy link
Contributor

jreback commented Feb 22, 2021

lgtm. waiting on CI.

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Feb 23, 2021

TestAccessor.test_from_coo also failing on master https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=54759&view=logs&j=4b4d380c-92b5-58cf-5b80-2c6c9b1d5805&t=8eab1181-1335-58ed-7ed5-a271d9d86005&l=80

2021-02-23T01:07:24.4100333Z 
2021-02-23T01:07:24.4100753Z ================================== FAILURES ===================================
2021-02-23T01:07:24.4101532Z _________________________ TestAccessor.test_from_coo __________________________
2021-02-23T01:07:24.4102065Z [gw0] win32 -- Python 3.7.9 C:\Miniconda\envs\pandas-dev\python.exe
2021-02-23T01:07:24.4102361Z 
2021-02-23T01:07:24.4103135Z self = <pandas.tests.arrays.sparse.test_array.TestAccessor object at 0x000001317E6BC888>
2021-02-23T01:07:24.4103462Z 
2021-02-23T01:07:24.4103797Z     @td.skip_if_no_scipy
2021-02-23T01:07:24.4104207Z     def test_from_coo(self):
2021-02-23T01:07:24.4104597Z         import scipy.sparse
2021-02-23T01:07:24.4104947Z     
2021-02-23T01:07:24.4105283Z         row = [0, 3, 1, 0]
2021-02-23T01:07:24.4105674Z         col = [0, 3, 1, 2]
2021-02-23T01:07:24.4106045Z         data = [4, 5, 7, 9]
2021-02-23T01:07:24.4106494Z         sp_array = scipy.sparse.coo_matrix((data, (row, col)))
2021-02-23T01:07:24.4106960Z         result = pd.Series.sparse.from_coo(sp_array)
2021-02-23T01:07:24.4107356Z     
2021-02-23T01:07:24.4107834Z         index = pd.MultiIndex.from_arrays([[0, 0, 1, 3], [0, 2, 1, 3]])
2021-02-23T01:07:24.4108389Z         expected = pd.Series([4, 9, 7, 5], index=index, dtype="Sparse[int]")
2021-02-23T01:07:24.4109198Z >       tm.assert_series_equal(result, expected)
2021-02-23T01:07:24.4109817Z E       AssertionError: Attributes of Series are different
2021-02-23T01:07:24.4110213Z E       
2021-02-23T01:07:24.4110548Z E       Attribute "dtype" are different
2021-02-23T01:07:24.4110965Z E       [left]:  Sparse[float64, nan]
2021-02-23T01:07:24.4111366Z E       [right]: Sparse[int32, 0]
2021-02-23T01:07:24.4111579Z 
2021-02-23T01:07:24.4112141Z pandas\tests\arrays\sparse\test_array.py:1193: AssertionError
2021-02-23T01:07:24.4112871Z ============================== warnings summary ===============================

@simonjayhawkins
Copy link
Member

will merge after the #39979 is merged. (otherwise may not be able to auto backport)

@simonjayhawkins simonjayhawkins merged commit d25cb7f into pandas-dev:master Feb 23, 2021
@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.2.x

~~~~~~~~~~~~~~~~~

- Fixed regression in :meth:`~DataFrame.to_excel` raising ``KeyError`` when giving duplicate columns with ``columns`` attribute (:issue:`39695`)
- Fixed regression in :class:`IntegerArray` unary ops propagating mask on assignment (:issue:`39943`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, I think we should generally say something like "nullable integer dtype" instead of "IntegerArray", as most users should never deal / know about IntegerArray

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point, I can update this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. #40019. thanks @dsaxton

return type(self)(-self._data, self._mask.copy())

def __pos__(self):
return self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do the same for __pos__?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not for a backport. but indeed this needs some discussion to ensure consistency across the codebase. see TimedeltaArray

    def __pos__(self) -> TimedeltaArray:
        return type(self)(self._data, freq=self.freq)
>>> import pandas as pd
>>> pd.__version__
'1.3.0.dev0+804.g3289f82975'
>>>
>>> arr = pd.timedelta_range(0, periods=10)._values
>>> arr
<TimedeltaArray>
['0 days', '1 days', '2 days', '3 days', '4 days', '5 days', '6 days',
 '7 days', '8 days', '9 days']
Length: 10, dtype: timedelta64[ns]
>>>
>>> arr2 = +arr
>>>
>>> arr2 is arr
False
>>>
>>> arr2._data is arr._data
True
>>>
>>> arr2[5] = None
>>>
>>> arr
<TimedeltaArray>
['0 days', '1 days', '2 days', '3 days', '4 days',      NaT, '6 days',
 '7 days', '8 days', '9 days']
Length: 10, dtype: timedelta64[ns]
>>>
>>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see, so that's indeed something we should fix more generally. Is there already an issue about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context here I had been thinking that something like s1 = +s should behave the same way as s1 = s, but actually I think you are right. Likely +s should return something new instead of simply a reference to the thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there already an issue about it?

no. will need to create one, since the issue that this PR addresses is slightly different. #39943 is about different arrays sharing a mask, whereas returning self for __pos__, although maybe incorrect, does not create the inconsistencies when assigning null and non-null vales


@pytest.mark.parametrize("op", ["__neg__", "__abs__", "__invert__"])
@pytest.mark.parametrize(
"values, dtype", [([1, 2, 3], "Int64"), ([True, False, True], "boolean")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use the data fixture here as used in the other tests above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is being addressed in #39916, so no action required here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants