Skip to content

Conversation

aijams
Copy link

@aijams aijams commented Sep 29, 2025

In the edge case when called with integer arrays and asked to access non-existent entries (to be replaced with NaN), the take method of NumpyExtensionArray produces arrays whose dtypes don't match their underlying data.
Specifically, take promotes the underlying data to a floating-point type, but doesn't promote the dtype of the extension array to match.
These changes ensure that the result of this method has the correct dtype for its data.

@aijams aijams marked this pull request as draft September 29, 2025 18:11
@rhshadrach rhshadrach added Bug ExtensionArray Extending pandas with custom dtypes or arrays. Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Sep 30, 2025
@aijams
Copy link
Author

aijams commented Oct 2, 2025

Several tests are currently failing due to a new version of numexpr.
#62545
All other tests pass.
I hard-coded a list of dtypes in take to check for the integer types that can't store NaN values since setting the dtype on an extension array after it's created isn't allowed. I have yet to think of another way to correct this issue without modifying the base extension array class to allow its dtype to be modified.
Let me know if you thought of a better way of approaching this.

@aijams aijams marked this pull request as ready for review October 7, 2025 16:33
fill_value=fill_value,
axis=axis,
)
if self.dtype in [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this on the NumpySemantics instead of this mixin



@pytest.mark.parametrize("dtype", [np.uint32, np.uint64, np.int32, np.int64])
def test_take_assigns_correct_dtype(dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment pointing back to the motivating GH issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants