
Sample ExtensionArray.take implementation doesn't correctly handle scalars #38762

@TomAugspurger

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

The sample `ExtensionArray.take` implementation

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.core.algorithms import take
    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)
    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value
    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.
    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)

is buggy since it doesn't handle scalars properly:

import numpy as np
from pandas.api.extensions import ExtensionArray, ExtensionDtype

class MyDtype(ExtensionDtype):
    name = "name"

class MyArray(ExtensionArray):
    dtype = MyDtype()
    def __init__(self, data):
        self._data = data
        
    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        return cls(np.array(scalars))
    
    def __getitem__(self, item):
        return self._data[item]
    
    def __len__(self):
        return len(self._data)
    def take(self, indices, allow_fill=False, fill_value=None):
        from pandas.core.algorithms import take
        # If the ExtensionArray is backed by an ndarray, then
        # just pass that here instead of coercing to object.
        data = self.astype(object)
        if allow_fill and fill_value is None:
            fill_value = self.dtype.na_value
        # fill value should always be translated from the scalar
        # type for the array, to the physical storage type for
        # the data, before passing to take.
        result = take(data, indices, fill_value=fill_value,
                      allow_fill=allow_fill)
        return self._from_sequence(result, dtype=self.dtype)

a = MyArray._from_sequence([1, 2, 3])
result = a.take(0)
assert result == 1

Problem description

Expected Output

.take(0) should return the scalar 1, rather than trying to wrap it in a new MyArray.
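One way the expected behavior could look is sketched below. This is a dependency-free stand-in, not pandas code: `SketchArray`, its `na_value` attribute, and the `pick` helper are hypothetical names made up for illustration, and the list-backed storage is an assumption. It only shows the control flow of guarding for a scalar index before wrapping the result in a new array.

```python
import numbers


class SketchArray:
    """Hypothetical list-backed stand-in for an ExtensionArray (not a pandas class)."""

    na_value = None  # assumed missing-value marker for this sketch

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, item):
        return self._data[item]

    def take(self, indices, allow_fill=False, fill_value=None):
        if allow_fill and fill_value is None:
            fill_value = self.na_value

        def pick(i):
            # With allow_fill=True, -1 marks a missing slot and any other
            # negative index is rejected; otherwise normal indexing applies.
            if allow_fill:
                if i < -1:
                    raise ValueError("negative indices other than -1 are not allowed")
                if i == -1:
                    return fill_value
            return self._data[i]

        # Scalar guard: a single integer index yields a single element
        # instead of being wrapped in a new array.
        if isinstance(indices, numbers.Integral):
            return pick(indices)
        return type(self)(pick(i) for i in indices)


a = SketchArray([1, 2, 3])
assert a.take(0) == 1                 # scalar in, scalar out
assert a.take([0, 2])._data == [1, 3]  # sequence in, array out
```

In an actual `ExtensionArray` subclass, the same guard would presumably sit in front of the call to `take` from pandas, with something like `pandas.api.types.is_integer` doing the scalar check; that placement is a suggestion, not a confirmed fix.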

Output of pd.show_versions()

[paste the output of pd.show_versions() here leaving a blank line after the details tag]


Labels: Docs, ExtensionArray (Extending pandas with custom dtypes or arrays.)
