BUG/Internals: maybe_promote #23833

@h-vetinari

Seems I found a pretty deep rabbit hole while trying to solve #23823 (while trying to solve #23192 / #23604):

maybe_upcast_putmask and maybe_promote are both completely untested (or at least, their names do not appear anywhere in pandas/tests/), and maybe_promote also has no docstring. Side note: I ran into a segfault while trying to remove some old numpy compat code from that method in #23796.

Aside from the missing docstring and tests, the behaviour is also wrong, at least for integer types:

>>> import numpy as np
>>> from pandas.core.dtypes.cast import maybe_promote
>>> maybe_promote(np.dtype('int8'), np.array([10, np.iinfo('int8').max + 1, 12]))
(<class 'numpy.float64'>, nan)

To me, this should clearly upcast to int16 instead of float (using arrays for fill_value is correct usage, as done e.g. in maybe_upcast_putmask as maybe_promote(result.dtype, other), and has a dedicated code branch in maybe_promote).
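
To make the expected int-to-int behaviour concrete, here is a rough sketch using only numpy. The helper name promote_int_dtype is hypothetical and not part of pandas; it only covers integer dtypes with a non-empty integer fill_value, and it is one possible reading of "upcast to int16", not a proposal for the final implementation.

```python
import numpy as np


def promote_int_dtype(dtype, fill_value):
    """Hypothetical helper (not pandas API): return the smallest integer
    dtype that can hold every value of `dtype` and of `fill_value`."""
    dtype = np.dtype(dtype)
    values = np.asarray(fill_value)
    if dtype.kind not in "iu" or values.dtype.kind not in "iu":
        raise TypeError("sketch only covers integer-to-integer promotion")

    lo, hi = int(values.min()), int(values.max())

    # Try the original dtype first, then progressively wider signed ints.
    candidates = [dtype] + [np.dtype(t) for t in ("int8", "int16", "int32", "int64")]
    for cand in candidates:
        info = np.iinfo(cand)
        # The candidate must safely hold the existing data and the new values.
        if np.can_cast(dtype, cand) and info.min <= lo and hi <= info.max:
            return cand
    raise OverflowError("no integer dtype can hold both dtype and fill_value")
```

With the input from the example above, promote_int_dtype(np.dtype('int8'), np.array([10, np.iinfo('int8').max + 1, 12])) returns dtype('int16'), while values that already fit leave int8 unchanged.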

In int-to-int promotion, the question is what to return as an actual fill_value though. Of course, this method is being used in pretty central code paths, but the number of uses is not that high (on master; half of the instances are imports/redefinitions).

pandas/core\algorithms.py:12:    maybe_promote, construct_1d_object_array_from_listlike)
pandas/core\algorithms.py:1572:        _maybe_promote to determine this type for any fill_value
pandas/core\algorithms.py:1617:            dtype, fill_value = maybe_promote(arr.dtype, fill_value)
pandas/core\algorithms.py:1700:            dtype, fill_value = maybe_promote(arr.dtype, fill_value)
pandas/core\dtypes\cast.py:228:        new_dtype, _ = maybe_promote(result.dtype, other)
pandas/core\dtypes\cast.py:252:def maybe_promote(dtype, fill_value=np.nan):
pandas/core\dtypes\cast.py:538:        new_dtype, fill_value = maybe_promote(dtype, fill_value)
pandas/core\generic.py:34:from pandas.core.dtypes.cast import maybe_promote, maybe_upcast_putmask
pandas/core\generic.py:8289:                            dtype, fill_value = maybe_promote(other.dtype)
pandas/core\indexes\base.py:3371:        pself, ptarget = self._maybe_promote(target)
pandas/core\indexes\base.py:3505:        pself, ptarget = self._maybe_promote(target)
pandas/core\indexes\base.py:3528:    def _maybe_promote(self, other):
pandas/core\indexes\datetimes.py:924:    def _maybe_promote(self, other):
pandas/core\indexes\timedeltas.py:409:    def _maybe_promote(self, other):
pandas/core\internals\blocks.py:45:    maybe_promote,
pandas/core\internals\blocks.py:899:            dtype, _ = maybe_promote(arr_value.dtype)
pandas/core\internals\blocks.py:1054:                    dtype, _ = maybe_promote(n.dtype)
pandas/core\internals\blocks.py:3174:        dtype, fill_value = maybe_promote(values.dtype)
pandas/core\internals\blocks.py:3293:    dtype, _ = maybe_promote(n.dtype)
pandas/core\internals\concat.py:19:from pandas.core.dtypes.cast import maybe_promote
pandas/core\internals\concat.py:137:            return _get_dtype(maybe_promote(self.block.dtype,
pandas/core\internals\managers.py:22:    maybe_promote,
pandas/core\internals\managers.py:1277:                    _, fill_value = maybe_promote(blk.dtype)
pandas/core\reshape\reshape.py:12:from pandas.core.dtypes.cast import maybe_promote
pandas/core\reshape\reshape.py:192:            dtype, fill_value = maybe_promote(values.dtype, self.fill_value)

Therefore it might make sense to adapt the private API, e.g. by adding a kwarg must_hold_na and/or return_default_na. I've inspected all the occurrences listed above, and this would not be a problem to implement.
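
As a very rough illustration of what such a kwarg could mean, the sketch below is numpy-only and does not touch maybe_promote itself. The function name promote_dtype_sketch and the semantics given to must_hold_na here are assumptions made for illustration; the actual signature and behaviour are still open.

```python
import numpy as np


def promote_dtype_sketch(dtype, must_hold_na=True):
    """Hypothetical sketch (not pandas API): return (dtype, default_na).

    Assumed semantics: with must_hold_na=False the dtype is returned
    unchanged and no missing-value marker is needed (None)."""
    dtype = np.dtype(dtype)
    if not must_hold_na:
        return dtype, None
    if dtype.kind == "f":
        # Floats can already represent NaN.
        return dtype, np.nan
    if dtype.kind in "iu":
        # Integers cannot hold NaN, so fall back to float64.
        return np.dtype("float64"), np.nan
    # Everything else is simplified to object in this sketch.
    return np.dtype(object), np.nan
```

Callers that only need the promoted dtype (e.g. the `dtype, _ = ...` pattern in the list above) would ignore the second element, and callers that never insert missing values could pass must_hold_na=False to keep integer dtypes intact.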

Once I get around to it, will probably split this into two PRs, one just for adding tests/docstring, and one to change...
