Pandas indexing bug raises TypeError when slicing with categorical IntervalIndex #21068

@antipisa

Description

Pandas indexing should not rely on subnormal-float behavior inside categorical data. Please bit-cast floats to integers when computing categorical labels:

def _get_next_label(label):
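
The body of the function is not included in the report. As a rough illustration of what "bit-casting floats to integers" could mean here, the sketch below steps to the next representable float64 using only integer arithmetic on the bit pattern, so it never performs float arithmetic on subnormal values. The helper names (`float_to_bits`, `bits_to_float`, `next_float_up`) are hypothetical and are not taken from pandas; this is an assumption about the intent, not the library's implementation.

```python
import math
import struct


def float_to_bits(x):
    """Reinterpret a float64 as a signed 64-bit integer (hypothetical helper)."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    """Reinterpret a signed 64-bit integer as a float64 (hypothetical helper)."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]


def next_float_up(x):
    """Return the smallest float64 strictly greater than x.

    NaN and +inf are returned unchanged. All stepping is done on the
    integer bit pattern rather than with float arithmetic.
    """
    if math.isnan(x):
        return x
    bits = float_to_bits(x)
    if bits == 0x7FF0000000000000:  # +inf has no successor
        return x
    if bits & 0x7FFFFFFFFFFFFFFF == 0:
        # +0.0 or -0.0: the successor is the smallest positive subnormal
        return bits_to_float(1)
    # Positive floats grow with their bit pattern; for negative floats
    # (sign bit set, so the signed integer is negative) moving toward
    # zero means decrementing the pattern.
    return bits_to_float(bits + 1 if bits > 0 else bits - 1)
```

For example, `next_float_up(1.0)` gives `1.0 + 2**-52`, and `next_float_up(0.0)` gives the smallest positive subnormal, `5e-324`, regardless of how the floating-point unit is configured to treat subnormals in arithmetic.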

The following is an example of indexing with floating-point interval endpoints that should return the first row of the table:

import pandas as pd
import numpy as np

t = pd.DataFrame(dict(sym=np.arange(2), y=1., z=-1.))
t.loc[:, 'x'] = pd.Series([pd.Interval(-1., 0.0, closed='right'), pd.Interval(0.0, 1, closed='right')])
t.set_index('x', inplace=True)
t.index = pd.Categorical(t.index)
t.loc[t.index.categories[0], :]

Out:
sym    0.0
y      1.0
z     -1.0
Name: (-1.0, 0.0], dtype: float64

However, the same lookup fails once flush-to-zero (FTZ) and denormals-are-zero (DAZ) modes are enabled:

import daz
daz.set_ftz()
daz.set_daz()

t = pd.DataFrame(dict(sym=np.arange(2), y=1., z=-1.))
t.loc[:, 'x'] = pd.Series([pd.Interval(-1., 0.0, closed='right'), pd.Interval(0.0, 1, closed='right')])
t.set_index('x', inplace=True)
t.index = pd.Categorical(t.index)
t.loc[t.index.categories[0], :]

TypeError                                 Traceback (most recent call last)
<ipython-input-3-3a8fe3a302cf> in <module>()
      8 t.set_index('x', inplace=True)
      9 t.index = pd.Categorical(t.index)
---> 10 t.loc[t.index.categories[0], :]

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1365             except (KeyError, IndexError):
   1366                 pass
-> 1367             return self._getitem_tuple(key)
   1368         else:
   1369             # we by definition only have the 0th axis

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    856     def _getitem_tuple(self, tup):
    857         try:
--> 858             return self._getitem_lowerdim(tup)
    859         except IndexingError:
    860             pass

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_lowerdim(self, tup)
    989         for i, key in enumerate(tup):
    990             if is_label_like(key) or isinstance(key, tuple):
--> 991                 section = self._getitem_axis(key, axis=i)
    992 
    993                 # we have yielded a scalar ?

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1625         # fall thru to straight lookup
   1626         self._has_valid_type(key, axis)
-> 1627         return self._get_label(key, axis=axis)
   1628 
   1629 

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
    143             raise IndexingError('no slices here, handle elsewhere')
    144 
--> 145         return self.obj._xs(label, axis=axis)
    146 
    147     def _get_loc(self, key, axis=None):

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in xs(self, key, axis, level, drop_level)
   2342                                                       drop_level=drop_level)
   2343         else:
-> 2344             loc = self.index.get_loc(key)
   2345 
   2346             if isinstance(loc, np.ndarray):

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexes/category.pyc in get_loc(self, key, method)
    410         if (codes == -1):
    411             raise KeyError(key)
--> 412         return self._engine.get_loc(codes)
    413 
    414     def get_value(self, series, key):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: 'slice(0, 2, None)' is an invalid key

This fails because the default behavior for floating-point endpoints forces the interval key to be cast into an integer slice (`slice(0, 2, None)`), which the index engine then rejects as an invalid key. This is not ideal.

Metadata

Assignees

No one assigned

    Labels

    Bug; Categorical (Categorical Data Type); Indexing (Related to indexing on series/frames, not to indexes themselves); Interval (Interval data type)
