Open
Labels: Docs, Needs Discussion, cut
Description
Code Sample
import pandas as pd
import numpy as np
pd.cut(np.array([1, 7, 5, 4, 6, 3]), bins=[0, 3, 6, 8], include_lowest=True)
Problem description
Just by setting `include_lowest` to `True`, the data type of the intervals changes from `int64` to `float64`, and the first interval isn't shown as left-inclusive. Here is the wrong output that you'll get:
[(-0.001, 3.0], (6.0, 8.0], (3.0, 6.0], (3.0, 6.0], (3.0, 6.0], (-0.001, 3.0]]
Categories (3, interval[float64]): [(-0.001, 3.0] < (3.0, 6.0] < (6.0, 8.0]]
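A minimal sketch of what `include_lowest` changes in terms of bin membership, separate from how the intervals are printed. The `values` array is an assumption chosen so that one element sits exactly on the lowest edge; it is not the data from the code sample.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0 lies exactly on the lowest bin edge.
values = np.array([0, 1, 3, 6, 8])
bins = [0, 3, 6, 8]

default = pd.cut(values, bins=bins)
with_lowest = pd.cut(values, bins=bins, include_lowest=True)

# A code of -1 marks a value that fell outside every bin.
print(list(default.codes))      # [-1, 0, 0, 1, 2]
print(list(with_lowest.codes))  # [0, 0, 0, 1, 2]

# The interval dtypes differ; the exact repr depends on the pandas version.
print(default.categories.dtype)
print(with_lowest.categories.dtype)
```

So the value `0` is binned as requested; what looks off is only the label of the first interval, whose left edge is nudged below `0` instead of being shown as closed.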
Expected Output
[[0, 3], (6, 8], (3, 6], (3, 6], (3, 6], [0, 3]]
Categories (3, interval[int64]): [[0, 3] < (3, 6] < (6, 8]]
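As far as I can tell, an `IntervalIndex` carries a single `closed` setting for all of its intervals, so the mixed `[0, 3] < (3, 6]` categories may not be directly representable. A possible workaround for integer data only, sketched here as an assumption rather than a recommended fix, is to move the lowest edge down by one so the bins stay integer-valued. The name `shifted` is just for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([1, 7, 5, 4, 6, 3])

# For integers, (-1, 3] covers the same values as [0, 3],
# so include_lowest is not needed and the edges stay integers.
shifted = pd.cut(data, bins=[-1, 3, 6, 8])

print(shifted)
print(list(shifted.codes))  # [0, 2, 1, 1, 1, 0]
```

This does not help with float data, where no exact "previous value" exists for the lowest edge.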
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.8.2
pip: 10.0.1
setuptools: 40.4.3
Cython: 0.28.5
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.0.1
sphinx: 1.8.1
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.0
openpyxl: 2.5.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.1.1
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None