-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
Description
Code and output:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'id' : ['a', np.NaN, 'c'],
'A_1999' : [1, 2, 3],
'A_2000' : [5, 6, 7]})
df1
id A_1999 A_2000
0 a 1 5
1 NaN 2 6
2 c 3 7
df2 = pd.wide_to_long(df1,
stubnames = ['A'],
sep = '_',
i = 'id',
j = 'year').reset_index()
df2
id year A
0 a 1999 1
1 a 1999 2
2 c 1999 3
3 a 2000 5
4 a 2000 6
5 c 2000 7
Problem description
df1 has one NaN in the id column. When applying wide_to_long
to df1, wide_to_long
seems to fill the NaNs from df1 with the preceding value from the df1 dataframe (namely, a).
Why might this be a problem: instances of id == 'a'
in df1 are not the same as in df2 (because the NaNs from df1 have been replaced by a in df2):
df1.loc[df1['id'] == 'a',]
id A_1999 A_2000
0 a 1 5
versus:
df2.loc[df2['id'] == 'a',]
id year A
0 a 1999 1
1 a 1999 2
3 a 2000 5
4 a 2000 6
Expected Output
I would expect wide_to_long
to either
- throw an error that the column that is passed to its
i
argument contains NaNs, or - preserve the NaN case in the second dataframe (not sure whether this is desirable), i.e.:
id year A
0 a 1999 1
1 NaN 1999 2
2 c 1999 3
3 a 2000 5
4 NaN 2000 6
5 c 2000 7
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.4.1
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.43.1