Closed
Labels
Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Enhancement, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Description
In [2]: import pandas as pd
   ...: import numpy as np
   ...: pd.__version__
Out[2]: u'0.23.4'

In [3]: ts = pd.Series([np.nan, 1., 2., 3., np.nan, 4., np.nan])

In [4]: ts.pct_change(fill_method=None)
Out[4]:
0    NaN
1    NaN
2    1.0
3    0.5
4    NaN
5    NaN
6    NaN
dtype: float64

In [5]: ts.pct_change(fill_method='pad')
Out[5]:
0         NaN
1         NaN
2    1.000000
3    0.500000
4    0.000000
5    0.333333
6    0.000000
dtype: float64

In [6]: ts.pct_change(fill_method='pad').mask(ts.isnull())
Out[6]:
0         NaN
1         NaN
2    1.000000
3    0.500000
4         NaN
5    0.333333
6         NaN
dtype: float64

Hello,
After recently updating pandas, I noticed a change in how pct_change behaves with missing data. This is related to #19873.
The first example, with no fill method, behaves as expected. The second example shows the current result, and the third shows what the result used to be. I think the user should be able to choose between the second and third behaviors. I agree that the second example is correct, since it forward-fills as expected, but if the time series is a stock price, for example, the returns on missing days (holidays) were not actually 0, and reporting them as 0 can bias some statistics.
I would suggest adding a new parameter, such as skipna. I could not find a way to get the third behavior with the existing parameters alone; if I missed something, please let me know.
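For reference, here is a minimal sketch of the behavior I have in mind, written as a standalone helper. The name pct_change_skipna is hypothetical and not part of pandas; it only wraps the masking approach from the third example, forward-filling explicitly so that it does not depend on the default of fill_method.

```python
import numpy as np
import pandas as pd


def pct_change_skipna(series: pd.Series) -> pd.Series:
    """Hypothetical helper (not a pandas API): percent change against the
    last valid observation, with NaN kept wherever the input is missing."""
    # Forward-fill explicitly so each valid value is compared with the
    # previous valid value, even across gaps.
    filled = series.ffill()
    # fill_method=None disables any implicit filling inside pct_change.
    changes = filled.pct_change(fill_method=None)
    # Blank out positions that were missing in the input, so holidays
    # do not show up as 0% returns.
    return changes.mask(series.isna())


ts = pd.Series([np.nan, 1., 2., 3., np.nan, 4., np.nan])
result = pct_change_skipna(ts)

# An equivalent formulation: drop the missing rows, compute the change,
# then realign to the original index.
alternative = ts.dropna().pct_change(fill_method=None).reindex(ts.index)
```

For the series above, both result and alternative should reproduce Out[6]: the value at position 5 is computed against position 3 (4 / 3 - 1), and positions 4 and 6 stay NaN.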
Thanks