Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np

datetime_column = "datetime"


def sum_of_subtraction(values):
    # values is a 2D array: one row per observation in the window,
    # one column per selected DataFrame column.
    col_1 = values[:, 0]
    col_2 = values[:, 1]
    # Despite the name, this returns the mean of the element-wise difference.
    return np.mean(col_1 - col_2)


cols_order_1 = ["a", "b"]
cols_order_2 = ["b", "a"]

df = pd.DataFrame(
    {
        datetime_column: pd.date_range("2020-01-01", periods=6),
        "a": [1, 2, 3, 4, 5, 6],
        "b": [6, 7, 8, 5, 6, 7],
    }
)

# Same rolling computation, differing only in the order of the selected columns.
order_1_df = (
    df
    .rolling('3D', on=datetime_column, method='table')[cols_order_1]
    .apply(sum_of_subtraction, engine="numba", raw=True)
)
order_2_df = (
    df
    .rolling('3D', on=datetime_column, method='table')[cols_order_2]
    .apply(sum_of_subtraction, engine="numba", raw=True)
)
Issue Description
With the code above, the function sum_of_subtraction returns different results when the columns of its input argument values are swapped (see the example below).
x = np.array([[1, 2],[3, 4]])
y = x[:, [1, 0]] # columns are swapped
print(sum_of_subtraction(x)) # returns -1.0
print(sum_of_subtraction(y)) # returns 1.0
Yet rolling with method='table' followed by apply returns the same results regardless of the column ordering. Even though the columns were selected in a specific order, when apply is called with engine='numba' the columns appear to be sorted alphabetically before they reach the function. With the code above, the results are:
order_1_df:
| | a | b | datetime |
|---|---|---|---|
| 0 | -5 | -5 | 2020-01-01 00:00:00 |
| 1 | -5 | -5 | 2020-01-02 00:00:00 |
| 2 | -5 | -5 | 2020-01-03 00:00:00 |
| 3 | -3.66667 | -3.66667 | 2020-01-04 00:00:00 |
| 4 | -2.33333 | -2.33333 | 2020-01-05 00:00:00 |
| 5 | -1 | -1 | 2020-01-06 00:00:00 |
order_2_df:
| | a | b | datetime |
|---|---|---|---|
| 0 | -5 | -5 | 2020-01-01 00:00:00 |
| 1 | -5 | -5 | 2020-01-02 00:00:00 |
| 2 | -5 | -5 | 2020-01-03 00:00:00 |
| 3 | -3.66667 | -3.66667 | 2020-01-04 00:00:00 |
| 4 | -2.33333 | -2.33333 | 2020-01-05 00:00:00 |
| 5 | -1 | -1 | 2020-01-06 00:00:00 |
Expected Behavior
When the columns are ordered as ["b", "a"], the result (the order_2_df DataFrame) should be:
| | b | a | datetime |
|---|---|---|---|
| 0 | 5 | 5 | 2020-01-01 00:00:00 |
| 1 | 5 | 5 | 2020-01-02 00:00:00 |
| 2 | 5 | 5 | 2020-01-03 00:00:00 |
| 3 | 3.66667 | 3.66667 | 2020-01-04 00:00:00 |
| 4 | 2.33333 | 2.33333 | 2020-01-05 00:00:00 |
| 5 | 1 | 1 | 2020-01-06 00:00:00 |
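The expected values can be cross-checked without pandas or numba. The sketch below is a dependency-free reference computation, not pandas' implementation. It assumes that, on this daily data, a '3D' window ending at row i covers rows i-2 through i (fewer at the start). The helper name rolling_mean_of_difference is hypothetical and only mirrors what sum_of_subtraction computes per window.

```python
from statistics import fmean

# Same data as the "a" and "b" columns in the reproducible example.
a = [1, 2, 3, 4, 5, 6]
b = [6, 7, 8, 5, 6, 7]


def rolling_mean_of_difference(first, second, window=3):
    """Mean of (first - second) over the trailing `window` rows.

    Assumption: with one row per day, a '3D' time window is equivalent
    to a trailing window of at most 3 rows.
    """
    out = []
    for i in range(len(first)):
        start = max(0, i - window + 1)
        diffs = [x - y for x, y in zip(first[start:i + 1], second[start:i + 1])]
        out.append(fmean(diffs))
    return out


# Column order ["a", "b"]: the function sees a as column 0 and b as column 1.
expected_order_1 = rolling_mean_of_difference(a, b)
# Column order ["b", "a"]: the roles are swapped, so every value flips sign.
expected_order_2 = rolling_mean_of_difference(b, a)

print([round(v, 5) for v in expected_order_1])
# [-5.0, -5.0, -5.0, -3.66667, -2.33333, -1.0]
print([round(v, 5) for v in expected_order_2])
# [5.0, 5.0, 5.0, 3.66667, 2.33333, 1.0]
```

The first list matches what pandas currently returns for both orderings. The second is what order_2_df would contain if the requested column order were respected.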
Installed Versions
INSTALLED VERSIONS
commit : e4956ab
python : 3.10.14
python-bits : 64
OS : Linux
OS-release : 5.15.0-1066-azure
Version : # 75-Ubuntu SMP Thu May 30 14:29:45 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 3.0.0.dev0+1413.ge4956ab403
numpy : 2.0.2
dateutil : 2.9.0
pip : 24.2
Cython : None
sphinx : None
IPython : 8.26.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pytz : 2024.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None