BUG: rolling with method='table' and apply sorts columns alphabetically #59666

@valeju

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

datetime_column = "datetime"

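def sum_of_subtraction(values):
    # values is the 2-D window (rows x selected columns) passed by apply with raw=True.
    # Despite the name, this returns the mean of (first column - second column),
    # so the sign of the result depends on the column order.
    col_1 = values[:, 0]
    col_2 = values[:, 1]
    return np.mean(col_1 - col_2)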


cols_order_1 = ["a", "b"]
cols_order_2 = ["b", "a"]

df = pd.DataFrame(
    {
        datetime_column: pd.date_range("2020-01-01", periods=6),
        "a": [1, 2, 3, 4, 5, 6],
        "b": [6, 7, 8, 5, 6, 7],
    }
)
order_1_df = (
    df
    .rolling('3D', on=datetime_column, method='table')[cols_order_1]
    .apply(sum_of_subtraction, engine="numba", raw=True)
)

order_2_df = (
    df
    .rolling('3D', on=datetime_column, method='table')[cols_order_2]
    .apply(sum_of_subtraction, engine="numba", raw=True)
)

Issue Description

With the code above, the function sum_of_subtraction returns a different result if the columns of the input argument values are swapped (see the example below).

x = np.array([[1, 2],[3, 4]])
y = x[:, [1, 0]]  # columns are swapped
print(sum_of_subtraction(x))  # returns -1.0
print(sum_of_subtraction(y))  # returns 1.0

Yet rolling with method='table' followed by apply returns the same results regardless of the column ordering. Even though the columns were selected in a specific order, they are sorted alphabetically when apply is called with engine='numba'. With the code above, the results are:

order_1_df:

          a         b  datetime
0        -5        -5  2020-01-01 00:00:00
1        -5        -5  2020-01-02 00:00:00
2        -5        -5  2020-01-03 00:00:00
3  -3.66667  -3.66667  2020-01-04 00:00:00
4  -2.33333  -2.33333  2020-01-05 00:00:00
5        -1        -1  2020-01-06 00:00:00

order_2_df:

          a         b  datetime
0        -5        -5  2020-01-01 00:00:00
1        -5        -5  2020-01-02 00:00:00
2        -5        -5  2020-01-03 00:00:00
3  -3.66667  -3.66667  2020-01-04 00:00:00
4  -2.33333  -2.33333  2020-01-05 00:00:00
5        -1        -1  2020-01-06 00:00:00

Expected Behavior

When the columns are ordered as ["b", "a"], the results (the order_2_df dataframe) should be:

         b        a  datetime
0        5        5  2020-01-01 00:00:00
1        5        5  2020-01-02 00:00:00
2        5        5  2020-01-03 00:00:00
3  3.66667  3.66667  2020-01-04 00:00:00
4  2.33333  2.33333  2020-01-05 00:00:00
5        1        1  2020-01-06 00:00:00
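
For reference, the expected numbers can be reproduced without pandas rolling or numba. The sketch below is only a cross-check, not how pandas computes the windows: reference_rolling is a hypothetical helper, and it assumes that with daily timestamps a '3D' window covers the current row plus at most the two rows before it.

```python
import numpy as np


def sum_of_subtraction(values):
    # Same function as in the reproducible example:
    # mean of (first column - second column).
    col_1 = values[:, 0]
    col_2 = values[:, 1]
    return np.mean(col_1 - col_2)


a = np.array([1, 2, 3, 4, 5, 6], dtype=float)
b = np.array([6, 7, 8, 5, 6, 7], dtype=float)


def reference_rolling(columns, window=3):
    # Hypothetical helper: stack the columns in the order given and apply
    # the function to each trailing window of up to `window` rows.
    table = np.column_stack(columns)
    return np.array(
        [
            sum_of_subtraction(table[max(0, i - window + 1) : i + 1])
            for i in range(len(table))
        ]
    )


expected_ab = reference_rolling([a, b])  # column order ["a", "b"]
expected_ba = reference_rolling([b, a])  # column order ["b", "a"]

print(expected_ab)
print(expected_ba)
```

Under that assumption, the ["a", "b"] ordering matches the values reported for order_1_df, and the ["b", "a"] ordering gives the same values with the opposite sign, which is what the table above expects.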

Installed Versions

INSTALLED VERSIONS

commit : e4956ab
python : 3.10.14
python-bits : 64
OS : Linux
OS-release : 5.15.0-1066-azure
Version : # 75-Ubuntu SMP Thu May 30 14:29:45 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+1413.ge4956ab403
numpy : 2.0.2
dateutil : 2.9.0
pip : 24.2
Cython : None
sphinx : None
IPython : 8.26.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pytz : 2024.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Labels: Apply, Bug, Window
