ENH: sklearn.model_selection.ShuffleSplit compatible with pandas.MultiIndex #60552

@jeiglsperger

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could smoothly apply sklearn.model_selection.ShuffleSplit to a pandas.DataFrame that has a pandas.MultiIndex. When I tried, it raised an unexpected KeyError.

Feature Description

This is pseudocode for how I would expect it to work:

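from sklearn.model_selection import ShuffleSplit
from sktime.datatypes import get_examples

df = get_examples(mtype="pd-multiindex", as_scitype="Panel")[0]
# Unique labels of the outer (instance) level of the MultiIndex
instances = df.index.get_level_values(0).unique()
splitter = ShuffleSplit(n_splits=3, random_state=42)
train_indexes = []
test_indexes = []
# split() yields integer positions, so map them back to index labels
for train_index, test_index in splitter.split(instances):
    train_indexes.append(instances[train_index])
    test_indexes.append(instances[test_index])
# Select all rows belonging to the chosen instances of the first split
x_train, x_test = df.loc[train_indexes[0]], df.loc[test_indexes[0]]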
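
A likely cause of the KeyError is that ShuffleSplit.split() returns integer positions, while df.loc expects index labels. Below is a minimal sketch of splitting on the outer level and mapping positions back to labels. It does not use sktime; the DataFrame, the level names "instance" and "time", and the test_size value are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical panel data: 5 instances with string labels, 4 time points each.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "e"], range(4)], names=["instance", "time"]
)
df = pd.DataFrame({"value": np.arange(len(index), dtype=float)}, index=index)

# Unique labels of the outer level, in order of appearance.
instances = df.index.get_level_values("instance").unique()

splitter = ShuffleSplit(n_splits=3, test_size=0.4, random_state=42)
folds = []
for train_pos, test_pos in splitter.split(instances):
    # split() yields integer positions; convert them to labels before using .loc.
    train_labels = instances[train_pos]
    test_labels = instances[test_pos]
    folds.append((df.loc[train_labels], df.loc[test_labels]))

x_train, x_test = folds[0]
```

With string labels the positional/label mismatch cannot go unnoticed, which is why the sketch avoids integer instance labels.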

Alternative Solutions

Currently I only do a train_test_split, as described here, without any cross-validation.

Additional Context

It would also be nice to extend this to the other split methods of sklearn.
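
For reference, scikit-learn's group-aware splitters may already cover part of this. The sketch below uses GroupShuffleSplit with the outer index level as the groups argument, so all rows of an instance land on the same side of the split. The data and level names are hypothetical, and this is one possible workaround rather than the feature requested above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical panel data: 5 instances, 4 time points each.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "e"], range(4)], names=["instance", "time"]
)
df = pd.DataFrame({"value": np.arange(len(index), dtype=float)}, index=index)

# One group label per row, taken from the outer index level.
groups = df.index.get_level_values("instance")

splitter = GroupShuffleSplit(n_splits=3, test_size=0.4, random_state=42)
group_folds = []
for train_rows, test_rows in splitter.split(df, groups=groups):
    # Here split() yields row positions, so .iloc is the matching accessor.
    group_folds.append((df.iloc[train_rows], df.iloc[test_rows]))

g_train, g_test = group_folds[0]
```

Other group-aware splitters such as GroupKFold accept the same split(X, y, groups) arguments, so the same pattern should carry over.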
