
feat(skore): Add cross-validation support for permutation importance #2370

Open

glemaitre wants to merge 11 commits into probabl-ai:main from glemaitre:is/1780_cross_validation

Conversation

@glemaitre (Member)

Partially addressing #1780.

This PR adds support for CrossValidationReport in the PermutationImportanceDisplay.

@glemaitre glemaitre marked this pull request as draft February 4, 2026 23:00
@github-actions (Contributor)

github-actions bot commented Feb 4, 2026

Documentation preview @ 6d0de91

@glemaitre glemaitre added this to Labs Feb 9, 2026
@github-project-automation github-project-automation bot moved this to Todo in Labs Feb 9, 2026
@glemaitre glemaitre moved this from Todo to In progress - High Priority in Labs Feb 9, 2026
@auguste-probabl auguste-probabl moved this from In progress - High Priority to In progress in Labs Feb 9, 2026
@glemaitre glemaitre marked this pull request as ready for review February 9, 2026 23:06
@github-actions (Contributor)

github-actions bot commented Feb 9, 2026

Coverage

Coverage Report for skore/

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| **skore/src/skore** | | | | |
| `__init__.py` | 28 | 0 | 100% | |
| `_config.py` | 31 | 0 | 100% | |
| `_login.py` | 14 | 0 | 100% | |
| `exceptions.py` | 4 | 4 | 0% | 4, 15, 19, 23 |
| **skore/src/skore/_sklearn** | | | | |
| `__init__.py` | 6 | 0 | 100% | |
| `_base.py` | 73 | 8 | 89% | 272–279 |
| `feature_names.py` | 26 | 0 | 100% | |
| `find_ml_task.py` | 61 | 0 | 100% | |
| `types.py` | 28 | 1 | 96% | 30 |
| **skore/src/skore/_sklearn/_comparison** | | | | |
| `__init__.py` | 7 | 0 | 100% | |
| `inspection_accessor.py` | 27 | 1 | 96% | 124 |
| `metrics_accessor.py` | 166 | 2 | 98% | 179, 1138 |
| `report.py` | 111 | 3 | 97% | 488, 491, 497 |
| `utils.py` | 57 | 0 | 100% | |
| **skore/src/skore/_sklearn/_cross_validation** | | | | |
| `__init__.py` | 9 | 0 | 100% | |
| `data_accessor.py` | 40 | 0 | 100% | |
| `inspection_accessor.py` | 35 | 1 | 97% | 161 |
| `metrics_accessor.py` | 169 | 5 | 97% | 1054, 1115, 1118, 1144–1145 |
| `report.py` | 126 | 3 | 97% | 498, 501, 507 |
| **skore/src/skore/_sklearn/_estimator** | | | | |
| `__init__.py` | 9 | 0 | 100% | |
| `data_accessor.py` | 63 | 2 | 96% | 86, 190 |
| `inspection_accessor.py` | 69 | 2 | 97% | 314, 328 |
| `metrics_accessor.py` | 367 | 6 | 98% | 427, 431, 446, 476, 1661, 1744 |
| `report.py` | 158 | 2 | 98% | 463, 466 |
| **skore/src/skore/_sklearn/_plot** | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `base.py` | 57 | 2 | 96% | 59–60 |
| `utils.py` | 121 | 2 | 98% | 273–274 |
| **skore/src/skore/_sklearn/_plot/data** | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `table_report.py` | 175 | 1 | 99% | 657 |
| **skore/src/skore/_sklearn/_plot/inspection** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `coefficients.py` | 205 | 0 | 100% | |
| `impurity_decrease.py` | 61 | 1 | 98% | 186 |
| `permutation_importance.py` | 146 | 2 | 98% | 283–284 |
| `utils.py` | 9 | 0 | 100% | |
| **skore/src/skore/_sklearn/_plot/metrics** | | | | |
| `__init__.py` | 6 | 0 | 100% | |
| `confusion_matrix.py` | 156 | 0 | 100% | |
| `metrics_summary_display.py` | 8 | 0 | 100% | |
| `precision_recall_curve.py` | 108 | 0 | 100% | |
| `prediction_error.py` | 151 | 0 | 100% | |
| `roc_curve.py` | 112 | 0 | 100% | |
| **skore/src/skore/_sklearn/train_test_split** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `train_test_split.py` | 58 | 0 | 100% | |
| **skore/src/skore/_sklearn/train_test_split/warning** | | | | |
| `__init__.py` | 8 | 0 | 100% | |
| `high_class_imbalance_too_few_examples_warning.py` | 19 | 1 | 94% | 83 |
| `high_class_imbalance_warning.py` | 20 | 0 | 100% | |
| `random_state_unset_warning.py` | 10 | 0 | 100% | |
| `shuffle_true_warning.py` | 9 | 0 | 100% | |
| `stratify_is_set_warning.py` | 10 | 0 | 100% | |
| `time_based_column_warning.py` | 21 | 0 | 100% | |
| `train_test_split_warning.py` | 3 | 0 | 100% | |
| **skore/src/skore/_utils** | | | | |
| `__init__.py` | 6 | 2 | 66% | 8, 13 |
| `_accessor.py` | 112 | 3 | 97% | 38, 214, 268 |
| `_cache.py` | 37 | 0 | 100% | |
| `_environment.py` | 32 | 2 | 93% | 41, 44 |
| `_fixes.py` | 8 | 0 | 100% | |
| `_index.py` | 5 | 0 | 100% | |
| `_logger.py` | 22 | 4 | 81% | 15–17, 19 |
| `_measure_time.py` | 10 | 0 | 100% | |
| `_parallel.py` | 38 | 3 | 92% | 23, 33, 124 |
| `_patch.py` | 21 | 12 | 42% | 30, 35–39, 42–43, 46–47, 58, 60 |
| `_progress_bar.py` | 41 | 4 | 90% | 55–56, 66–67 |
| `_show_versions.py` | 38 | 0 | 100% | |
| `_testing.py` | 103 | 11 | 89% | 20, 29, 157, 166, 177–182, 184 |
| **skore/src/skore/_utils/repr** | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `base.py` | 54 | 0 | 100% | |
| `data.py` | 163 | 0 | 100% | |
| `html_repr.py` | 38 | 0 | 100% | |
| `rich_repr.py` | 81 | 0 | 100% | |
| **skore/src/skore/project** | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `_summary.py` | 75 | 1 | 98% | 119 |
| `_widget.py` | 187 | 0 | 100% | |
| `project.py` | 55 | 0 | 100% | |
| **TOTAL** | 4292 | 91 | 97% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1657 | 5 💤 | 0 ❌ | 0 🔥 | 6m 36s ⏱️ |

```python
row, col, hue = (
    subplot_cols[0],
    subplot_cols[1],
    (next(iter(remaining)) if remaining else None),
)
```

Collaborator

nit

Suggested change:

```diff
-    (next(iter(remaining)) if remaining else None),
+    next(iter(remaining), None),
```
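For context on the suggestion above: `next()` accepts an optional second argument that is returned when the iterator is exhausted instead of raising `StopIteration`, so the conditional expression is redundant. A minimal sketch (the helper name `first_or_none` is illustrative, not skore's code):

```python
def first_or_none(items):
    """Return an arbitrary element of ``items``, or None when it is empty."""
    # next(iterator, default) returns the default instead of raising
    # StopIteration when the iterator yields nothing.
    return next(iter(items), None)

print(first_or_none({"label"}))  # label
print(first_or_none(set()))      # None
```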

Comment on lines +271 to +281

```python
case 1:
    col, row, hue = (
        subplot_cols[0],
        None,
        (next(iter(remaining)) if remaining else None),
    )
    col, row = subplot_by, None
    if remaining_column := set(columns_to_groupby) - {subplot_by}:
        hue = next(iter(remaining_column))
    else:
        hue = None
else:
    if not all(item in columns_to_groupby for item in subplot_by):
case 2:
    row, col, hue = (
        subplot_cols[0],
        subplot_cols[1],
        (next(iter(remaining)) if remaining else None),
    )
```

Collaborator

Both row and hue are the same.

Comment on lines +331 to +332

```python
f"Permutation importance {aggregate_title} \nof {estimator_name} "
f"on {data_source} set"
```

Collaborator

Suggested change:

```diff
-f"Permutation importance {aggregate_title} \nof {estimator_name} "
-f"on {data_source} set"
+f"Permutation importance {aggregate_title}\n"
+f"of {estimator_name} on {data_source} set"
```
@GaetandeCast (Collaborator) left a comment

Some first remarks from a usage perspective.

Also, the shaded background is very faint. I propose increasing the alpha from 0.1 to 0.4. Opened #2428 to do it separately.

Comment on lines +177 to +179

```python
metric : str
    Metric to plot.
```

Collaborator

Let's give it a None default and raise an error asking the user to choose a metric when multiple metrics were passed to the constructor. It's counterintuitive to have to choose a metric when I did not specify any in the constructor.
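The behavior requested above could look like the following sketch (the helper name `resolve_metric` and its signature are hypothetical, not skore's API): default `metric` to None, resolve it automatically when only one metric is available, and raise only when the choice is genuinely ambiguous.

```python
def resolve_metric(available_metrics, metric=None):
    """Hypothetical helper: pick the metric to plot.

    With a single available metric, None resolves to it; with several,
    an explicit choice is required.
    """
    if metric is None:
        if len(available_metrics) > 1:
            raise ValueError(
                "Several metrics were passed to the constructor "
                f"({sorted(available_metrics)}); pass `metric` to choose one."
            )
        return available_metrics[0]
    if metric not in available_metrics:
        raise ValueError(f"{metric!r} is not one of {sorted(available_metrics)}.")
    return metric

print(resolve_metric(["r2"]))  # r2
```

This keeps the single-metric case zero-configuration while still failing loudly in the ambiguous case.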

Comment on lines 370 to 372

```python
if frame["label"].isna().all():
    # regression problem or averaged classification metric
    columns_to_drop.append("label")
```

Collaborator

When using an averaged metric in multiclass classification, the "label" column is dropped here. Then, plotting with subplot_by="label" raises "The column(s) ['label'] are not available.", which was confusing to me.
Should we create a specific error that says something like "You are using a metric averaged over labels, subplot_by=\"label\" is not available" to make it less confusing?
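One way to implement that suggestion, as a hedged sketch (the helper name `check_subplot_by` and its signature are hypothetical, not the PR's code): detect the dropped "label" column before the generic availability check and raise the more specific message.

```python
def check_subplot_by(subplot_by, frame_columns, label_dropped):
    """Hypothetical validation helper for ``subplot_by``."""
    missing = [col for col in subplot_by if col not in frame_columns]
    if "label" in missing and label_dropped:
        # Specific, actionable message for the averaged-metric case.
        raise ValueError(
            "You are using a metric averaged over labels; "
            'subplot_by="label" is not available.'
        )
    if missing:
        # Generic message for any other unknown column.
        raise ValueError(f"The column(s) {missing} are not available.")

# Passes silently when the requested columns exist.
check_subplot_by(["feature"], ["feature", "importance"], label_dropped=True)
```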

@GaetandeCast (Collaborator)

Taking over as agreed with @glemaitre.


Labels: None yet

Projects: Labs — Status: In progress

3 participants