Skip to content

Refactor Feature Importances, fix shap multiclass, fix T2E FIs#362

Open
anwurl wants to merge 12 commits intomainfrom
refactor/fi
Open

Refactor Feature Importances, fix shap multiclass, fix T2E FIs#362
anwurl wants to merge 12 commits intomainfrom
refactor/fi

Conversation

@anwurl
Copy link
Collaborator

@anwurl anwurl commented Mar 12, 2026

  • main focus: centralize and remove duplications in the feature importance code
  • a refactoring and simplification of octo will be done later
  • make shap work for multi-class
  • make FI work for T2E
  • update catboost, shap

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors feature-importance implementations by centralizing shared computation logic into octopus.predict.feature_importance “Layer 1” primitives, then reusing those primitives from both training and predict code paths to reduce duplication.

Changes:

  • Added shared FI primitives (compute_per_repeat_stats, compute_internal_fi, compute_permutation_single, compute_shap_single) and refactored predict FI orchestrators to use them.
  • Refactored Training permutation/internal/SHAP FI methods to delegate into the shared primitives; removed the legacy calculate_fi_group_permutation entry points.
  • Updated bag + tests to call the unified permutation FI implementation and added a no-groups test variant.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
tests/modules/test_training_feature_importances.py Updates FI method coverage to use unified permutation method and adds a “no groups” permutation variant.
octopus/predict/feature_importance.py Introduces shared FI primitives and refactors predict FI orchestrators to use them.
octopus/modules/octo/training.py Removes duplicated FI logic and delegates internal/permutation/SHAP FI to shared primitives.
octopus/modules/octo/bag.py Switches from the removed group-permutation method to the unified permutation method.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@anwurl anwurl changed the title Refactor feature importances Refactor Feature Importances and fix shap multiclass Mar 13, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +464 to 466
# Build target_assignments from target_col
target_assignments = {target_col: target_col}

Comment on lines +155 to +160
# KNOWN ISSUE: CatBoost multiclass is disabled because shap.Explainer(model, bg)
# segfaults in SHAP <=0.51 when using TreeExplainer's interventional mode with
# CatBoost multiclass models. Re-enable once SHAP fixes this upstream.
# See datasets_local/specifications_refactorfi/03_shap_catboost_segfault_proposal.md
# Original: ml_types=[MLType.BINARY, MLType.MULTICLASS],
ml_types=[MLType.BINARY],
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +282 to 286
training = _create_training_instance(
data_train, data_dev, data_test, ml_type, model_name, feature_cols, feature_groups
)
training.fit()

Comment on lines +155 to +160
# KNOWN ISSUE: CatBoost multiclass is disabled because shap.Explainer(model, bg)
# segfaults in SHAP <=0.51 when using TreeExplainer's interventional mode with
# CatBoost multiclass models. Re-enable once SHAP fixes this upstream.
# See datasets_local/specifications_refactorfi/03_shap_catboost_segfault_proposal.md
# Original: ml_types=[MLType.BINARY, MLType.MULTICLASS],
ml_types=[MLType.BINARY],
@anwurl anwurl changed the title Refactor Feature Importances and fix shap multiclass Refactor Feature Importances, fix shap multiclass, fix T2E FIs Mar 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants