Description
The codebase uses fi_method variables in many places with different interpretations. Some call sites use the existing FIMethod class:
class FIMethod(StrEnum):
    """Feature importance computation methods."""

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"
    COUNTS = "counts"
    COUNTS_RELATIVE = "counts_relative"
Others use plain strings and literals. This is inconsistent and error-prone.
FI methods are used at different levels:
ModelConfig.feature_method: which FI method to use for the constrained HPO feature-used calculation
Values: "internal", "permutation", "shap", "constant"
Used in: training.py
Scope: Pure internal / model registration
Octo.fi_methods_bestbag: which FI to compute on the best bag for feature selection
Values: ["permutation", "shap", "constant"]
Used in: module.py:71, core.py:486; Rfe2 overrides this value
User-facing: Yes (Octo config parameter)
Mrmr.feature_importance_method: which prior FI to read from upstream task results
Values: ["permutation", "shap", "internal", "lofo"]
Used in: module.py:53, core.py:79
User-facing: Yes (MRMR config parameter)
Rfe2.fi_method_rfe: which FI to use for RFE elimination steps
Values: ["permutation", "shap"]
Used in: module.py:29, core.py:100
User-facing: Yes (RFE2 config parameter)
TaskPredictor.calculate_fi(fi_type=...) / TaskPredictorTest.calculate_fi(fi_type=...): user API for post-hoc FI
Values: "permutation", "group_permutation", "shap"
Used in: task_predictor.py:370, task_predictor_test.py:210
User-facing: Yes (public API)
Internal storage keys: training.feature_importances dict keys
Values: "internal", "constant", "permutation_dev", "permutation_test", "shap_dev", "shap_test", "lofo_dev", "lofo_test"
These are composite keys "{method}_{partition}" parsed by _parse_fi_key() in bag.py:29
Scope: Pure internal, never user-visible
Plus:
shap_type parameter: "kernel", "permutation", "exact"; used in both training.py:760 and feature_importance.py:242
FIMethod enum (already exists): used to standardize the DataFrame fi_method column in result parquets
Problem
- ModelConfig.feature_method answers: "How does this model natively compute which features it used?" This is a model capability property.
- fi_methods_bestbag / fi_method_rfe / feature_importance_method answer: "Which FI calculation should we run on a Bag?" This is a computation request.
- TaskPredictor.calculate_fi(fi_type=...) answers: "What kind of post-hoc explanation do you want?" This includes "group_permutation", which doesn't exist in any other context.
- shap_type answers: "Which SHAP explainer implementation?" This is orthogonal to all of the above.
- Storage keys and FIMethod DataFrame labels are a serialization/standardization concern.

A single enum would force unrelated concepts together. For example, "group_permutation" makes no sense in ModelConfig.feature_method, and "constant" makes no sense in TaskPredictor.calculate_fi().
Proposed solution
Domain-specific enums:
from enum import StrEnum


class FIResultLabel(StrEnum):
    """Standardized labels for the ``fi_method`` column in feature importance DataFrames.

    These values are written to the ``fi_method`` column of
    ``feature_importances.parquet`` files when modules save results to disk.
    They serve a serialization/output concern: identifying *how* a stored
    importance value was produced.

    This is a superset of :class:`FIComputeMethod` plus EFS-specific labels
    (``COUNTS``, ``COUNTS_RELATIVE``) that only appear in EFS result output.
    """

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"
    COUNTS = "counts"
    COUNTS_RELATIVE = "counts_relative"

class FIComputeMethod(StrEnum):
    """Feature importance computation methods.

    Specifies *which* algorithm to use when calculating feature importances.
    Used in three contexts:

    - **Model registration** (``ModelConfig.feature_method``): declares which
      FI method a model natively supports for constrained HPO.
    - **Module configuration** (``Octo.fi_methods_bestbag``,
      ``Rfe2.fi_method_rfe``, ``Mrmr.feature_importance_method``): user-facing
      parameters that control which FI computations to run.
    - **Bag orchestration** (``Bag.calculate_feature_importances()``,
      ``Bag.get_selected_features()``): internal dispatch to the appropriate
      FI calculation on training objects.

    Each usage site restricts the accepted subset via attrs validators, e.g.
    ``Rfe2.fi_method_rfe`` only accepts ``PERMUTATION`` and ``SHAP``.
    """

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"

class FIType(StrEnum):
    """Feature importance analysis types for the public prediction API.

    Used as the ``fi_type`` parameter in :meth:`TaskPredictor.calculate_fi`
    and :meth:`TaskPredictorTest.calculate_fi` to select the kind of
    post-hoc feature importance analysis to perform on new or held-out data.

    Unlike :class:`FIComputeMethod` (which operates within the training
    pipeline on inner-CV folds), these run *after* a study is complete,
    using the saved outer-split models.
    """

    PERMUTATION = "permutation"
    GROUP_PERMUTATION = "group_permutation"
    SHAP = "shap"

class ShapExplainerType(StrEnum):
    """SHAP explainer implementations.

    Selects which ``shap`` library explainer to use when computing
    SHAP-based feature importances. Used as the ``shap_type`` parameter
    in both the training pipeline (``Training.calculate_fi_shap``) and
    the prediction API (``TaskPredictor.calculate_fi(fi_type='shap', shap_type=...)``).

    - ``KERNEL``: model-agnostic :class:`shap.KernelExplainer`. Works with
      any model but is slow.
    - ``PERMUTATION``: model-agnostic :class:`shap.PermutationExplainer`.
      Faster than kernel for many models.
    - ``EXACT``: :class:`shap.ExactExplainer`. Computes exact SHAP values;
      most accurate, but its cost grows quickly with the number of features.

    Note: ``PERMUTATION`` here refers to the SHAP permutation explainer
    algorithm, not to permutation feature importance (:class:`FIComputeMethod`).
    """

    KERNEL = "kernel"
    PERMUTATION = "permutation"
    EXACT = "exact"

class FIDataset(StrEnum):
    """Dataset partitions for feature importance computation."""

    TRAIN = "train"
    DEV = "dev"
    TEST = "test"