Replace Feature Importance Method String Literals with Domain-Specific Enums #350

@kalama-ai

Description

The codebase uses fi_method variables in many places with different interpretations. Some use the existing FIMethod class:

```python
from enum import StrEnum


class FIMethod(StrEnum):
    """Feature importance computation methods."""

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"
    COUNTS = "counts"
    COUNTS_RELATIVE = "counts_relative"
```

Others use raw string literals. This is inconsistent and error-prone.

FI methods are used at several different levels:

  1. ModelConfig.feature_method — Which FI method to use for the constrained-HPO features-used calculation

Values: "internal", "permutation", "shap", "constant"
Used in: training.py
Scope: Pure internal / model registration

  2. Octo.fi_methods_bestbag — Which FI to compute on the best bag for feature selection

Values: ["permutation", "shap", "constant"]
Used in: module.py:71, core.py:486, and Rfe2 overrides this
User-facing: Yes (Octo config parameter)

  3. Mrmr.feature_importance_method — Which prior FI to read from upstream task results

Values: ["permutation", "shap", "internal", "lofo"]
Used in: module.py:53, core.py:79
User-facing: Yes (MRMR config parameter)

  4. Rfe2.fi_method_rfe — Which FI to use for RFE elimination steps

Values: ["permutation", "shap"]
Used in: module.py:29, core.py:100
User-facing: Yes (RFE2 config parameter)

  5. TaskPredictor.calculate_fi(fi_type=...) / TaskPredictorTest.calculate_fi(fi_type=...) — User API for post-hoc FI

Values: "permutation", "group_permutation", "shap"
Used in: task_predictor.py:370, task_predictor_test.py:210
User-facing: Yes (public API)

  6. Internal storage keys — training.feature_importances dict keys

Values: "internal", "constant", "permutation_dev", "permutation_test", "shap_dev", "shap_test", "lofo_dev", "lofo_test"
These are composite keys "{method}_{partition}" parsed by _parse_fi_key() in bag.py:29
Scope: Pure internal, never user-visible

Plus:

  • shap_type parameter: "kernel", "permutation", "exact" — in both training.py:760 and feature_importance.py:242
  • FIMethod enum (already exists): Used for DataFrame fi_method column standardization in result parquets
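For context on item 6, the composite-key parsing could look roughly like the sketch below. This is an assumed reconstruction from the description above, not the actual `_parse_fi_key` in bag.py; in particular, treating suffix-less keys like "internal" and "constant" as having no partition is an assumption.

```python
from __future__ import annotations

# Partitions that can appear as a "_{partition}" suffix on storage keys.
_PARTITIONS = ("dev", "test")


def parse_fi_key(key: str) -> tuple[str, str | None]:
    """Split a storage key like 'permutation_dev' into (method, partition).

    Keys without a partition suffix (e.g. 'internal', 'constant') are
    assumed to map to (method, None).
    """
    for partition in _PARTITIONS:
        suffix = f"_{partition}"
        if key.endswith(suffix):
            return key[: -len(suffix)], partition
    return key, None


print(parse_fi_key("permutation_dev"))  # → ('permutation', 'dev')
print(parse_fi_key("internal"))         # → ('internal', None)
```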

Problem

  • ModelConfig.feature_method answers: "How does this model natively compute which features it used?" — This is a model capability property.
  • fi_methods_bestbag / fi_method_rfe / feature_importance_method answer: "Which FI calculation should we run on a Bag?" — This is a computation request.
  • TaskPredictor.calculate_fi(fi_type=...) answers: "What kind of post-hoc explanation do you want?" — This includes "group_permutation" which doesn't exist in any other context.
  • shap_type answers: "Which SHAP explainer implementation?" — This is orthogonal to all the above.
  • Storage keys and FIMethod DataFrame labels are a serialization/standardization concern.
  • A single enum would force unrelated concepts together. For example, "group_permutation" makes no sense in ModelConfig.feature_method, and "constant" makes no sense in TaskPredictor.calculate_fi().

Proposed solution

Domain-specific enums:

```python
from enum import StrEnum


class FIResultLabel(StrEnum):
    """Standardized labels for the ``fi_method`` column in feature importance DataFrames.

    These values are written to the ``fi_method`` column of
    ``feature_importances.parquet`` files when modules save results to disk.
    They serve as a serialization/output concern — identifying *how* a stored
    importance value was produced.

    This is a superset of :class:`FIComputeMethod` plus EFS-specific labels
    (``COUNTS``, ``COUNTS_RELATIVE``) that only appear in EFS result output.
    """

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"
    COUNTS = "counts"
    COUNTS_RELATIVE = "counts_relative"


class FIComputeMethod(StrEnum):
    """Feature importance computation methods.

    Specifies *which* algorithm to use when calculating feature importances.
    Used in three contexts:

    - **Model registration** — ``ModelConfig.feature_method``: declares which
      FI method a model natively supports for constrained HPO.
    - **Module configuration** — ``Octo.fi_methods_bestbag``,
      ``Rfe2.fi_method_rfe``, ``Mrmr.feature_importance_method``: user-facing
      parameters that control which FI computations to run.
    - **Bag orchestration** — ``Bag.calculate_feature_importances()``,
      ``Bag.get_selected_features()``: internal dispatch to the appropriate
      FI calculation on training objects.

    Each usage site restricts the accepted subset via attrs validators, e.g.
    ``Rfe2.fi_method_rfe`` only accepts ``PERMUTATION`` and ``SHAP``.
    """

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"


class FIType(StrEnum):
    """Feature importance analysis types for the public prediction API.

    Used as the ``fi_type`` parameter in :meth:`TaskPredictor.calculate_fi`
    and :meth:`TaskPredictorTest.calculate_fi` to select the kind of
    post-hoc feature importance analysis to perform on new or held-out data.

    Unlike :class:`FIComputeMethod` (which operates within the training
    pipeline on inner-CV folds), these run *after* a study is complete,
    using the saved outer-split models.
    """

    PERMUTATION = "permutation"
    GROUP_PERMUTATION = "group_permutation"
    SHAP = "shap"


class ShapExplainerType(StrEnum):
    """SHAP explainer implementations.

    Selects which ``shap`` library explainer to use when computing
    SHAP-based feature importances. Used as the ``shap_type`` parameter
    in both the training pipeline (``Training.calculate_fi_shap``) and
    the prediction API (``TaskPredictor.calculate_fi(fi_type='shap', shap_type=...)``).

    - ``KERNEL`` — Model-agnostic :class:`shap.KernelExplainer`. Works with
      any model but is the slowest.
    - ``PERMUTATION`` — Model-agnostic :class:`shap.PermutationExplainer`.
      Faster than kernel for many models.
    - ``EXACT`` — :class:`shap.ExactExplainer`. Computes exact SHAP values;
      slowest but most accurate.

    Note: ``PERMUTATION`` here refers to the SHAP permutation explainer
    algorithm, not to permutation feature importance (:class:`FIComputeMethod`).
    """

    KERNEL = "kernel"
    PERMUTATION = "permutation"
    EXACT = "exact"


class FIDataset(StrEnum):
    """Dataset partitions for feature importance computation."""

    TRAIN = "train"
    DEV = "dev"
    TEST = "test"
```