Replace Feature Importance Method String Literals with Domain-Specific Enums #350

@kalama-ai

Description

The codebase uses fi_method variables in many places with different interpretations. Some use the existing FIMethod class:

```python
from enum import StrEnum


class FIMethod(StrEnum):
    """Feature importance computation methods."""

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"
    COUNTS = "counts"
    COUNTS_RELATIVE = "counts_relative"
```

Others use raw string literals. This is inconsistent and error-prone.

FI methods are used at several different levels:

  1. ModelConfig.feature_method — Which FI method to use for the constrained-HPO features-used calculation

Values: "internal", "permutation", "shap", "constant"
Used in: training.py
Scope: Pure internal / model registration

  2. Octo.fi_methods_bestbag — Which FI to compute on the best bag for feature selection

Values: ["permutation", "shap", "constant"]
Used in: module.py:71, core.py:486, and Rfe2 overrides this
User-facing: Yes (Octo config parameter)

  3. Mrmr.feature_importance_method — Which prior FI to read from upstream task results

Values: ["permutation", "shap", "internal", "lofo"]
Used in: module.py:53, core.py:79
User-facing: Yes (MRMR config parameter)

  4. Rfe2.fi_method_rfe — Which FI to use for RFE elimination steps

Values: ["permutation", "shap"]
Used in: module.py:29, core.py:100
User-facing: Yes (RFE2 config parameter)

  5. TaskPredictor.calculate_fi(fi_type=...) / TaskPredictorTest.calculate_fi(fi_type=...) — User API for post-hoc FI

Values: "permutation", "group_permutation", "shap"
Used in: task_predictor.py:370, task_predictor_test.py:210
User-facing: Yes (public API)

  6. Internal storage keys — training.feature_importances dict keys

Values: "internal", "constant", "permutation_dev", "permutation_test", "shap_dev", "shap_test", "lofo_dev", "lofo_test"
These are composite keys "{method}_{partition}" parsed by _parse_fi_key() in bag.py:29
Scope: Pure internal, never user-visible

Plus:

  • shap_type parameter: "kernel", "permutation", "exact" — in both training.py:760 and feature_importance.py:242
  • FIMethod enum (already exists): Used for DataFrame fi_method column standardization in result parquets
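For context on item 6, the composite-key parsing could look roughly like the sketch below. This is an assumed reconstruction from the description above, not the actual `_parse_fi_key` in bag.py; in particular, treating suffix-less keys like "internal" and "constant" as having no partition is an assumption.

```python
from __future__ import annotations

# Partitions that can appear as a "_{partition}" suffix on storage keys.
_PARTITIONS = ("dev", "test")


def parse_fi_key(key: str) -> tuple[str, str | None]:
    """Split a storage key like 'permutation_dev' into (method, partition).

    Keys without a partition suffix (e.g. 'internal', 'constant') are
    assumed to map to (method, None).
    """
    for partition in _PARTITIONS:
        suffix = f"_{partition}"
        if key.endswith(suffix):
            return key[: -len(suffix)], partition
    return key, None


print(parse_fi_key("permutation_dev"))  # → ('permutation', 'dev')
print(parse_fi_key("internal"))         # → ('internal', None)
```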

Problem

  • ModelConfig.feature_method answers: "How does this model natively compute which features it used?" — This is a model capability property.
  • fi_methods_bestbag / fi_method_rfe / feature_importance_method answer: "Which FI calculation should we run on a Bag?" — This is a computation request.
  • TaskPredictor.calculate_fi(fi_type=...) answers: "What kind of post-hoc explanation do you want?" — This includes "group_permutation" which doesn't exist in any other context.
  • shap_type answers: "Which SHAP explainer implementation?" — This is orthogonal to all the above.
  • Storage keys and FIMethod DataFrame labels are a serialization/standardization concern.
  • A single enum would force unrelated concepts together. For example, "group_permutation" makes no sense in ModelConfig.feature_method, and "constant" makes no sense in TaskPredictor.calculate_fi().

Proposed solution

Domain-specific enums:

```python
from enum import StrEnum


class FIResultLabel(StrEnum):
    """Standardized labels for the ``fi_method`` column in feature importance DataFrames.

    These values are written to the ``fi_method`` column of
    ``feature_importances.parquet`` files when modules save results to disk.
    They serve as a serialization/output concern — identifying *how* a stored
    importance value was produced.

    This is a superset of :class:`FIComputeMethod` plus EFS-specific labels
    (``COUNTS``, ``COUNTS_RELATIVE``) that only appear in EFS result output.
    """

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"
    COUNTS = "counts"
    COUNTS_RELATIVE = "counts_relative"


class FIComputeMethod(StrEnum):
    """Feature importance computation methods.

    Specifies *which* algorithm to use when calculating feature importances.
    Used in three contexts:

    - **Model registration** — ``ModelConfig.feature_method``: declares which
      FI method a model natively supports for constrained HPO.
    - **Module configuration** — ``Octo.fi_methods_bestbag``,
      ``Rfe2.fi_method_rfe``, ``Mrmr.feature_importance_method``: user-facing
      parameters that control which FI computations to run.
    - **Bag orchestration** — ``Bag.calculate_feature_importances()``,
      ``Bag.get_selected_features()``: internal dispatch to the appropriate
      FI calculation on training objects.

    Each usage site restricts the accepted subset via attrs validators, e.g.
    ``Rfe2.fi_method_rfe`` only accepts ``PERMUTATION`` and ``SHAP``.
    """

    INTERNAL = "internal"
    PERMUTATION = "permutation"
    SHAP = "shap"
    LOFO = "lofo"
    CONSTANT = "constant"


class FIType(StrEnum):
    """Feature importance analysis types for the public prediction API.

    Used as the ``fi_type`` parameter in :meth:`TaskPredictor.calculate_fi`
    and :meth:`TaskPredictorTest.calculate_fi` to select the kind of
    post-hoc feature importance analysis to perform on new or held-out data.

    Unlike :class:`FIComputeMethod` (which operates within the training
    pipeline on inner-CV folds), these run *after* a study is complete,
    using the saved outer-split models.
    """

    PERMUTATION = "permutation"
    GROUP_PERMUTATION = "group_permutation"
    SHAP = "shap"


class ShapExplainerType(StrEnum):
    """SHAP explainer implementations.

    Selects which ``shap`` library explainer to use when computing
    SHAP-based feature importances. Used as the ``shap_type`` parameter
    in both the training pipeline (``Training.calculate_fi_shap``) and
    the prediction API (``TaskPredictor.calculate_fi(fi_type='shap', shap_type=...)``).

    - ``KERNEL`` — Model-agnostic :class:`shap.KernelExplainer`. Works with
      any model but is the slowest.
    - ``PERMUTATION`` — Model-agnostic :class:`shap.PermutationExplainer`.
      Faster than kernel for many models.
    - ``EXACT`` — :class:`shap.ExactExplainer`. Computes exact SHAP values;
      slowest but most accurate.

    Note: ``PERMUTATION`` here refers to the SHAP permutation explainer
    algorithm, not to permutation feature importance (:class:`FIComputeMethod`).
    """

    KERNEL = "kernel"
    PERMUTATION = "permutation"
    EXACT = "exact"


class FIDataset(StrEnum):
    """Dataset partitions for feature importance computation."""

    TRAIN = "train"
    DEV = "dev"
    TEST = "test"
```