
Commit f8b5446

antoinebaker, jpaillard, lionelkusch, AngelReyero, and bthirion authored
[DOC] User guide section 3. model-agnostic methods: CFI (#402)
* first commit
* add section structure
* [doc quick] [skip tests] skip
* missing .rst? [doc quick] [skip tests]
* fix link [doc quick] [skip tests]
* add CFI fiirst draft and TSI [doc quick] [skip tests]
* missing space? [doc quick] [skip tests]
* replace space
* [doc quick] [skip tests]
* add ref and note section [doc quick] [skip tests]
* add code snippets
* typo cfi [doc quick] [skip tests]
* add total sobol index ref [doc quick] [skip tests]
* add copy button [doc quick] [skip tests]
* missing sphinx requirements [quick doc] [skip tests]
* add copybutton config
* [doc quick] [skip tests]
* solve example test
* clarify "sub-model" for classif and regression
* trry add figure + update note
* add intro
* try fix image path
* trigger CI
* try not to scale
* definition
* try image
* [skip tests] trigger CI
* [tests skip] another one
* trigger CI
* back to figure
* add inference section
* add reff
* skip tests
* [skip tests]
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: lionel kusch <[email protected]>
* [skip tests] format bullet
* rephrase not
* [skip tests]
* [skip tests]
* [skip tests] linkcheck generated ignore images
* [skip tests] linkcheck generated ignore
* review
* trigger CI
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>
* genetic example
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>
* Update docs/src/model_agnostic_methods/total_sobol_index.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: bthirion <[email protected]>
* Update docs/src/model_agnostic_methods/conditional_feature_importance.rst
  Co-authored-by: bthirion <[email protected]>
* Update docs/src/model_agnostic_methods/total_sobol_index.rst
  Co-authored-by: Ángel Reyero Lobo <[email protected]>

---------

Co-authored-by: jpaillard <[email protected]>
Co-authored-by: lionel kusch <[email protected]>
Co-authored-by: Ángel Reyero Lobo <[email protected]>
Co-authored-by: bthirion <[email protected]>
1 parent 5292877 commit f8b5446

15 files changed: +280 -7 lines changed

docs/src/glm_methods.rst

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
.. _glm_methods:


======================
GLM methods
======================

.. toctree::
   :maxdepth: 2

   glm_methods/desparsified_lasso.rst
   glm_methods/knockoffs.rst

docs/src/glm_methods/desparsified_lasso.rst

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
.. _desparsified_lasso:


======================
Desparsified Lasso
======================

docs/src/glm_methods/knockoffs.rst

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
.. _knockoffs:


======================
Knockoffs
======================

docs/src/marginal_methods.rst

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
.. _marginal_methods:


======================
Marginal methods
======================

.. toctree::
   :maxdepth: 2

   marginal_methods/leave_one_covariate_in.rst

docs/src/marginal_methods/leave_one_covariate_in.rst

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
.. _leave_one_covariate_in:

======================
Leave-One-Covariate-In
======================

docs/src/methods_list.rst

Lines changed: 0 additions & 6 deletions
This file was deleted.

docs/src/model_agnostic_methods.rst

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
.. _model_agnostic_methods:


======================
Model-agnostic methods
======================

.. toctree::
   :maxdepth: 2

   model_agnostic_methods/total_sobol_index
   model_agnostic_methods/leave_one_covariate_out
   model_agnostic_methods/conditional_feature_importance
   model_agnostic_methods/permutation_feature_importance

docs/src/model_agnostic_methods/conditional_feature_importance.rst

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
.. _conditional_feature_importance:


==============================
Conditional Feature Importance
==============================

Conditional Feature Importance (CFI) is a model-agnostic approach for quantifying the
relevance of individual features or groups of features in predictive models. It is a
perturbation-based method that compares the predictive performance of a model on
unmodified test data (following the same distribution as the training data)
to its performance when the studied feature is conditionally perturbed. Because only
the evaluation data is perturbed, this approach does not require retraining the model.

.. figure:: ../generated/gallery/examples/images/sphx_glr_plot_cfi_001.png
   :target: ../generated/gallery/examples/plot_cfi.html
   :align: center

Theoretical index
------------------

CFI estimates feature importance through conditional perturbations. Specifically, it
constructs a perturbed version of the feature, :math:`X_j^P`, sampled from the conditional
distribution :math:`P(X_j | X_{-j})` independently of the output, such that its association
with the output is removed: :math:`X_j^P \perp\!\!\!\perp Y \mid X_{-j}`. The predictive
model is then evaluated on the modified feature vector
:math:`\tilde X = [X_1, ..., X_j^P, ..., X_p]`, and the importance of the feature is
quantified by the resulting drop in model performance:

.. math::
    \psi_j^{CFI} = \mathbb{E} [\mathcal{L}(y, \mu(\tilde X))] - \mathbb{E} [\mathcal{L}(y, \mu(X))].


The target quantity estimated by CFI is the Total Sobol Index (TSI, see :ref:`total_sobol_index`).
Indeed,

.. math::
    \frac{1}{2} \psi_j^{CFI}
    = \psi_j^{TSI}
    = \mathbb{E} [\mathcal{L}(y, \mu_{-j}(X_{-j}))] - \mathbb{E} [\mathcal{L}(y, \mu(X))],

where, in regression, :math:`\mu_{-j}(X_{-j}) = \mathbb{E}[Y \mid X_{-j}]` is the
theoretical model that does not use the :math:`j^{th}` feature.
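
As a sanity check on this relation, the following sketch (plain NumPy, independent of the
hidimstat implementation; the toy model, the correlation value, and all variable names are
illustrative assumptions) estimates both sides by Monte Carlo on a bivariate Gaussian
example where :math:`\mu`, :math:`\mu_{-j}`, and the conditional distribution
:math:`P(X_1 | X_2)` are known in closed form::

    import numpy as np

    rng = np.random.default_rng(0)
    rho, sigma_eps, n = 0.7, 0.5, 1_000_000

    # Toy model: (X_1, X_2) standard normal with correlation rho and Y = X_1 + X_2 + eps,
    # so that mu(X) = X_1 + X_2 and mu_{-1}(X_2) = (1 + rho) * X_2.
    x2 = rng.standard_normal(n)
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    y = x1 + x2 + sigma_eps * rng.standard_normal(n)

    # Perturbed feature: a fresh draw from P(X_1 | X_2), independent of the output.
    x1_perturbed = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    loss_reference = np.mean((y - (x1 + x2)) ** 2)  # E[L(y, mu(X))]
    psi_cfi = np.mean((y - (x1_perturbed + x2)) ** 2) - loss_reference
    psi_tsi = np.mean((y - (1 + rho) * x2) ** 2) - loss_reference

    print(psi_cfi, 2 * psi_tsi)  # both are close to 2 * (1 - rho**2)
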
Estimation procedure
--------------------

The estimation of CFI relies on the ability to sample the perturbed feature matrix
:math:`\tilde X`, and specifically to sample :math:`X_j^P` from the conditional
distribution, :math:`X_j^P \overset{\text{i.i.d.}}{\sim} P(X_j | X_{-j})`, while breaking
its association with the output :math:`Y`. Any conditional sampler can be used. A valid
and efficient approach is conditional permutation (:footcite:t:`Chamma_NeurIPS2023`).
This procedure decomposes the :math:`j^{th}` feature into a part that
is predictable from the other features and a residual term that is
independent of the other features:

.. math::
    X_j = \nu_j(X_{-j}) + \epsilon_j, \quad \text{with} \quad \epsilon_j \perp\!\!\!\perp X_{-j} \text{ and } \mathbb{E}[\epsilon_j] = 0.

Here :math:`\nu_j(X_{-j}) = \mathbb{E}[X_j | X_{-j}]` is the conditional expectation of
:math:`X_j` given the other features. In practice, :math:`\nu_j` is unknown and has to be
estimated from the data using a predictive model.

The perturbed feature :math:`X_j^P` is then generated by keeping the predictable part
:math:`\nu_j(X_{-j})` unchanged and by replacing the residual :math:`\epsilon_j` with a
randomly permuted version :math:`\epsilon_j^P`:

.. math::
    X_j^P = \nu_j(X_{-j}) + \epsilon_j^P, \quad \text{with} \quad \epsilon_j^P \sim \text{Perm}(\epsilon_j).

.. note:: **Estimation of** :math:`\nu_j`

    To generate the perturbed feature :math:`X_j^P`, a model for :math:`\nu_j` is required.
    Estimating :math:`\nu_j` amounts to modeling the relationships between features and is
    arguably an easier task than estimating the relationship between the features and the
    target. This "model-X" assumption is argued for, for instance, in :footcite:t:`Chamma_NeurIPS2023`
    and :footcite:t:`candes2018panning`.
    For example, in genetics, features such as single nucleotide polymorphisms (SNPs)
    are the basis of complex biological processes that result in an outcome (phenotype),
    such as a disease. Predicting the phenotype from SNPs is challenging, whereas
    modeling the relationships between SNPs is often easier due to known correlation
    structures in the genome (linkage disequilibrium). As a result, simple predictive
    models such as regularized linear models or decision trees can be used to estimate
    :math:`\nu_j`.

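As an illustration, the following sketch (plain NumPy and scikit-learn, independent of the
hidimstat implementation; the data-generating process, the choice of ``Ridge`` for
:math:`\nu_j`, and all variable names are illustrative assumptions) spells out the
conditional permutation step for a single feature::

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    n, j = 1_000, 0  # sample size and index of the studied feature
    X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n)
    y = X @ np.array([1.0, 1.0]) + 0.5 * rng.standard_normal(n)

    model = LinearRegression().fit(X, y)  # predictive model mu, fitted once

    # 1) Estimate nu_j(X_{-j}) = E[X_j | X_{-j}] with a simple regression model.
    X_minus_j = np.delete(X, j, axis=1)
    nu_j = Ridge().fit(X_minus_j, X[:, j])

    # 2) Conditional permutation: keep the predictable part, permute the residuals.
    predictable_part = nu_j.predict(X_minus_j)
    residuals = X[:, j] - predictable_part
    X_perturbed = X.copy()
    X_perturbed[:, j] = predictable_part + rng.permutation(residuals)

    # 3) CFI estimate: loss on the perturbed data minus loss on the original data.
    psi_cfi = (mean_squared_error(y, model.predict(X_perturbed))
               - mean_squared_error(y, model.predict(X)))

In practice, as in the examples below, the losses are evaluated on held-out test data, and
the permutation can be repeated several times to reduce the variance of the estimate.
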
Inference
---------

Under standard assumptions, such as an additive model :math:`Y = \mu(X) + \epsilon`,
Conditional Feature Importance allows for conditional independence testing, which
determines whether a feature provides any unique information to the model's predictions
that is not already captured by the other features. Essentially, we are testing whether
the output is independent of the studied feature given the rest of the input:

.. math::
    \mathcal{H}_0: Y \perp\!\!\!\perp X_j | X_{-j}.


The core of this inference is to test the statistical significance of the loss
differences estimated by CFI. Consequently, a one-sample test on the loss differences
(or a paired test on the losses) needs to be performed.

Two technical challenges arise in this context:

* When cross-validation (for instance, k-fold) is used to estimate CFI, the loss
  differences obtained from different folds are not independent. Consequently,
  performing a simple t-test on the loss differences is not valid. This issue can be
  addressed by a corrected t-test accounting for this dependence, such as the one
  proposed in :footcite:t:`nadeau1999inference` (see the sketch after this list).
* Vanishing variance: under the null hypothesis, not only does the loss difference
  converge to zero, but its variance also vanishes, because the loss difference is a
  quadratic functional (:footcite:t:`verdinelli2024feature`). This makes the
  standard one-sample t-test invalid. This second issue can be handled by correcting
  the variance estimate or by using other nonparametric tests.

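As an illustration of the first point, the following sketch (plain NumPy and SciPy,
independent of the hidimstat API; the function name, the fold values, and the train/test
sizes are illustrative assumptions) applies the corrected resampled t-test of
:footcite:t:`nadeau1999inference` to per-fold CFI loss differences::

    import numpy as np
    from scipy import stats

    def corrected_ttest(fold_differences, n_train, n_test):
        """One-sided corrected t-test for K-fold loss differences (H0: mean <= 0)."""
        d = np.asarray(fold_differences, dtype=float)
        k = d.shape[0]
        mean, var = d.mean(), d.var(ddof=1)
        # Nadeau-Bengio correction: inflate the variance term to account for the
        # dependence between folds induced by overlapping training sets.
        t_stat = mean / np.sqrt((1.0 / k + n_test / n_train) * var)
        p_value = stats.t.sf(t_stat, df=k - 1)
        return t_stat, p_value

    # Loss differences of one feature over 10 folds, with a 900/100 train/test split.
    fold_differences = [0.12, 0.08, 0.15, 0.05, 0.11, 0.09, 0.14, 0.07, 0.10, 0.13]
    t_stat, p_value = corrected_ttest(fold_differences, n_train=900, n_test=100)
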
Regression example
------------------

The following example illustrates the use of CFI on a regression task::

    >>> from sklearn.datasets import make_regression
    >>> from sklearn.linear_model import LinearRegression
    >>> from sklearn.model_selection import train_test_split
    >>> from hidimstat import CFI

    >>> X, y = make_regression(n_features=2)
    >>> X_train, X_test, y_train, y_test = train_test_split(X, y)
    >>> model = LinearRegression().fit(X_train, y_train)

    >>> cfi = CFI(estimator=model, imputation_model_continuous=LinearRegression())
    >>> cfi = cfi.fit(X_train, y_train)
    >>> # importance is computed on held-out test data
    >>> features_importance = cfi.importance(X_test, y_test)

Classification example
----------------------

To measure feature importance in a classification task, a classification loss should be
used; in addition, the prediction method of the estimator should output the corresponding
type of prediction (probabilities or classes). The following example illustrates the use
of CFI on a classification task::

    >>> from sklearn.datasets import make_classification
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> from sklearn.linear_model import LinearRegression
    >>> from sklearn.metrics import log_loss
    >>> from sklearn.model_selection import train_test_split
    >>> from hidimstat import CFI

    >>> X, y = make_classification(n_features=4)
    >>> X_train, X_test, y_train, y_test = train_test_split(X, y)
    >>> model = RandomForestClassifier().fit(X_train, y_train)
    >>> # probabilistic predictions are scored with the matching log loss
    >>> cfi = CFI(
    ...     estimator=model,
    ...     imputation_model_continuous=LinearRegression(),
    ...     loss=log_loss,
    ...     method="predict_proba",
    ... )
    >>> cfi = cfi.fit(X_train, y_train)
    >>> features_importance = cfi.importance(X_test, y_test)

References
----------

.. footbibliography::

docs/src/model_agnostic_methods/leave_one_covariate_out.rst

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
.. _leave_one_covariate_out:


========================
Leave-One-Covariate-Out
========================

TODO: Write this section.

docs/src/model_agnostic_methods/permutation_feature_importance.rst

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
.. _permutation_feature_importance:


==============================
Permutation Feature Importance
==============================
