6 changes: 0 additions & 6 deletions docs/src/methods_list.rst

This file was deleted.

11 changes: 11 additions & 0 deletions docs/src/model_agnostic_methods.rst
@@ -0,0 +1,11 @@
.. _model_agnostic_methods:


======================
Model-agnostic methods
======================

.. toctree::
   :maxdepth: 2

   model_agnostic_methods/pdp_importance
51 changes: 51 additions & 0 deletions docs/src/model_agnostic_methods/pdp_importance.rst
@@ -0,0 +1,51 @@

.. _pdp_importance:

============================
PDP-based feature importance
============================

A Partial Dependence Plot (PDP) :footcite:`friedman2001greedy` is a tool used to visualize
the dependence between the target variable and a feature (or set of features) of interest.
To see how they are used to understand the relationship between features and the target,
see :ref:`partial_dependance_plots`. Here we will see how a marginal measure of feature
importance can be derived from PDPs.

A PDP is a plot of the partial dependence function, defined as

.. math::

    f_S(x_S) &= \mathbb{E}_{X_{-S}}\left[ f(x_S, X_{-S}) \right] \\
    &= \int f(x_S, x_{-S}) \, d\mathbb{P}(X_{-S}),

and estimated by

.. math::

    \bar{f}_S(x_S) \approx \frac{1}{n} \sum_{i=1}^n f(x_S, x_{-S,i}).

It shows how the value of the target changes when one (or several) features vary.
Intuitively, the flatter the PDP, the less important the feature, as it appears to have
little impact on the target. Conversely, the more a PDP varies, the more signal about
the target the feature is likely to carry.
Greenwell, Boehmke, and McCarthy :footcite:`greenwell2018simple` propose the following measure
of feature importance for regression:

.. math::

    \Psi^{PDP}_S = \sqrt{ \frac{1}{K-1} \sum_{k=1}^K \left( \bar{f}_S(x_S^k) - \frac{1}{K} \sum_{k'=1}^K \bar{f}_S(x_S^{k'}) \right)^2 },

where :math:`x_S^1, \dots, x_S^K` are the :math:`K` unique values of the feature.
It is the sample standard deviation of the partial dependence values across these unique values.

For classification, they suggest:

.. math::

    \Psi^{PDP}_S = \frac{ \max_k \left(\bar{f}_S(x_S^k)\right) - \min_k \left(\bar{f}_S(x_S^k)\right) }{4}.

This estimates the standard deviation from the range of values: for a normal
distribution, roughly 95% of the data lies within two standard deviations of the mean,
so the range divided by four is a rough estimate of the standard deviation.
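Both measures can be computed directly from estimated PDP curves. Below is a minimal
NumPy sketch, in which the model ``f``, the data, and the grid are hypothetical
stand-ins for a fitted estimator, its training set, and the unique feature values:

```python
import numpy as np

# Hypothetical stand-in for a fitted model's decision function:
# it depends strongly on x0 and only weakly on x1.
def f(x0, x1):
    return 3.0 * x0 + 0.1 * x1

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))  # illustrative "training" data

def pdp_curve(feature, grid):
    # Monte-Carlo estimate of the partial dependence function:
    # average predictions over the training values of the other feature.
    if feature == 0:
        return np.array([f(v, X[:, 1]).mean() for v in grid])
    return np.array([f(X[:, 0], v).mean() for v in grid])

grid = np.linspace(0.0, 1.0, 20)  # K = 20 unique grid values
fbar0 = pdp_curve(0, grid)
fbar1 = pdp_curve(1, grid)

# Regression importance: sample standard deviation of the PDP values
# over the K grid points (ddof=1 gives the 1/(K-1) normalisation).
psi0 = np.std(fbar0, ddof=1)
psi1 = np.std(fbar1, ddof=1)

# Classification variant: range of the curve divided by 4.
psi0_range = (fbar0.max() - fbar0.min()) / 4
```

As expected, ``psi0`` comes out much larger than ``psi1``, since the PDP along the
weak feature is nearly flat.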


References
----------
.. footbibliography::

7 changes: 6 additions & 1 deletion docs/src/visualization.rst
@@ -3,4 +3,9 @@

=======================
Tools for visualization
=======================
=======================

.. toctree::
   :maxdepth: 2

   visualization/partial_dependence
53 changes: 53 additions & 0 deletions docs/src/visualization/partial_dependence.rst
@@ -0,0 +1,53 @@

.. _partial_dependance_plots:

========================
Partial Dependence plots
========================

Definition
==========

A Partial Dependence Plot (PDP) :footcite:`friedman2001greedy` is a tool used to visualize
the dependence between the target variable and a feature (or set of features) of interest.
For important features, it can be used to understand their relationship with the target,
e.g. whether it is linear, monotonic, or more complex.

.. note::
    We are limited to a small subset of features (fewer than three), as it becomes
    difficult to display more than three or four variables simultaneously.

The partial dependence function that is plotted is defined as

.. math::

    f_S(x_S) &= \mathbb{E}_{X_{-S}}\left[ f(x_S, X_{-S}) \right] \\
    &= \int f(x_S, x_{-S}) \, d\mathbb{P}(X_{-S}),

where :math:`X_S` is the set of input features of interest, :math:`X_{-S}` is its complement,
and :math:`f(x_S, x_{-S})` is the learned decision function of the model of interest, evaluated
on a sample :math:`x` whose values for the features in :math:`S` are :math:`x_S` and for the
features in :math:`-S` are :math:`x_{-S}`. The expectation is taken marginally over the values
of :math:`X_{-S}`.

The partial dependence function :math:`f_S` is estimated by Monte-Carlo with
:math:`\bar{f}_S` defined as

.. math::

    \bar{f}_S(x_S) \approx \frac{1}{n} \sum_{i=1}^n f(x_S, x_{-S,i}),

where :math:`\{x_{-S,i}\}_{i=1}^n` are the training-set values of the features in :math:`-S`.
This approximation is equivalent to averaging all the Individual Conditional Expectation (ICE)
curves :footcite:`goldstein2015peeking`. These curves are the per-instance version of a PDP:
for a single sample of the dataset, they display how the prediction evolves as the features of
interest vary. Most plots overlay all ICE curves, with the PDP highlighted.
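To make the averaging concrete, here is a small NumPy sketch, in which ``predict`` is
an illustrative stand-in for a fitted model; it shows that the PDP estimate is the
pointwise mean of the ICE curves:

```python
import numpy as np

# Illustrative stand-in for a fitted estimator's predict function.
def predict(x0, x1):
    return x0 ** 2 + x1

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))       # hypothetical training data
grid = np.linspace(-2.0, 2.0, 50)   # grid of values for the feature of interest (x0)

# One ICE curve per sample i: vary x0 over the grid while holding the
# sample's complementary feature value x_{-S,i} fixed.
ice = np.array([predict(grid, X[i, 1]) for i in range(len(X))])  # shape (100, 50)

# The PDP estimate is the pointwise average of the ICE curves.
pdp = ice.mean(axis=0)
```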

A measure of feature importance can be derived from PDPs, as explained in
:ref:`pdp_importance`.

Example(s)
==========


References
----------
.. footbibliography::
38 changes: 38 additions & 0 deletions docs/tools/references.bib
@@ -155,6 +155,15 @@ @article{fan2012variance
year = {2012}
}

@article{friedman2001greedy,
title = {Greedy function approximation: a gradient boosting machine},
author = {Friedman, Jerome H},
journal = {Annals of Statistics},
pages = {1189--1232},
year = {2001},
publisher = {JSTOR}
}

@article{gaonkar_deriving_2012,
author = {Gaonkar, Bilwaj and Davatzikos, Christos},
journal = {International Conference on Medical Image Computing and Computer-Assisted Intervention},
@@ -169,6 +178,24 @@ @article{gaonkar_deriving_2012
year = {2012}
}

@article{greenwell2018simple,
title = {A simple and effective model-based variable importance measure},
author = {Greenwell, Brandon M and Boehmke, Bradley C and McCarthy, Andrew J},
journal = {arXiv preprint arXiv:1805.04755},
year = {2018}
}

@article{goldstein2015peeking,
title = {Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation},
author = {Goldstein, Alex and Kapelner, Adam and Bleich, Justin and Pitkin, Emil},
journal = {Journal of Computational and Graphical Statistics},
volume = {24},
number = {1},
pages = {44--65},
year = {2015},
publisher = {Taylor \& Francis}
}

@article{hirschhorn2005genome,
author = {Hirschhorn, Joel N and Daly, Mark J},
journal = {Nature reviews genetics},
@@ -202,6 +229,17 @@ @article{liu2022fast
year = {2022}
}

@book{molnar2025,
title = {Interpretable Machine Learning},
subtitle = {A Guide for Making Black Box Models Explainable},
author = {Christoph Molnar},
year = {2025},
edition = {3},
isbn = {978-3-911578-03-5},
url = {https://christophm.github.io/interpretable-ml-book}
}

@article{meinshausen2009pvalues,
author = {Meinshausen, Nicolai and Meier, Lukas and B{\"u}hlmann, Peter},
doi = {10.1198/jasa.2009.tm08647},