6 changes: 0 additions & 6 deletions docs/src/methods_list.rst

This file was deleted.

11 changes: 11 additions & 0 deletions docs/src/model_agnostic_methods.rst
@@ -0,0 +1,11 @@
.. _model_agnostic_methods:


======================
Model-agnostic methods
======================

.. toctree::
   :maxdepth: 2

   model_agnostic_methods/pdp_importance
51 changes: 51 additions & 0 deletions docs/src/model_agnostic_methods/pdp_importance.rst
@@ -0,0 +1,51 @@

.. _pdp_importance:

============================
PDP-based feature importance
============================

A Partial Dependence Plot (PDP) :footcite:`friedman2001greedy` is a tool used to visualize
the dependence between the target variable and a feature (or set of features) of interest.
To see how they are used to understand the relationship between features and the target,
see :ref:`partial_dependance_plots`. Here we will see how a marginal measure of feature
importance can be derived from PDPs.

A PDP is a plot of the partial dependence function, defined as

.. math::

    f_S(x_S) &= \mathbb{E}_{X_{-S}}\left[ f(x_S, X_{-S}) \right] \\
    &= \int f(x_S, x_{-S}) \, d\mathbb{P}(X_{-S}),

and estimated by

.. math::

    \bar{f}_S(x_S) \approx \frac{1}{n} \sum_{i=1}^n f(x_S, x_{-S,i}).

It shows how the value of the target changes when one (or several) features vary.
Intuitively, the flatter the PDP, the less important the feature, as it appears to have
little impact on the target. Conversely, the more a PDP varies, the more signal about
the target the feature is likely to carry.
Greenwell, Boehmke, and McCarthy :footcite:`greenwell2018simple` propose the following measure
of feature importance for regression:

.. math::

    \Psi^{PDP}_S = \sqrt{ \frac{1}{K-1} \sum_{k=1}^K \left( \bar{f}_S(x_S^k) - \frac{1}{K} \sum_{k'=1}^K \bar{f}_S(x_S^{k'}) \right)^2 },

where :math:`x_S^1, \dots, x_S^K` are the :math:`K` unique values of the feature.
It is the sample standard deviation of the partial dependence values across these unique values.

For classification, they suggest:

.. math::

    \Psi^{PDP}_S = \frac{ \max_k \left(\bar{f}_S(x_S^k)\right) - \min_k \left(\bar{f}_S(x_S^k)\right) }{4}.

This estimates the standard deviation from the range of values: for a normal
distribution, roughly 95% of the data lies within two standard deviations of the mean,
so the range divided by four is a rough estimate of the standard deviation.
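Both measures can be computed directly from estimated PDP curves. Below is a minimal
NumPy sketch, in which the model ``f``, the data, and the grid are hypothetical
stand-ins for a fitted estimator, its training set, and the unique feature values:

```python
import numpy as np

# Hypothetical stand-in for a fitted model's decision function:
# it depends strongly on x0 and only weakly on x1.
def f(x0, x1):
    return 3.0 * x0 + 0.1 * x1

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))  # illustrative "training" data

def pdp_curve(feature, grid):
    # Monte-Carlo estimate of the partial dependence function:
    # average predictions over the training values of the other feature.
    if feature == 0:
        return np.array([f(v, X[:, 1]).mean() for v in grid])
    return np.array([f(X[:, 0], v).mean() for v in grid])

grid = np.linspace(0.0, 1.0, 20)  # K = 20 unique grid values
fbar0 = pdp_curve(0, grid)
fbar1 = pdp_curve(1, grid)

# Regression importance: sample standard deviation of the PDP values
# over the K grid points (ddof=1 gives the 1/(K-1) normalisation).
psi0 = np.std(fbar0, ddof=1)
psi1 = np.std(fbar1, ddof=1)

# Classification variant: range of the curve divided by 4.
psi0_range = (fbar0.max() - fbar0.min()) / 4
```

As expected, ``psi0`` comes out much larger than ``psi1``, since the PDP along the
weak feature is nearly flat.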


References
----------
.. footbibliography::

7 changes: 6 additions & 1 deletion docs/src/visualization.rst
@@ -3,4 +3,9 @@

=======================
Tools for visualization
=======================
=======================

.. toctree::
   :maxdepth: 2

   visualization/partial_dependence
53 changes: 53 additions & 0 deletions docs/src/visualization/partial_dependence.rst
@@ -0,0 +1,53 @@

.. _partial_dependance_plots:

========================
Partial Dependence plots
========================

Definition
==========

A Partial Dependence Plot (PDP) :footcite:`friedman2001greedy` is a tool used to visualize
the dependence between the target variable and a feature (or set of features) of interest.
For important features, it can be used to understand their relationship with the target,
e.g. whether it is linear, monotonic, or more complex.

.. note::
    We are limited to a small subset of features (fewer than three), as it becomes
    difficult to display more than three or four variables simultaneously.

The partial dependence function that is plotted is defined as

.. math::

    f_S(x_S) &= \mathbb{E}_{X_{-S}}\left[ f(x_S, X_{-S}) \right] \\
    &= \int f(x_S, x_{-S}) \, d\mathbb{P}(X_{-S}),

where :math:`X_S` is the set of input features of interest, :math:`X_{-S}` is its complement,
and :math:`f(x_S, x_{-S})` is the learned decision function of the model of interest, evaluated
on a sample :math:`x` whose values for the features in :math:`S` are :math:`x_S` and for the
features in :math:`-S` are :math:`x_{-S}`. The expectation is taken marginally over the values
of :math:`X_{-S}`.

The partial dependence function :math:`f_S` is estimated by Monte-Carlo with
:math:`\bar{f}_S` defined as

.. math::

    \bar{f}_S(x_S) \approx \frac{1}{n} \sum_{i=1}^n f(x_S, x_{-S,i}),

where :math:`\{x_{-S,i}\}_{i=1}^n` are the training-set values of the features in :math:`-S`.
This approximation is equivalent to averaging all the Individual Conditional Expectation (ICE)
curves :footcite:`goldstein2015peeking`. These curves are the per-instance version of a PDP:
for a single sample of the dataset, they display how the prediction evolves as the features of
interest vary. Most plots overlay all ICE curves, with the PDP highlighted.
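To make the averaging concrete, here is a small NumPy sketch, in which ``predict`` is
an illustrative stand-in for a fitted model; it shows that the PDP estimate is the
pointwise mean of the ICE curves:

```python
import numpy as np

# Illustrative stand-in for a fitted estimator's predict function.
def predict(x0, x1):
    return x0 ** 2 + x1

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))       # hypothetical training data
grid = np.linspace(-2.0, 2.0, 50)   # grid of values for the feature of interest (x0)

# One ICE curve per sample i: vary x0 over the grid while holding the
# sample's complementary feature value x_{-S,i} fixed.
ice = np.array([predict(grid, X[i, 1]) for i in range(len(X))])  # shape (100, 50)

# The PDP estimate is the pointwise average of the ICE curves.
pdp = ice.mean(axis=0)
```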

A measure of feature importance can be derived from PDPs, as explained in
:ref:`pdp_importance`.

Example(s)
==========


References
----------
.. footbibliography::
38 changes: 38 additions & 0 deletions docs/tools/references.bib
@@ -155,6 +155,15 @@ @article{fan2012variance
year = {2012}
}

@article{friedman2001greedy,
title = {Greedy function approximation: a gradient boosting machine},
author = {Friedman, Jerome H},
journal = {Annals of Statistics},
pages = {1189--1232},
year = {2001},
publisher = {JSTOR}
}

@article{gaonkar_deriving_2012,
author = {Gaonkar, Bilwaj and Davatzikos, Christos},
journal = {International Conference on Medical Image Computing and Computer-Assisted Intervention},
@@ -169,6 +178,24 @@ @article{gaonkar_deriving_2012
year = {2012}
}

@article{greenwell2018simple,
title = {A simple and effective model-based variable importance measure},
author = {Greenwell, Brandon M and Boehmke, Bradley C and McCarthy, Andrew J},
journal = {arXiv preprint arXiv:1805.04755},
year = {2018}
}

@article{goldstein2015peeking,
title = {Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation},
author = {Goldstein, Alex and Kapelner, Adam and Bleich, Justin and Pitkin, Emil},
journal = {Journal of Computational and Graphical Statistics},
volume = {24},
number = {1},
pages = {44--65},
year = {2015},
publisher = {Taylor \& Francis}
}

@article{hirschhorn2005genome,
author = {Hirschhorn, Joel N and Daly, Mark J},
journal = {Nature reviews genetics},
@@ -202,6 +229,17 @@ @article{liu2022fast
year = {2022}
}

@book{molnar2025,
title = {Interpretable Machine Learning},
subtitle = {A Guide for Making Black Box Models Explainable},
author = {Christoph Molnar},
year = {2025},
edition = {3},
isbn = {978-3-911578-03-5},
url = {https://christophm.github.io/interpretable-ml-book}
}

@article{meinshausen2009pvalues,
author = {Meinshausen, Nicolai and Meier, Lukas and B{\"u}hlmann, Peter},
doi = {10.1198/jasa.2009.tm08647},