|
8 | 8 | #' Otherwise, an almost exact hybrid algorithm combining exact calculations and |
9 | 9 | #' iterative paired sampling is used, see Details. |
10 | 10 | #' |
11 | | -#' Note that (exact) Kernel SHAP is only an approximation of (exact) permutation SHAP. |
12 | | -#' Thus, for up to eight features, we recommend [permshap()]. For more features, |
13 | | -#' [permshap()] tends to be inefficient compared the optimized hybrid strategy |
14 | | -#' of Kernel SHAP. |
15 | | -#' |
16 | 11 | #' @details |
17 | 12 | #' The pure iterative Kernel SHAP sampling as in Covert and Lee (2021) works like this: |
18 | 13 | #' |
19 | | -#' 1. A binary "on-off" vector \eqn{z} is drawn from \eqn{\{0, 1\}^p} |
20 | | -#' such that its sum follows the SHAP Kernel weight distribution |
21 | | -#' (normalized to the range \eqn{\{1, \dots, p-1\}}). |
| 14 | +#' 1. A binary "on-off" vector \eqn{z} is drawn from \eqn{\{0, 1\}^p} according to |
| 15 | +#' a special weighting logic that favors small and large values of \eqn{\sum z}.
22 | 16 | #' 2. For each \eqn{j} with \eqn{z_j = 1}, the \eqn{j}-th column of the |
23 | 17 | #' original background data is replaced by the corresponding feature value \eqn{x_j} |
24 | 18 | #' of the observation to be explained. |
|
33 | 27 | #' |
34 | 28 | #' This is repeated multiple times until convergence, see CL21 for details. |
35 | 29 | #' |
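The sampling steps above can be sketched in R. This is illustrative only, not the package internals: the variable names and the tiny example data are made up, and the subset-size weights follow the standard Kernel SHAP kernel \eqn{w(s) \propto (p-1) / (\binom{p}{s} s (p-s))}.

```r
p <- 5L                              # number of features
sizes <- 1:(p - 1)                   # possible values of sum(z)

# Kernel SHAP weights over subset sizes, normalized to a probability
# distribution on {1, ..., p-1}
w <- (p - 1) / (choose(p, sizes) * sizes * (p - sizes))
probs <- w / sum(w)

# Step 1: draw sum(z), then place that many "on" positions at random
set.seed(1)
s <- sample(sizes, size = 1L, prob = probs)
z <- numeric(p)
z[sample.int(p, s)] <- 1

# Step 2: where z_j = 1, replace background values by those of the
# observation x to be explained
x  <- c(a = 1, b = 2, c = 3, d = 4, e = 5)   # observation (made up)
bg <- c(a = 9, b = 9, c = 9, d = 9, e = 9)   # one background row (made up)
masked <- ifelse(z == 1, x, bg)
```

In the actual algorithm this masking is applied to every background row, and the model is then evaluated on the masked data.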
36 | | -#' A drawback of this strategy is that many (at least 75%) of the \eqn{z} vectors will |
37 | | -#' have \eqn{\sum z \in \{1, p-1\}}, producing many duplicates. Similarly, at least 92% |
38 | | -#' of the mass will be used for the \eqn{p(p+1)} possible vectors with |
39 | | -#' \eqn{\sum z \in \{1, 2, p-2, p-1\}}. |
| 30 | +#' A drawback of this strategy is that many of the sampled \eqn{z} vectors are duplicates.
40 | 31 | #' This inefficiency can be fixed by a hybrid strategy, combining exact calculations |
41 | 32 | #' with sampling. |
42 | 33 | #' |
43 | 34 | #' The hybrid algorithm has two steps: |
44 | 35 | #' 1. Step 1 (exact part): There are \eqn{2p} different on-off vectors \eqn{z} with |
45 | | -#' \eqn{\sum z \in \{1, p-1\}}, covering a large proportion of the Kernel SHAP |
46 | | -#' distribution. The degree 1 hybrid will list those vectors and use them according |
| 36 | +#' \eqn{\sum z \in \{1, p-1\}}. |
| 37 | +#' The degree 1 hybrid will list those vectors and use them according |
47 | 38 | #' to their weights in the upcoming calculations. Depending on \eqn{p}, we can also go |
48 | 39 | #' a step further to a degree 2 hybrid by adding all \eqn{p(p-1)} vectors with |
49 | 40 | #' \eqn{\sum z \in \{2, p-2\}} to the process etc. The necessary predictions are |
|
96 | 87 | #' worse than the hybrid strategy and should therefore only be used for |
97 | 88 | #' studying properties of the Kernel SHAP algorithm. |
98 | 89 | #' - `1`: Uses all \eqn{2p} on-off vectors \eqn{z} with \eqn{\sum z \in \{1, p-1\}} |
99 | | -#' for the exact part, which covers at least 75% of the mass of the Kernel weight |
100 | | -#' distribution. The remaining mass is covered by random sampling. |
| 90 | +#' for the exact part. The remaining mass is covered by random sampling. |
101 | 91 | #' - `2`: Uses all \eqn{p(p+1)} on-off vectors \eqn{z} with |
102 | | -#' \eqn{\sum z \in \{1, 2, p-2, p-1\}}. This covers at least 92% of the mass of the |
103 | | -#' Kernel weight distribution. The remaining mass is covered by sampling. |
104 | | -#' Convergence usually happens in the minimal possible number of iterations of two. |
| 92 | +#' \eqn{\sum z \in \{1, 2, p-2, p-1\}}. The remaining mass is covered by sampling. |
| 93 | +#' Convergence usually happens very fast. |
105 | 94 | #' - `k>2`: Uses all on-off vectors with |
106 | 95 | #' \eqn{\sum z \in \{1, \dots, k, p-k, \dots, p-1\}}. |
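As a quick check of the counts stated above (\eqn{2p} exact vectors for degree 1, \eqn{p(p+1)} for degree 2), the number of on-off vectors a degree-\eqn{k} hybrid treats exactly can be computed by summing binomial coefficients. A small R sketch; the helper name is made up:

```r
# Number of on-off vectors z handled exactly by a degree-k hybrid:
# all z with sum(z) in {1, ..., k, p-k, ..., p-1}
n_exact <- function(p, k) {
  sizes <- unique(c(1:k, (p - k):(p - 1)))
  sum(choose(p, sizes))
}

p <- 10
n_exact(p, 1)  # 2p = 20
n_exact(p, 2)  # p(p+1) = 110
```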
107 | 96 | #' @param m Even number of on-off vectors sampled during one iteration. |