
Commit 55f107c: Docstrings

Parent: 78da770

5 files changed: +43 -45 lines

R/kernelshap.R

Lines changed: 7 additions & 8 deletions
@@ -3,9 +3,9 @@
 #' @description
 #' Efficient implementation of Kernel SHAP, see Lundberg and Lee (2017), and
 #' Covert and Lee (2021), abbreviated by CL21.
-#' By default, for up to p=8 features, exact Kernel SHAP values are returned
+#' By default, for up to p=8 features, exact SHAP values are returned
 #' (with respect to the selected background data).
-#' Otherwise, an almost exact hybrid algorithm combining exact calculations and
+#' Otherwise, a partly exact hybrid algorithm combining exact calculations and
 #' iterative paired sampling is used, see Details.
 #'
 #' @details
@@ -41,10 +41,9 @@
 #' 2. Step 2 (sampling part): The remaining weight is filled by sampling vectors z
 #' according to Kernel SHAP weights normalized to the values not yet covered by Step 1.
 #' Together with the results from Step 1 - correctly weighted - this now forms a
-#' complete iteration as in CL21. The difference is that most mass is covered by exact
-#' calculations. Afterwards, the algorithm iterates until convergence.
-#' The output of Step 1 is reused in every iteration, leading to an extremely
-#' efficient strategy.
+#' complete iteration as in CL21. The difference is that a significant part of the mass
+#' is covered by exact calculations. Afterwards, the algorithm iterates until
+#' convergence. The output of Step 1 is reused in every iteration.
 #'
 #' If \eqn{p} is sufficiently small, all possible \eqn{2^p-2} on-off vectors \eqn{z} can be
 #' evaluated. In this case, no sampling is required and the algorithm returns exact
@@ -76,7 +75,7 @@
 #' @param bg_w Optional vector of case weights for each row of `bg_X`.
 #' If `bg_X = NULL`, must be of same length as `X`. Set to `NULL` for no weights.
 #' @param bg_n If `bg_X = NULL`: Size of background data to be sampled from `X`.
-#' @param exact If `TRUE`, the algorithm will produce exact Kernel SHAP values
+#' @param exact If `TRUE`, the algorithm will produce exact SHAP values
 #' with respect to the background data.
 #' The default is `TRUE` for up to eight features, and `FALSE` otherwise.
 #' @param hybrid_degree Integer controlling the exactness of the hybrid strategy. For
@@ -89,7 +88,7 @@
 #' for the exact part. The remaining mass is covered by random sampling.
 #' - `2`: Uses all \eqn{p(p+1)} on-off vectors \eqn{z} with
 #' \eqn{\sum z \in \{1, 2, p-2, p-1\}}. The remaining mass is covered by sampling.
-#' Convergence usually happens very fast.
+#' Usually converges fast.
 #' - `k>2`: Uses all on-off vectors with
 #' \eqn{\sum z \in \{1, \dots, k, p-k, \dots, p-1\}}.
 #' @param m Even number of on-off vectors sampled during one iteration.
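
For intuition, here is a minimal R sketch (illustration only, not package code) that counts the on-off vectors covered by the exact part for a given `hybrid_degree`, following the docstring above:

```r
# Illustration only (not package code): count the on-off vectors z that the
# exact part of the hybrid evaluates, per the docstring above.
p <- 10
z <- as.matrix(expand.grid(rep(list(0:1), p)))  # all 2^p on-off vectors
z <- z[rowSums(z) %in% 1:(p - 1), ]             # the 2^p - 2 non-trivial ones

# hybrid_degree = 1: sum(z) in {1, p-1}  ->  2p vectors
sum(rowSums(z) %in% c(1, p - 1))                # 20

# hybrid_degree = 2: sum(z) in {1, 2, p-2, p-1}  ->  p(p+1) vectors
sum(rowSums(z) %in% c(1, 2, p - 2, p - 1))      # 110
```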

R/permshap.R

Lines changed: 12 additions & 14 deletions
@@ -6,37 +6,35 @@
 #'
 #' By default, for up to p=8 features, exact SHAP values are returned
 #' (exact with respect to the selected background data).
-#'
 #' Otherwise, the sampling process iterates until the resulting values
 #' are sufficiently precise, and standard errors are provided.
 #'
+#' @details
 #' During each iteration, the algorithm cycles twice through a random permutation:
 #' It starts with all feature components "turned on" (i.e., taking them
 #' from the observation to be explained), then gradually turning off components
-#' according to the permutation (i.e., marginalizing them over the background data).
+#' according to the permutation.
 #' When all components are turned off, the algorithm - one by one - turns the components
 #' back on, until all components are turned on again. This antithetic scheme allows to
-#' evaluate Shapley's formula 2p times with each permutation, using a total of
-#' 2p + 1 evaluations of marginal means.
+#' evaluate Shapley's formula twice per feature using a single permutation and a total
+#' of 2p disjoint evaluations of the contribution function.
 #'
 #' For models with interactions up to order two, one can show that
-#' even a single iteration provides exact SHAP values (with respect to the
-#' given background dataset).
+#' even a single iteration provides exact SHAP values for all features
+#' (with respect to the given background dataset).
 #'
 #' The Python implementation "shap" uses a similar approach, but without
-#' providing standard errors, and without early stopping. To mimic its behavior,
-#' we would need to set `max_iter = p` in R, and `max_eval = (2*p+1)*p` in Python.
+#' providing standard errors, and without early stopping.
 #'
 #' For faster convergence, we use balanced permutations in the sense that
 #' p subsequent permutations each start with a different feature.
 #' Furthermore, the 2p on-off vectors with sum <=1 or >=p-1 are evaluated only once,
-#' similar to the degree 1 hybrid in [kernelshap()] (but covering less weight).
+#' similar to the degree 1 hybrid in [kernelshap()].
 #'
-#' @param exact If `TRUE`, the algorithm produces exact SHAP values
-#' with respect to the background data.
-#' The default is `TRUE` for up to eight features, and `FALSE` otherwise.
-#' @param low_memory If `FALSE` (default up to p = 15), the algorithm evaluates p
-#' predictions together, reducing the number of calls to `predict()`.
+#' @param low_memory If `FALSE` (default up to p = 15), the algorithm does p
+#' iterations in one chunk, evaluating Shapley's formula 2p^2 times.
+#' For models with interactions up to order two, you can set this to `TRUE`
+#' to save time.
 #' @inheritParams kernelshap
 #' @returns
 #' An object of class "kernelshap" with the following components:
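
A minimal sketch (illustration only; the package's internals may differ, e.g., in the order used to turn features back on) of the chain of on-off vectors that one permutation generates under the antithetic scheme described above:

```r
# Illustration only: on-off vectors visited by the antithetic scheme for one
# random permutation of p features (assumed back-on order: same as turn-off).
p <- 5
perm <- sample(p)

state <- rep(1L, p)                # start with all components "on"
chain <- list(state)
for (j in perm) {                  # turn components off along the permutation
  state[j] <- 0L
  chain <- c(chain, list(state))
}
for (j in perm) {                  # then turn them back on, one by one
  state[j] <- 1L
  chain <- c(chain, list(state))
}
z <- do.call(rbind, chain)
nrow(z)  # 2p + 1 states; each adjacent pair yields one marginal contribution,
         # so Shapley's formula is evaluated twice per feature (2p in total)
```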

README.md

Lines changed: 3 additions & 2 deletions
@@ -16,7 +16,7 @@
 The package contains three functions to crunch SHAP values:
 
 - **`permshap()`**: Permutation SHAP algorithm of [1]. Both exact and sampling versions are available.
-- **`kernelshap()`**: Kernel SHAP algorithm of [2] and [3]. Both exact and (pseudo-exact) sampling versions are available.
+- **`kernelshap()`**: Kernel SHAP algorithm of [2] and [3]. Both exact and (partly exact) sampling versions are available.
 - **`additive_shap()`**: For *additive models* fitted via `lm()`, `glm()`, `mgcv::gam()`, `mgcv::bam()`, `gam::gam()`, `survival::coxph()`, or `survival::survreg()`. Exponentially faster than the model-agnostic options above, and recommended if possible.
 
 To explain your model, select an explanation dataset `X` (up to 1000 rows from the training data, feature columns only). Use {shapviz} to visualize the resulting SHAP values.
@@ -25,8 +25,9 @@ To explain your model, select an explanation dataset `X` (up to 1000 rows from t
 
 - Both algorithms need a representative background data `bg_X` to calculate marginal means (up to 500 rows from the training data). In cases with a natural "off" value (like MNIST digits), this can also be a single row with all values set to the off value. If unspecified, 200 rows are randomly sampled from `X`.
 - Exact Kernel SHAP gives identical results as exact permutation SHAP. Both algorithms are fast up to 8 features.
-  With more features, `kernelshap()` switches to an almost exact algorithm with faster convergence than the sampling version of permutation SHAP.
+  With more features, `kernelshap()` switches to a partly exact algorithm with faster convergence than the sampling version of permutation SHAP.
 - For models with interactions of order up to two, the sampling versions provide the same results as the exact versions.
+- Sampling versions iterate until standard errors of SHAP values are sufficiently small.
 - For additive models, `permshap()` and `kernelshap()` give the same results as `additive_shap`
   as long as the full training data would be used as background data.
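
A minimal usage sketch of the three functions (illustration only; `lm()` on `iris` stands in for any model and data):

```r
library(kernelshap)

fit <- lm(Sepal.Length ~ ., data = iris)
X <- iris[, -1L]                     # explanation data: feature columns only

s1 <- permshap(fit, X, bg_X = X)     # exact here (4 features <= 8)
s2 <- kernelshap(fit, X, bg_X = X)   # identical to s1 in the exact case
s3 <- additive_shap(fit, X)          # fast path for additive models like lm()

# Visualize, e.g., with {shapviz}:
# shapviz::sv_importance(shapviz::shapviz(s2))
```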
man/kernelshap.Rd

Lines changed: 7 additions & 8 deletions
Generated file; diff not rendered.

man/permshap.Rd

Lines changed: 14 additions & 13 deletions
Generated file; diff not rendered.
